Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. The proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47,360 unique neural network models, resulting in over 2,415,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), as well as preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
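A hedged loading sketch (the file names below are hypothetical, and unpickling the preprocessed zoos requires the dataset class from the code at www.modelzoos.cc to be importable):

```
import json
import torch

# Hypothetical file names -- check the repository listing for the actual ones.
zoo = torch.load("dataset_mnist_seed.pt")  # preprocessed zoo wrapped in a custom dataset class
with open("index_dict.json") as f:
    index_dict = json.load(f)              # describes how to read the vectorized models
```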
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
✅ Step 1: Mount to Dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths
Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory
To make them importable, copy the .py files to your notebook's working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
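Alternatively (a minimal sketch, not part of the original instructions), you can skip the copying entirely by putting the input directory on Python's import path:

```
import sys

# Make the .py files importable in place, without copying them.
sys.path.insert(0, '/kaggle/input/pytorch-models')
```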
✅ Step 4: Import your modules
Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models
You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
We have created a 102-category dataset consisting of 102 flower categories. The flowers chosen are commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on the category statistics page.
The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.
> dataset
> train
> valid
> test
- cat_to_name.json
- README.md
- sample_submission.csv
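Given this layout, a minimal loading sketch (assuming ImageFolder-style class subdirectories inside each split, which is not confirmed here) could look like:

```
import json
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder('dataset/train', transform=tfm)

with open('dataset/cat_to_name.json') as f:
    cat_to_name = json.load(f)  # maps category ids to flower names
```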
We visualize the categories in the dataset using SIFT features as shape descriptors and HSV values as colour descriptors. The images are randomly sampled from each category.
(Visualization image: https://i.imgur.com/Tl6TKUC.png)
Nilsback, M-E. and Zisserman, A., "Automated flower classification over a large number of classes," Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008).
BigEarthNet - HDF5 version
This repository contains an export of the existing BigEarthNet dataset in HDF5 format. All Sentinel-2 acquisitions are exported according to TorchGeo's dataset (120x120 pixel resolution). Sentinel-1 is not contained in this repository for the moment. For each satellite acquisition, the CSV files give the corresponding HDF5 file and the index within it. A PyTorch dataset class which can be used to iterate over this dataset can be found here, as well as the script used… See the full description on the dataset page: https://huggingface.co/datasets/lc-col/bigearthnet.
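A rough reading sketch (CSV column names and HDF5 layout are assumptions; prefer the provided dataset class):

```
import h5py
import pandas as pd

meta = pd.read_csv('train.csv')          # hypothetical split file: one row per acquisition
row = meta.iloc[0]
with h5py.File(row['file'], 'r') as f:   # 'file' and 'index' are assumed column names
    patch = f['images'][row['index']]    # one 120x120 patch; dataset key 'images' is likewise assumed
```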
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Files are provided in .pkl or .npz format. Some example classes include:
- Animals: beaver, dolphin, otter, elephant, snake.
- Plants: apple, orange, mushroom, palm tree, pine tree.
- Vehicles: bicycle, bus, motorcycle, train, rocket.
- Everyday Objects: clock, keyboard, lamp, table, chair.
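A minimal loading sketch for either format (the file names are hypothetical; the page does not list them):

```
import pickle
import numpy as np

with open('data.pkl', 'rb') as f:  # pickle variant
    data = pickle.load(f)

arrays = np.load('data.npz')       # npz variant
print(arrays.files)                # list the stored array names
```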
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration into existing deep learning projects based on this framework.
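A rough sketch of the kind of online augmentation this enables (not the dataset's own code; file names and shapes are hypothetical):

```
import numpy as np
from scipy.signal import fftconvolve

rir = np.load('bird_rir.npy')      # one multichannel RIR, shape (channels, taps)
speech = np.load('speech.npy')     # a dry single-channel signal
# Convolve the dry signal with each channel's impulse response.
reverberant = np.stack([fftconvolve(speech, ch)[:len(speech)] for ch in rir])
```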
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.
Projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Pharo_projects/
images/
GToolkit.zip
Moose.zip
PetitParser.zip
Pillar.zip
PolyMath.zip
Roassal2.zip
Seaside.zip
vm/
70-x64/Pharo
Scripts/
ClassCommentExtraction.st
SampleSelectionScript.st
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
Projects/ contains the raw projects of each language that are used to analyze class comments.
- Java_projects/
- eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub Eclipse.
- guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub Guava.
- guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub Guice.
- hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub Apache Hadoop.
- spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub Apache Spark.
- vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub Vaadin.
Pharo_projects/
images/
- GToolkit.zip - GToolkit project imported into a Pharo image. The image can be run with the virtual machine given in the vm/ folder; the script to extract the comments is already provided in the image.
- Moose.zip - Moose project imported into a Pharo image (same setup as above).
- PetitParser.zip - Petit Parser project imported into a Pharo image (same setup as above).
- Pillar.zip - Pillar project imported into a Pharo image (same setup as above).
- PolyMath.zip - PolyMath project imported into a Pharo image (same setup as above).
- Roassal2.zip - Roassal2 project imported into a Pharo image (same setup as above).
- Seaside.zip - Seaside project imported into a Pharo image (same setup as above).
vm/
70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo images.
Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.
ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.
SampleSelectionScript.st - A Smalltalk script showing how sample class comments of Pharo projects were selected. This script can be run in any of the Pharo images given in the images/ folder.
Python_projects/
- django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub Django.
- ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub IPython.
- Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub Mailpile.
- pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub pandas.
- pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub Pipenv.
- pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub PyTorch.
- requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub Requests.
Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides image segmentation data for feral cats, designed for computer vision and machine learning tasks. It builds upon the original public domain dataset by Paul Cashman from Roboflow, with additional preprocessing and multiple data formats for easier consumption.
The dataset is organized into three standard splits:
- Train set
- Validation set
- Test set

Each split contains data in multiple formats:
1. Original JPG images
2. Segmentation mask JPG images
3. Parquet files containing flattened image and mask data
4. Pickle files containing serialized image and mask data
Directory and file layout:
- train/: Original training images
- valid/: Original validation images
- test/: Original test images
- train_mask/: Corresponding segmentation masks for training
- valid_mask/: Corresponding segmentation masks for validation
- test_mask/: Corresponding segmentation masks for testing
- train_dataset.parquet, valid_dataset.parquet, test_dataset.parquet: flattened image and mask data
- train_dataset.pkl, valid_dataset.pkl, test_dataset.pkl: serialized image and mask data
- train_dataset.csv, valid_dataset.csv, test_dataset.csv

In the flattened files, each row concatenates an image and its mask; split a row at split_at = image_size[0] * image_size[1] * image_channels, then reshape the image part to [-1, 224, 224, 3] and the mask part to [-1, 224, 224, 1].

All images were preprocessed with the following operations:
- Resized to 224×224 pixels using bilinear interpolation
- Segmentation masks were resized to match the images using nearest-neighbor interpolation
- Original RLE (Run-Length Encoding) segmentation data converted to binary masks
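A minimal sketch of that unpacking (row layout as described above; the bundled CatDataset below handles this for you):

```
import numpy as np
import pandas as pd

image_size, image_channels = [224, 224], 3
split_at = image_size[0] * image_size[1] * image_channels

flat = pd.read_parquet('train_dataset.parquet').to_numpy()
images = flat[:, :split_at].reshape([-1, 224, 224, 3])
masks = flat[:, split_at:].reshape([-1, 224, 224, 1])
```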
When used with the provided PyTorch dataset class, images are normalized with:
- Mean: [0.48235, 0.45882, 0.40784]
- Standard Deviation: [0.00392156862745098, 0.00392156862745098, 0.00392156862745098] (i.e., 1/255)
A custom CatDataset class is included for easy integration with PyTorch:
from cat_dataset import CatDataset
# Load from parquet format
dataset = CatDataset(
root="path/to/dataset",
split="train", # Options: "train", "valid", "test"
format="parquet", # Options: "parquet", "pkl"
image_size=[224, 224],
image_channels=3,
mask_channels=1
)
# Use with PyTorch DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Loading time benchmarks from the original implementation:
- Parquet format: ~1.29 seconds per iteration
- Pickle format: ~0.71 seconds per iteration
The pickle format provides the fastest loading times and is recommended for most use cases.
If you use this dataset in your research or projects, please cite:
@misc{feral-cat-segmentation_dataset,
title = {feral-cat-segmentation Dataset},
type = {Open Source Dataset},
author = {Paul Cashman},
howpublished = {\url{https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}},
url = {https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation},
journal = {Roboflow Universe},
publisher = {Roboflow},
year = {2025},
month = {mar},
note = {visited on 2025-03-19},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two deep neural networks (PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing single-agent poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and then acetaminophen, in PyTorch (F1-score = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-score = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Subset of the SMDG-19 for Glaucoma dataset in PyTorch Format
SMDG-19: https://www.kaggle.com/datasets/deathtrooper/multichannel-glaucoma-benchmark-dataset
Contains train, val, and test sets of fundus images for glaucoma detection.
2 classes (0|1):
1: Glaucoma present; 0: Glaucoma not present
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🖼️ CIFAR10 (Extracted from PyTorch Vision)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
ℹ️ Dataset Details
📖 Dataset Description
The classes are completely mutually exclusive. There is no… See the full description on the dataset page: https://huggingface.co/datasets/p2pfl/CIFAR10.
Dataset: SimCATS_GaAs_v1_random_variations_v2
Simulated data from the geometric SimCATS model (GitHub Repository, Paper) for benchmarking of semiconductor quantum dot tuning algorithms. Generated using this Jupyter Notebook and used for the final evaluation in "Automated Charge Transition Detection in Quantum Dot Charge Stability Diagrams".
Key Facts
- Contains pink, white & random telegraph noise, transition blurring, and dot jumps
- Random variations of charge transitions, sensor, and distortions
- 1,000 randomly sampled configurations with 100 CSDs each (100,000 CSDs in total)
Usage
To load the data, e.g. for calculating metrics, please have a look at SimCATS-Datasets (GitHub Repository, ReadTheDocs). The dataset can be loaded as numpy arrays using the function load_dataset or as a PyTorch Dataset (for machine learning purposes) using the class SimcatsDataset.
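A hedged usage sketch (the import paths and signatures below are assumptions; only the names load_dataset and SimcatsDataset come from this page, so check the SimCATS-Datasets documentation):

```
# Import paths below are assumptions, not confirmed by this page.
from simcats_datasets.loading import load_dataset
from simcats_datasets.loading.pytorch import SimcatsDataset

data = load_dataset("SimCATS_GaAs_v1_random_variations_v2")       # as numpy arrays
dataset = SimcatsDataset("SimCATS_GaAs_v1_random_variations_v2")  # as a PyTorch Dataset
```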
Dataset Card for Dataset CrashCar
This is the dataset proposed in 'CrashCar101: Procedural Generation for Damage Assessment' [WACV24]
Project Page: https://crashcar.compute.dtu.dk Repository: https://github.com/JensPars/CrashCar_procedural_generation Paper: https://openaccess.thecvf.com/content/WACV2024/papers/Parslov_CrashCar101_Procedural_Generation_for_Damage_Assessment_WACV_2024_paper.pdf
Example dataset class in PyTorch (snippet truncated on the source page):
import os
import torch
from glob import glob
from… See the full description on the dataset page: https://huggingface.co/datasets/JensParslov/CrashCar.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contain only sentences as evidence, Text-only
table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table, Hybrid
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data
import webdataset as wds
# path to the uncompressed files, should be a directory with a set of tar files
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
wds.Dataset(url)  # in newer webdataset releases this is wds.WebDataset
.shuffle(1000) # cache 1000 samples and shuffle
.decode()
.to_tuple("json")
.batched(20) # group every 20 examples into a batch
)
# Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
# You can also iterate through all examples and dump them with your preferred data format
Below we show how the data is organized with two examples.
Text-only
{'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
's1_all_links': {
'Sils,_Girona': [[0, 4]],
'municipality': [[10, 22]],
'Comarques_of_Catalonia': [[30, 37]],
'Selva': [[41, 46]],
'Catalonia': [[51, 60]]
}, # list of entities and their mentions in the sentence (start, end location)
'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
{
'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
's2s': [ # list of other sentences that contain the common entity pair, or evidence
{
'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
'pair_locs': [ # mentions of the entity pair in the evidence
[[19, 27]], # mentions of entity 1
[[0, 5], [288, 293]] # mentions of entity 2
],
'all_links': {
'Selva': [[0, 5], [288, 293]],
'Comarques_of_Catalonia': [[19, 27]],
'Catalonia': [[40, 49]]
}
}
,...] # there are multiple evidence sentences
},
,...] # there are multiple entity pairs in the query
}
Hybrid
{'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
's1_all_links': {...}, # same as text-only
'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
'table_pairs': [
'tid': 'Major_League_Baseball-1',
'text':[
['World Series Records', 'World Series Records', ...],
['Team', 'Number of Series won', ...],
['St. Louis Cardinals (NL)', '11', ...],
...] # table content, list of rows
'index':[
[[0, 0], [0, 1], ...],
[[1, 0], [1, 1], ...],
...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
'value_ranks':[
[0, 0, ...],
[0, 0, ...],
[0, 10, ...],
...] # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
'value_inv_ranks': [], # inverse rank
'all_links':{
'St._Louis_Cardinals': {
'2': [
[[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
] # list of mentions in the second row, the key is row_id
},
'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
}
'name': '', # table name, if exists
'pairs': {
'pair': ['American_League', 'National_League'],
's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
'table_pair_locs': {
'17': [ # mention of entity pair in row 17
[
[[17, 0], [3, 18]],
[[17, 1], [3, 18]],
[[17, 2], [3, 18]],
[[17, 3], [3, 18]]
], # mention of the first entity
[
[[17, 0], [21, 36]],
[[17, 1], [21, 36]],
] # mention of the second entity
]
}
}
]
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# FiN-2 Large-Scale Real-World PLC-Dataset
## About
#### FiN-2 dataset in a nutshell:
FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication infrastructure. Since the electricity grid is inherently a graph, our dataset could be interpreted as a graph dataset. Therefore, we use the word node to describe points (cable distribution cabinets) of measurement within the low-voltage electricity grid and the word edge to describe connections (cables) in between them. However, since these are PLC connections, an edge does not necessarily have to correspond to a real cable; more on this in our paper.
FiN-2 shows measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.
We propose this dataset to foster research in the domain of grid automation and smart grids. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection. For more detailed information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).
* * *
## Content
The FiN-2 dataset splits into two compressed `csv` files: *nodes.csv* and *edges.csv*.
All files are provided as a compressed ZIP file and are divided into four parts. The first part can be found in this repo, while the remaining parts can be found in the following:
- https://zenodo.org/record/8328105
- https://zenodo.org/record/8328108
- https://zenodo.org/record/8328111
### Node data
| id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
|----|----|----|----|----|----|----|----|----|----|----|----|
|112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
|112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
|112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|
- id / ts: Unique identifier of the node that is measured and timestamp of the measurement
- v1/v2/v3: Voltage measurements of all three phases
- thd1/thd2/thd3: Total harmonic distortion of all three phases
- phase_angle1/2/3: Phase angle of all three phases
- temp: Temperature in-circuit of the sensor inside a cable distribution unit (in °C)
### Edge data
| src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
|----|----|----|----|----|----|----|----|
|62|94|1605528900|70|72|45|...|-53|
|62|32|1605529800|16|24|13|...|-51|
|17|94|1605530700|37|25|24|...|-55|
- src & dst & ts: Unique identifier of the source and target nodes where the spectrum is measured and time of measurement
- snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5dB).
### Metadata
Metadata that is provided along with the data covers:
- Number of cable joints
- Cable properties (length, type, number of sections)
- Relative position of the nodes (location, zero-centered gps)
- Adjacent PV or wallbox installations
- Year of installation w.r.t. the nodes and cables
Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.
* * *
## Usage
Simple data access using pandas:
```
import pandas as pd
nodes_file = "nodes.csv.gz" # /path/to/nodes.csv.gz
edges_file = "edges.csv.gz" # /path/to/edges.csv.gz
# read the first 10 rows
data = pd.read_csv(nodes_file, nrows=10, compression='gzip')
# skip the first 5 data rows, then read the next 10 (data rows 6 to 15)
data = pd.read_csv(nodes_file, nrows=10, skiprows=[i for i in range(1,6)], compression='gzip')
# ... same for the edges
```
The compressed csv data format was used to make sharing as easy as possible; however, it comes with significant drawbacks for machine learning. Due to the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. But due to timeouts, noise, and other disturbances, nodes sometimes fail to collect data, so the number of measurements for a specific timestamp differs. This, plus the high sparsity of the graph, makes the csv format highly inefficient for ML training.
To utilize the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).
### Example use case (voltage forecasting)
Forecasting the voltage is one potential use case. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed, and used for ML training. MinMax scaling is used as a simple preprocessing step, and a PyTorch dataset class handles the data. Furthermore, a vanilla autoencoder is used to process and forecast the voltage into the future.
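A minimal sketch of such a dataset class (a simplification in the spirit of the notebook, not its actual code):

```
import torch
from torch.utils.data import Dataset

class VoltageWindows(Dataset):
    """Sliding windows over one node's voltage series, MinMax-scaled to [0, 1]."""
    def __init__(self, series, window=60, horizon=10):
        series = torch.as_tensor(series, dtype=torch.float32)
        self.series = (series - series.min()) / (series.max() - series.min())
        self.window, self.horizon = window, horizon

    def __len__(self):
        return len(self.series) - self.window - self.horizon + 1

    def __getitem__(self, i):
        x = self.series[i : i + self.window]                              # input window
        y = self.series[i + self.window : i + self.window + self.horizon] # forecast target
        return x, y
```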
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".
Datasets are PyTorch files containing a dictionary with training, validation, and test sets. The train, validation, and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
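A hedged loading sketch (the file name and dictionary keys are assumptions; the custom classes from the linked repository must be importable for unpickling):

```
import torch

data = torch.load('dataset.pt')  # hypothetical file name
# Dictionary keys below are assumed, not confirmed by this page.
trainset, valset, testset = data['trainset'], data['valset'], data['testset']
```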
Datasets 41, 42, 43 and 44 are our dataset format wrapped around the zoos from Unterthiner et al., 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy).
Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible with existing large-scale weakly labelled datasets. The release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.
Citation
If you use the GISE-51 dataset and/or the released code, please cite our paper:
Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021
Since GISE-51 is based on FSD50K, if you use GISE-51 kindly also cite the FSD50K paper:
Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.
About GISE-51 and GISE-51-Mixtures
The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.
GISE-51
- See meta/lbl_map.csv for the complete vocabulary.
- silence_thresholds.txt lists class bins and their corresponding volume thresholds. Files that were determined by sox to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
GISE-51-Mixtures
LICENSE
All audio clips (i.e., found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed as Creative Commons Attribution 4.0 International (CC BY 4.0) License.
GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.
Baselines
Several sound event recognition experiments were conducted, establishing baseline performance on several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.
Files
GISE-51 is available as a collection of several tar archives. All audio files are PCM 16-bit, 22050 Hz. The following lists the contents of these files in detail:
- isolated_events.tar.gz: The core GISE-51 isolated events dataset containing train, val and eval subfolders.
- meta.tar.gz: contains lbl_map.json
- noises.tar.gz: contains background noises used for GISE-51-Mixtures soundscape generation
- mixtures_jams.tar.gz: contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz, can be reused to generate the exact GISE-51-Mixtures soundscapes. (Optional; we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
- train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
- val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
- eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
- train_*.tar.gz: tar archives containing training mixtures with various numbers of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance vs. the number of training soundscapes. A helper script, prepare_mixtures_lmdb.sh, is provided in the code release to prepare data for the experiments in Section 4.1.
- pretrained-models.tar.gz: contains model checkpoints for all experiments conducted in the paper, including state_dicts for use with transfer learning experiments. More information on these checkpoints can be found in the code release README.
- license.tar.gz: contains dataset license info.
- silence_thresholds.txt: contains volume thresholds for various sound event bins used for silence filtering.

Contact
In case of queries and clarifications, feel free to contact Sarthak at s.yadav.2@research.gla.ac.uk. (Adding [GISE-51] to the subject of the email would be appreciated!)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# RP-comment-convention-adherence-Java-Python
Replication Package for the paper "Do Comments follow Commenting Conventions? A case study in Java and Python".
It uses the dataset provided by Rani et al.'s work [How to identify class comment types? A multi-language approach for class comment classification](https://github.com/poojaruhal/RP-class-comment-classification).
## Structure
```
RQ1/
RQ1_Java_Rules.xlsx
RQ1_Python_Rules.xlsx
RQ2/
RQ2_Java_Comments_Validated.xlsx
RQ2_Python_Comments_Validated.xlsx
Raw-projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
Style-guides
```
## Contents of the Replication Package
---
- **RQ1/** - contains the data used to answer RQ1
- `RQ1_Java_Rules.xlsx` - contains comment-related rules extracted from various Java style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines.
Oracle and Google are the standard guidelines, and the remaining ones are project-specific.
- `RQ1_Python_Rules.xlsx` - contains comment-related rules extracted from various Python style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines. PEP, Numpy, and Google are the standard guidelines, and the remaining ones are project-specific.
- **RQ2/** - contains the data used to answer RQ2
- `RQ2_Java_Comments_Validated.xlsx` - contains the Java comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent the various Java projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
- `RQ2_Python_Comments_Validated.xlsx` - contains the Python comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent the various Python projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
- **Raw-projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
- `eclipse.zip` - Eclipse project downloaded from GitHub. More detail about the project is on https://github.com/eclipse
- `guava.zip` - Guava project downloaded from GitHub. More detail about the project is on https://github.com/google/guava
- `guice.zip` - Guice project downloaded from GitHub. More detail about the project is on https://github.com/google/guice
- `hadoop.zip` - Apache Hadoop project downloaded from GitHub. More detail about the project is on https://github.com/apache/hadoop
- `spark.zip` - Apache Spark project downloaded from GitHub. More detail about the project is on https://github.com/apache/spark
- `vaadin.zip` - Vaadin project downloaded from GitHub. More detail about the project is on https://github.com/vaadin/framework
- **Python_projects/**
- `django.zip` - Django project downloaded from GitHub. More detail about the project is on https://github.com/django
- `ipython.zip` - IPython project downloaded from GitHub. More detail about the project is on https://github.com/ipython/ipython
- `Mailpile.zip` - Mailpile project downloaded from GitHub. More detail about the project is on https://github.com/mailpile/Mailpile
- `pandas.zip` - pandas project downloaded from GitHub. More detail about the project is on https://github.com/pandas-dev/pandas
- `pipenv.zip` - Pipenv project downloaded from GitHub. More detail about the project is on https://github.com/pypa/pipenv
- `pytorch.zip` - PyTorch project downloaded from GitHub. More detail about the project is on https://github.com/pytorch/pytorch
- `requests.zip` - Requests project downloaded from GitHub. More detail about the project is on https://github.com/psf/requests/
- **Style-guides/**- contains the style guidelines used for the selected projects.
---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Changen2-S1-15k
Changen2-S1-15k (a building change dataset with 15k pairs and 2 change types), 0.3-1m spatial resolution, RGB bands
Dataset Sources
Repository: https://github.com/Z-Zheng/pytorch-change-models Paper: https://ieeexplore.ieee.org/document/10713915
Citation
BibTeX: @article{zheng_changen2, author={Zheng, Zhuo and Ermon, Stefano and Kim, Dongjun and Zhang, Liangpei and Zhong, Yanfei}, journal={IEEE Transactions on Pattern… See the full description on the dataset page: https://huggingface.co/datasets/EVER-Z/Changen2-S1-15k.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
On the Generalization of WiFi-based Person-centric Sensing in Through-Wall Scenarios
This repository contains the 3DO dataset proposed in [1].
PyTorch Dataloader
A minimal PyTorch dataloader for the 3DO dataset is provided at: https://github.com/StrohmayerJ/3DO
Dataset Description
The 3DO dataset comprises 42 five-minute recordings (~1.25M WiFi packets) of three human activities performed by a single person, captured in a WiFi through-wall sensing scenario over three consecutive days. Each WiFi packet is annotated with a 3D trajectory label and a class label for the activities: no person/background (0), walking (1), sitting (2), and lying (3). (Note: The labels returned in our dataloader example are walking (0), sitting (1), and lying (2), because background sequences are not used.)
The directories 3DO/d1/, 3DO/d2/, and 3DO/d3/ contain the sequences from days 1, 2, and 3, respectively. Furthermore, each sequence directory (e.g., 3DO/d1/w1/) contains a csiposreg.csv file storing the raw WiFi packet time series and a csiposreg_complex.npy cache file, which stores the complex Channel State Information (CSI) of the WiFi packet time series. (If missing, csiposreg_complex.npy is automatically generated by the provided dataloader.)
Dataset Structure:
/3DO
├── d1 <-- day 1 subdirectory
└── w1 <-- sequence subdirectory
└── csiposreg.csv <-- raw WiFi packet time series
└── csiposreg_complex.npy <-- CSI time series cache
├── d2 <-- day 2 subdirectory
├── d3 <-- day 3 subdirectory
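A minimal sketch for peeking at one sequence (the official dataloader linked above handles the CSI cache and class labels properly):

```
import numpy as np
import pandas as pd

seq = '3DO/d1/w1'
packets = pd.read_csv(f'{seq}/csiposreg.csv')    # raw WiFi packet time series
csi = np.load(f'{seq}/csiposreg_complex.npy')    # complex CSI cache (generated by the dataloader if missing)
print(packets.shape, csi.shape, csi.dtype)       # CSI entries are complex-valued
```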
In [1], we use the following training, validation, and test split:
| Subset | Day | Sequences |
|--------|-----|-----------|
| Train | 1 | w1, w2, w3, s1, s2, s3, l1, l2, l3 |
| Val | 1 | w4, s4, l4 |
| Test | 1 | w5, s5, l5 |
| Test | 2 | w1, w2, w3, w4, w5, s1, s2, s3, s4, s5, l1, l2, l3, l4, l5 |
| Test | 3 | w1, w2, w4, w5, s1, s2, s3, s4, s5, l1, l2, l4 |

w = walking, s = sitting, and l = lying
Note: On each day, we additionally recorded three ten-minute background sequences (b1, b2, b3), which are provided as well.
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, J., Kampel, M. (2025). On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios. In: Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_13
BibTeX citation:
@inproceedings{strohmayerOn2025, author="Strohmayer, Julian and Kampel, Martin", title="On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios", booktitle="Pattern Recognition", year="2025", publisher="Springer Nature Switzerland", address="Cham", pages="194--211", isbn="978-3-031-78354-8" }