Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, which can be opened in Python with the `h5py` library.
The dataset includes the following files:
Each file is structured to support the training and evaluation of seismic denoising models.
The HDF5 files named noise contain two main datasets:
Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:
To load these files in a Python environment, use the following approach:
```python
import h5py
import numpy as np

# Open the HDF5 file in read mode
with h5py.File('train_noise.hdf5', 'r') as file:
    # Print all the main keys in the file
    print("Keys in the HDF5 file:", list(file.keys()))

    if 'traces' in file:
        # Access the dataset
        data = file['traces'][:10]  # Load the first 10 traces

    if 'metadata' in file:
        # Access the dataset
        trace_name = file['metadata'][:10]  # Load the first 10 metadata entries
```
Ensure that the path to the file is correctly specified relative to your Python script.
To use this dataset, ensure you have Python installed along with the NumPy and h5py libraries, which can be installed via pip if not already available:
```bash
pip install numpy
pip install h5py
```
https://spdx.org/licenses/CC0-1.0.html
Offline reinforcement learning (RL) is a promising direction that allows RL agents to be pre-trained from large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such datasets, since 1) it permits creating many tasks from few components, and 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components. This submission provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite (Mendez et al., 2022). In every task in CompoSuite, a robot arm is used to manipulate an object to achieve an objective, all while trying to avoid an obstacle. There are four components for each of these four axes, which can be combined arbitrarily, leading to a total of 256 tasks. The component choices are:

* Robot: IIWA, Jaco, Kinova3, Panda
* Object: Hollow box, box, dumbbell, plate
* Objective: Push, pick and place, put in shelf, put in trashcan
* Obstacle: None, wall between robot and object, wall between goal and object, door between goal and object

The four included datasets are collected using separate agents, each trained to a different degree of performance, and each dataset consists of 256 million transitions. The degrees of performance are expert data, medium data, warmstart data and replay data:

* Expert dataset: Transitions from an expert agent that was trained to achieve 90% success on every task.
* Medium dataset: Transitions from a medium agent that was trained to achieve 30% success on every task.
* Warmstart dataset: Transitions from a soft actor-critic agent trained for a fixed duration of one million steps.
* Medium-replay-subsampled dataset: Transitions that were stored during the training of a medium agent up to 30% success.

These datasets are intended for the combined study of compositional generalization and offline reinforcement learning.

Methods

The datasets were collected using several deep reinforcement learning agents trained to the various degrees of performance described above on the CompoSuite benchmark (https://github.com/Lifelong-ML/CompoSuite), which builds on top of robosuite (https://github.com/ARISE-Initiative/robosuite) and uses the MuJoCo simulator (https://github.com/deepmind/mujoco). During reinforcement learning training, we stored the data collected by each agent in a separate buffer for post-processing. Then, after training, to collect the expert and medium datasets, we ran the trained agents for 2000 trajectories of length 500 online in the CompoSuite benchmark and stored the trajectories. These add up to a total of 1 million state-transition tuples per task, totalling 256 million datapoints per dataset. The warmstart and medium-replay-subsampled datasets contain trajectories from the stored training buffers of the SAC agent trained for a fixed duration and of the medium agent, respectively. For the medium-replay-subsampled data, we uniformly sample trajectories from the training buffer until we reach more than 1 million transitions. Since some of the tasks have termination conditions, some of these trajectories are truncated and not of length 500. This sometimes results in a number of sampled transitions larger than 1 million. Therefore, after sub-sampling, we artificially truncate the last trajectory and place a timeout at the final position.
This can in some rare cases lead to one incorrect trajectory if the datasets are used for finite-horizon experimentation. However, this truncation is required to ensure consistent dataset sizes, easy data readability, and compatibility with other standard code implementations. The four datasets are split into four tar.gz folders each, yielding a total of 16 compressed folders. Every sub-folder contains all the tasks for one of the four robot arms for that dataset. In other words, every tar.gz folder contains a total of 64 tasks using the same robot arm, and four tar.gz files form a full dataset. This is done to enable people to download only part of the dataset in case they do not need all 256 tasks. For every task, the data is stored in a separate hdf5 file, allowing for the usage of arbitrary task combinations and mixing of data qualities across the four datasets. Every task is contained in a folder that is named after the CompoSuite elements it uses. In other words, every task is represented as a folder named
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
4D-STEM data frequently requires a number of calibrations in order to make accurate measurements: for instance, it can be essential to measure and correct for diffraction shifts, account for ellipticity in the diffraction patterns, or determine the rotational offset between the real and diffraction planes.
We've prepared a simulated 4D-STEM dataset which includes diffraction shifting, elliptical distortion, and an r-space/k-space rotational offset. Two HDF5 files contain the simulated data for two different electron probes: a standard probe, using a circular probe-forming aperture, and a 'bullseye' probe, using a patterned aperture. Each HDF5 file contains the following data objects (a quick inspection sketch follows the list):
(a) the 'experimental' 4D-STEM scan of a strained single-crystal gold nanoparticle (size: (100,84,250,250) )
(b) a 4D-STEM scan of a calibration sample of polycrystalline gold (size: (100,84,250,250) )
(c) a stack of diffraction images of the electron probe over vacuum (size: (250,250,20) )
(d) a single image of the electron probe over the sample and far from focus, such that the CBED forms a shadow image (size: (512,512) )
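For a first look at the contents, the files can be inspected with h5py. A minimal sketch, assuming only that the files follow the structure described above; the file name and internal layout here are hypothetical:

```python
import h5py

# Hypothetical file name; substitute the actual HDF5 file for either probe.
with h5py.File('4dstem_bullseye.h5', 'r') as f:
    # Walk the file and print the name and shape of every group/dataset
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))
```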
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All raw data and metadata of ptychography scans are assembled into HDF5 files. These include acquired frames of X-ray pixel array detectors, parameters, component positions, and settings of the instruments. The data and metadata curation follows the NeXus-NXsas convention as closely as practical.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a curated HDF5 file for a subset of the SPICE 1 OpenFF dataset (Open Force Field Initiative default level of theory) designed to be compatible with modelforge, an infrastructure to implement and train neural network potentials (NNPs). This subset is limited to molecules containing any of the following 7 elements: H, C, N, O, F, Cl, and S. This datafile includes 1000 total conformers for 100 unique molecules.
Changes: In this version, for each record `total_charge` is stored as an array of shape (N_conformers, 1), i.e., one value per conformer; previously this was a single value per record, since the charge state does not change between conformers.
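As an illustration of the change described above, a minimal sketch for checking the per-conformer charge array of one record; the file name is hypothetical, and the assumption that records are top-level HDF5 groups is ours:

```python
import h5py

with h5py.File('spice_openff_subset.hdf5', 'r') as f:
    record = list(f.keys())[0]                    # first of the 100 molecules
    total_charge = f[record]['total_charge'][()]  # expected shape: (N_conformers, 1)
    print(record, total_charge.shape)
```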
When applicable, the units of properties are provided in the datafile, encoded as strings compatible with the openff-units package. For more information about the structure of the data file, please see the following:
This curated dataset was generated using the modelforge software at commit
Small-molecule/Protein Interaction Chemical Energies (SPICE).
The SPICE dataset contains 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the B3LYP-D3BJ/DZVP level of theory using Psi4 1.4.1.
This is the default theory used for force field development by the Open Force Field Initiative.
This includes the following collections from the MolSSI qcarchive (these are also included in the standard SPICE 1 dataset):
This does not include the following collections (which are part of the standard SPICE 1 dataset):
Original SPICE 1 publication:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents, and medical students. It is used as the test set in the paper: "Automatic diagnosis of the 12-lead ECG using a deep neural network". https://www.nature.com/articles/s41467-020-15432-4.
It contains annotations for 6 different ECG abnormalities: 1st degree AV block (1dAVb); right bundle branch block (RBBB); left bundle branch block (LBBB); sinus bradycardia (SB); atrial fibrillation (AF); and sinus tachycardia (ST).
Companion python scripts are available in: https://github.com/antonior92/automatic-ecg-diagnosis
Citation
Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
Bibtex:
```
@article{ribeiro_automatic_2020,
  title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
  author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
  year = {2020},
  volume = {11},
  pages = {1760},
  doi = {https://doi.org/10.1038/s41467-020-15432-4},
  journal = {Nature Communications},
  number = {1}
}
```
ecg_tracings.hdf5: The HDF5 file containing a single dataset named tracings. This dataset is a (827, 4096, 12) tensor. The first dimension corresponds to the 827 different exams from different patients; the second dimension corresponds to the 4096 signal samples; the third dimension corresponds to the 12 different leads of the ECG exams, in the following order: {DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}. The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all the same size (4096 samples), we pad them with zeros on both sides. For instance, a 7-second ECG signal with 2800 samples receives 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset. All signals are represented as floating-point numbers at the scale 1e-4V: so they should be multiplied by 1000 in order to obtain the signals in V.
In Python, one can read this file using the following sequence:
```python
import h5py
import numpy as np

# Open the HDF5 file and load the full (827, 4096, 12) tensor of tracings
with h5py.File("ecg_tracings.hdf5", "r") as f:
    x = np.array(f['tracings'])
```
attributes.csv: contains basic patient attributes: sex (M or F) and age. It contains 827 lines (plus the header). The i-th line corresponds to the i-th tracing in ecg_tracings.hdf5.
annotations/: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header), and the i-th line corresponds to the i-th tracing in ecg_tracings.hdf5 in all csv files. The csv files all have 6 columns, 1dAVb, RBBB, LBBB, SB, AF, ST, corresponding to whether the annotator detected the abnormality in the ECG (=1) or not (=0). A comparison sketch using these files is given after the list.
cardiologist[1,2].csv: contain annotations from two different cardiologists.
gold_standard.csv: gold standard annotation for this test dataset. When cardiologist 1 and cardiologist 2 agreed, the common diagnosis was considered the gold standard. In cases of disagreement, a third senior specialist, aware of the annotations from the other two, decided the diagnosis.
dnn.csv: predictions from the deep neural network described in the paper. The threshold is set in such a way that it maximizes the F1 score.
cardiology_residents.csv: annotations from two 4th year cardiology residents (each annotated half of the dataset).
emergency_residents.csv: annotations from two 3rd year emergency residents (each annotated half of the dataset).
medical_students.csv: annotations from two 5th year medical students (each annotated half of the dataset).
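As a usage sketch, the annotation files can be compared directly with pandas; the column names follow the description above, and paths are assumed relative to the dataset root:

```python
import pandas as pd

# Fraction of exams where the DNN prediction matches the gold standard, per abnormality
gold = pd.read_csv('annotations/gold_standard.csv')
dnn = pd.read_csv('annotations/dnn.csv')
labels = ['1dAVb', 'RBBB', 'LBBB', 'SB', 'AF', 'ST']
print((gold[labels] == dnn[labels]).mean())
```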
The Custom Silicone Mask Attack Dataset (CSMAD) contains presentation attacks made using six custom-made silicone masks. Each mask cost about USD 4000. The dataset is designed for face presentation attack detection experiments.
The Custom Silicone Mask Attack Dataset (CSMAD) has been collected at the Idiap Research Institute. It is intended for face presentation attack detection experiments, where the presentation attacks have been mounted using a custom-made silicone mask of the person (or identity) being attacked.
The dataset contains videos of face-presentations, as well as a set of files specifying the experimental protocol corresponding to the experiments presented in the publication below.
Reference
If you publish results using this dataset, please cite the following publication.
Sushil Bhattacharjee, Amir Mohammadi and Sebastien Marcel: "Spoofing Deep Face Recognition With Custom Silicone Masks." in Proceedings of International Conference on Biometrics: Theory, Applications, and Systems (BTAS), 2018.
10.1109/BTAS.2018.8698550
http://publications.idiap.ch/index.php/publications/show/3887
Data Collection
Face-biometric data has been collected from 14 subjects to create this dataset. Subjects participating in this data-collection have played three roles: targets, attackers, and bona-fide clients. The subjects represented in the dataset are referred to here with letter-codes: A .. N. The subjects A..F have also been targets. That is, face-data for these six subjects has been used to construct their corresponding flexible masks (made of silicone). These masks have been made by Nimba Creations Ltd., a special effects company.
Bona fide presentations have been recorded for all subjects A..N. Attack presentations (presentations where the subject wears one of 6 masks) have been recorded for all six targets, made by different subjects. That is, each target has been attacked several times, each time by a different attacker wearing the mask in question. This is one way of increasing the variability in the dataset. Another way we have augmented the variability of the dataset is by capturing presentations under different illumination conditions. Presentations have been captured in four different lighting conditions:
All presentations have been captured with a green uniform background. See the paper mentioned above for more details of the data-collection process.
Dataset Structure
The dataset is organized in three subdirectories: ‘attack’, ‘bonafide’, ‘protocols’. The two directories: ‘attack’ and ‘bonafide’ contain presentation-videos and still images for attacks and bona fide presentations, respectively. The folder ‘protocols’ contains text files specifying the experimental protocol for vulnerability analysis of face-recognition (FR) systems.
The number of data-files per category is as follows:
The folder ‘attack/WEAR’ contains videos where the attack has been made by a person (attacker) wearing the mask of the target being attacked. The ‘attack/STAND’ folder contains videos where the attack has been made using the target’s mask mounted on an appropriate stand.
Video File Format
The video files for the face-presentations are in HDF5 format (with file-extension ‘.h5’). The folder structure of the HDF5 file is shown in Figure 1. Each file contains data collected using two cameras:
As shown in Figure 1, frames from the different channels (color, infrared, depth, thermal) from the two cameras are stored in separate directory-hierarchies in the HDF5 file. Each file represents a video of approximately 10 seconds, or roughly 300 frames.
In the HDF5 file, the directory for SR300 also contains a subdirectory named ‘aligned_color_to_depth’. This folder contains post-processed data, where the frames of the depth channel have been aligned with those of the color channel based on the time-stamps of the frames.
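A minimal sketch for inspecting one presentation file, assuming only the structure described above; the file name is hypothetical, and the exact group paths vary per camera:

```python
import h5py

def summarize(name, obj):
    # Report each dataset (e.g. a frame or frame stack) and its shape
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)

with h5py.File('presentation_video.h5', 'r') as f:
    f.visititems(summarize)
```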
Experimental Protocol
The ‘protocols’ folder contains text files that specify the protocols for vulnerability analysis experiments reported in the paper mentioned above. Please see the README file in the protocols folder for details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 1.0, April 2018.
Vincent Lostanlen (1, 2, 3), Justin Salamon (2, 3), Andrew Farnsworth (1), Steve Kelling (1), and Juan Pablo Bello (2, 3).
(1): Cornell Lab of Ornithology (CLO) (2): Center for Urban Science and Progress, New York University (3): Music and Audio Research Lab, New York University
The BirdVox-70k dataset contains 70k half-second clips from the 6 audio recordings in the BirdVox-full-night dataset, each about ten hours in duration. These recordings come from ROBIN autonomous recording units placed near Ithaca, NY, USA during fall 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10.
Andrew Farnsworth used the Raven software to pinpoint every avian flight call in time and frequency. He found 35402 flight calls in total. He estimates that about 25 different species of passerines (thrushes, warblers, and sparrows) are present in these recordings. Species are not labeled in BirdVox-70k, but it is possible to tell apart thrushes from warblers and sparrows by looking at the center frequencies of their calls. The annotation process took 102 hours.
The dataset can be used, among other things, for the research, development, and testing of bioacoustic classification models, including the reproduction of the results reported in [1].
For details on the hardware of ROBIN recording units, we refer the reader to [2].
[1] V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection. Proc. IEEE ICASSP, 2018.
[2] J. Salamon, J. P. Bello, A. Farnsworth, M. Robbins, S. Keen, H. Klinck, and S. Kelling. Towards the Automatic Classification of Avian Flight Calls for Bioacoustic Monitoring. PLoS One, 2016.
@inproceedings{lostanlen2018icassp, title = {BirdVox-full-night: a dataset and benchmark for avian flight call detection}, author = {Lostanlen, Vincent and Salamon, Justin and Farnsworth, Andrew and Kelling, Steve and Bello, Juan Pablo}, booktitle = {Proc. IEEE ICASSP}, year = {2018}, published = {IEEE}, venue = {Calgary, Canada}, month = {April}, }
BirdVox-70k contains the recordings as HDF5 files, sampled at 24 kHz, with a single channel (mono). Each HDF5 file corresponds to a different sensor. The name of the HDF5 dataset in each file is "waveforms".
Contrary to BirdVox-full-night, BirdVox-70k is not shipped with a metadata file. Rather, the metadata is included in the keys of the elements in the HDF5 files themselves, whose values are the waveforms.
An example of BirdVox-70k key is:
unitID_TIMESTAMP_FREQ_LABEL
where
ID is the identifier of the unit (01, 02, 03, 05, 07, or 10)
TIMESTAMP is the timestamp of the center of the clip in the BirdVox-full-night recording. This timestamp is measured in samples at 24 kHz. It is accurate to about 10 ms.
FREQ is the center frequency of the flight call, measured in Hertz. It is accurate to about 1 kHz. When the clip is negative, i.e. does not contain any flight call, it is set equal to zero by convention.
LABEL is the label of the clip, positive (1) or negative (0).
Example:
unit01_085256784_03636_1
is a positive clip in unit 01, with timestamp 085256784 (3552.37 seconds after dividing by the sample rate 24000), center frequency 3636 Hz.
Another example:
unit05_284775340_00000_0
is a negative clip in unit 05, with timestamp 284775340 (11865.64 seconds).
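A minimal sketch for reading clips and parsing their keys, under the assumptions that each sensor's HDF5 file is named after its unit and that "waveforms" is a group keyed by clip name (the file name is hypothetical):

```python
import h5py

with h5py.File('BirdVox-70k_unit01.hdf5', 'r') as f:
    for key in list(f['waveforms'])[:5]:
        clip = f['waveforms'][key][()]          # half-second mono waveform at 24 kHz
        unit, timestamp, freq, label = key.split('_')
        print(unit, int(timestamp) / 24000.0, int(freq), int(label), clip.shape)
```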
The approximate GPS coordinates of the sensors (latitudes and longitudes rounded to 2 decimal places) and UTC timestamps corresponding to the start of the recording for each sensor are included as CSV files in the main directory.
When BirdVox-70k is used for academic research, we would highly appreciate it if scientific publications of works partly based on this dataset cite the following publication:
V. Lostanlen, J. Salamon, A. Farnsworth, S. Kelling, J. Bello. BirdVox-full-night: a dataset and benchmark for avian flight call detection, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
The creation of this dataset was supported by NSF grants 1125098 (BIRDCAST) and 1633259 (BIRDVOX), a Google Faculty Award, the Leon Levy Foundation, and two anonymous donors.
Dataset created by Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello.
The BirdVox-70k dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/
The dataset and its contents are made available on an "as is" basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, Cornell Lab of Ornithology is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the BirdVox-70k dataset or any part of it.
Please help us improve BirdVox-70k by sending your feedback to: vincent.lostanlen@gmail.com and af27@cornell.edu
In case of a problem, please include as many details as possible.
Jessie Barry, Ian Davies, Tom Fredericks, Jeff Gerbracht, Sara Keen, Holger Klinck, Anne Klingensmith, Ray Mack, Peter Marchetto, Ed Moore, Matt Robbins, Ken Rosenberg, and Chris Tessaglia-Hymes.
We acknowledge that the land on which the data was collected is the unceded territory of the Cayuga nation, which is part of the Haudenosaunee (Iroquois) confederacy.
This data set consists of 59 wideband magnetotelluric (MT) stations collected by the U.S. Geological Survey in July and August of 2020 as part of a 1-year project funded by the Energy Resources Program of the U.S. Geological Survey to demonstrate full crustal control on geothermal systems in the Great Basin. Each station had 5 components: 3 orthogonal magnetic induction coils and 2 horizontal orthogonal electric dipoles. Data were collected for an average of 18 hours on a repeating schedule of alternating sampling rates of 256 samples/second for 7 hours and 50 minutes and 4096 samples/second for 10 minutes. The schedules were set such that each station was recording the same schedule to allow for remote reference processing. Data were processed with a bounded-influence robust remote reference processing scheme (BIRRP v5.2.1, Chave and Thomson, 2004). Data quality is good for periods of 0.007-2048 s, with some noise in the higher periods and less robust estimates at the longer periods. Files included in this publication include measured electric- and magnetic-field time series (.h5 files) as well as estimated impedance and vertical-magnetic-field transfer functions (.edi files). An image of the MT response is supplied (.png file), where the impedance tensor is plotted on the top two panels, the induction vectors in the middle panel (up is geographic North), and the phase tensor in the bottom panel (up is geographic North). The real induction vectors point towards strong conductors. Phase tensor ellipses align in the direction of electrical current flow, and warmer colors represent the subsurface becoming more conductive and cooler colors more resistive.
This dataset comprises Distributed Acoustic Sensing (DAS) data collected from the Utah FORGE monitoring well 16B(78)-32 (the producer well) during hydraulic fracture stimulation operations conducted in April 2024. The data were acquired continuously over the stimulation period at a temporal sampling rate of 10,000 Hz (10 kS/s) and a spatial resolution of approximately 3.35 feet (1.02109 meters). The measurements were captured using a Neubrex NBX-S4100 Time Gated Digital DAS interrogator unit connected to a single-mode fiber optic cable, which was permanently installed within the casing string. All recorded channels correspond to downhole segments of the fiber optic cable, from a measured depth (MD) of 5,369.35 feet to 10,352.11 feet. The DAS data reflect raw acoustic energy generated by physical processes within and surrounding the well during stimulation activities at wells 16A(78)-32 and 16B(78)-32. These data have potential applications in analyzing cross-well strain, far-field strain rates (including microseismic activity), induced seismicity, and seismic imaging. Metadata embedded in the attributes of the HDF5 files include detailed information on the measured depths of the channels, interrogation parameters, and other acquisition details. The dataset also includes a recording of a seminar held on September 19, 2024, where Neubrex's Chief Operating Officer presented insights into the data collection, analysis, and preliminary findings. The raw data files, stored in HDF5 format, are organized chronologically according to the recording intervals from April 9 to April 24, 2024, with each file corresponding to a 12-second recording interval.
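As an illustration, the acquisition metadata embedded in the attributes of one raw file can be listed with h5py; the file name here is hypothetical, and the attribute and dataset names depend on the interrogator's output convention:

```python
import h5py

with h5py.File('FORGE_16B_DAS_example.h5', 'r') as f:
    # Print the file-level attributes (measured depths, interrogation parameters, etc.)
    for key, value in f.attrs.items():
        print(key, value)
    # List the top-level objects holding the raw acoustic records
    print(list(f.keys()))
```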
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the MESWA (Middle East and Southwest Asia) seismic model and auxiliary data used in the creation of the model (Rodgers, 2023). MESWA is a three-dimensional model of the seismic properties of the crust and upper mantle of the Middle East and Southwest Asia. The MESWA model is provided in NetCDF format (readable by, for example, xarray; Hoyer & Hamman, 2017) and HDF5 format for viewing with ParaView (Ahrens et al., 2005) and interaction with Salvus (Afanasiev et al., 2019).
Also included are the earthquake source parameters for all 327 Global Centroid Moment Tensor events considered in this study in ASCII text format. Also included are lists of the selected 192 inversion events and 66 validation events in ASCII text format. Lastly, we include a list of all receivers used in the creation and validation of MESWA. This is a simple ASCII file with the event name and receiver name (composed of the network_code and station_code).
The following table provides a listing of the files in the dataset:
File | Description
---- | -----------
MESWA.nc | MESWA model in NetCDF format
MESWA.h5 | MESWA model in HDF5 format, used by Salvus
MESWA.xmdf | Auxiliary file for MESWA.h5, used to import the model into ParaView
events_project.csv | Table of event source parameters for all 327 events considered in the project
inversion_events_192.csv | Table of 192 inversion events (ASCII comma separated value)
validation_events_66.csv | Table of 66 validation events (ASCII comma separated value)
events_receivers_inversion.csv | Table of waveform (event-receiver-channel) data used in the inversion (ASCII comma separated value)
events_receivers_validation.csv | Table of waveform (event-receiver-channel) data used in the validation (ASCII comma separated value)
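As a usage sketch, the NetCDF model file named in the table above can be opened directly with xarray (assuming only the file listing above):

```python
import xarray as xr

# Open the MESWA model and list its coordinates and seismic-property variables
ds = xr.open_dataset('MESWA.nc')
print(ds)
```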
References
Afanasiev, M, C Boehm, M van Driel, L Krischer, M Rietmann, DA May, MG Knepley, and A Fichtner (2019). Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophys. J. Int., 216(3), 1675–1692, doi: 10.1093/gji/ggy469
Ahrens, J., Geveci, B., & Law, C. (2005). Paraview: An end-user tool for large data visualization. The Visualization Handbook, 717(8). https://doi.org/10.1016/b978-012387582-2/50038-1
Hoyer, S., & Hamman, J. (2017). Xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software, 5(1). https://doi.org/10.5334/jors.148
Rodgers, A. (2023). Adjoint Waveform Tomography for Crustal and Upper Mantle Structure of the Middle East and Southwest Asia for Improved Waveform Simulations Using Openly Available Broadband Data, technical report, LLNL-TR-851939.
Acknowledgements
This project was supported by Lawrence Livermore National Laboratory’s Laboratory Directed Research and Development project 20-ERD-008 and the National Nuclear Security Administration. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-MI-852402
SACLA XFEL experiment 2021/05, Proposal Number: 2021A8026, FeRh magnetism/lattice. All raw data and metadata are assembled into HDF5 files. These include acquired frames of X-ray pixel array detectors, parameters, component positions, and settings of the instruments. The data and metadata were obtained from the SACLA database (for the run numbers relevant to our experiment) with the SACLA data converter.
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4576
This entry contains the data used to implement the bachelor thesis. It investigates how embeddings can be used to analyze supersecondary structures. Abstract of the thesis: This thesis analyzes the behavior of supersecondary structures in the context of embeddings. For this purpose, data from the Protein Topology Graph Library was provided with embeddings. This resulted in a structured graph database, which will be used for future work and analyses. In addition, different projections were made into two-dimensional space to analyze how the embeddings behave there.

In the Jupyter Notebook 1_data_retrival.ipynb, the download process of the graph files from the Protein Topology Graph Library (https://ptgl.uni-frankfurt.de) can be found. The downloaded .gml files can also be found in graph_files.zip. These form graphs that represent the relationships of supersecondary structures in the proteins and are the data basis for further analyses. These graph files are then processed in the Jupyter Notebook 2_data_storage_and_embeddings.ipynb and entered into a graph database. The sequences of the supersecondary and secondary structures from the PTGL can be found in fastas.zip. The embeddings were calculated using the ESM model of the Facebook Research Group (huggingface.co/facebook/esm2_t12_35M_UR50D) and can be found in three .h5 files; they are added to the database subsequently. The whole process in this notebook serves to build up the database, which can then be searched using Cypher queries. In the Jupyter Notebook 3_data_science.ipynb, different visualizations and analyses are carried out with the help of UMAP.

For the installation of all dependencies, it is recommended to create a Conda environment and install all packages there. To use the project, PyEED should be installed using the snapshot of the original repository (source repository: https://github.com/PyEED/pyeed). The best way to install PyEED is to execute the pip install -e . command in the pyeed_BT folder. The dependencies can also be installed using poetry and the .toml file. In addition, seaborn, h5py and umap-learn are required. These can be installed using the following commands: pip install h5py==3.12.1, pip install seaborn==0.13.2, and pip install umap-learn==0.5.7.
ML2DGM is the EOS Aura Microwave Limb Sounder (MLS) product containing the minor frame diagnostic quantities on a miscellaneous grid. These include items such as tangent pressure, chi-square describing various fits to the measured radiances, number of radiances used in various retrieval phases, etc. This product contains a second auxiliary file which includes cloud-induced radiances inferred for selected spectral channels. The data version is 5.0. Data coverage is from August 8, 2004 to current. Spatial coverage is near-global (-82 degrees to +82 degrees latitude), with each profile spaced 1.5 degrees or ~165 km along the orbit track (roughly 15 orbits per day). Vertical resolution varies between species and typically ranges from 3 - 6 km. Users of the ML2DGM data product should read the EOS MLS Level 2 Version 5 Quality Document for more information. The data are stored in the version 5 Hierarchical Data Format, or HDF5. Each file contains sets of HDF5 dataset objects (n-dimensional arrays) for each diagnostics measurement. The dataset objects represent data and geolocation fields; included in the file are file attributes and metadata. There are two files per day (MLS-Aura_L2AUX-DGM and MLS-Aura_L2AUX-Cloud).
This dataset comprises the input files and other files required for Advanced Terrestrial Simulator (ATS) simulations at 7 catchments across the continental United States. ATS is an integrated surface-subsurface hydrology model. We include Jupyter notebooks (within the scripts folder) for individual catchments showing information (including data sources, river network, soil, geology, landuse types, etc.) on preparing the machine-readable input files. ATS observation output files are provided in the output folder. Figures and analyses (.xlsx sheets) are also provided. The catchments include: (a) Taylor River Upstream (Colorado); (b) Cossatot River (Arkansas); (c) Panther Creek (Alabama); (d) Little Tennessee River (North Carolina and Georgia); (e) Mayo River (Virginia); (f) Flat Brook (New Jersey); (g) Neversink River headwaters (New York). Readme files inside the directories provide more details. File types include: .xml, .h5, .xlsx, .png, .ipynb, .py, .nc, .txt. All of the file types can be accessed with open source software; details on software requirements are as follows: .xml (any text editor, including Notepad and TextEdit), .h5 (in Python using HDF libraries), .xlsx (WPS Office Spreadsheets, OpenOffice Calc, LibreOffice Calc, Microsoft Office, etc.), .png (any image viewer), .ipynb (Jupyter notebook), .py (any text editor, including Notepad and TextEdit), .nc (using Python or other open source software).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset description
This repository contains the PPMLES (Perturbed-Parameter ensemble of MUST Large-Eddy Simulations) dataset, which corresponds to the main outputs of 200 large-eddy simulations (LES) of microscale pollutant dispersion that replicate the MUST field experiment [Biltoft. 2001, Yee and Biltoft. 2004] for varying meteorological forcing parameters.
The goal of the PPMLES dataset is to provide a comprehensive dataset to better understand the complex interactions between the atmospheric boundary layer (ABL), the urban environment, and pollutant dispersion. It was originally used to assess the impact of the meteorological uncertainty on microscale pollutant prediction and to build a surrogate model that can replace the costly LES model [Lumet et al. 2024b]. The total computational cost of the PPMLES dataset is estimated to be about 6 million core hours.
For each sample of meteorological forcing parameters (inlet wind direction and friction velocity), the AVBP solver code [Schonfeld and Rudgyard. 1999, Gicquel et al. 2011] was used to perform LES at very high spatio-temporal resolution (1e-3 s time step, 30 cm discretization length) to provide a fine representation of the pollutant concentration and wind velocity statistics within the urban-like canopy.
File list
The data is stored in HDF5 files, which can be efficiently processed in Python using the h5py module.
input_parameters.h5: list of the 200 input parameter samples (alpha_inlet, ustar) obtained using the Halton sequence that defines the PPMLES ensemble.
ave_fields.h5: lists of the main field statistics predicted by each of the 200 LES samples over the 200-s reference window [Yee and Biltoft. 2004], including:
c: the time-averaged pollutant concentration in ppmv (dim = (n_samples, n_nodes) = (200, 1878585)),
(u, v, w): the time-averaged wind velocity components in m/s,
crms: the root mean square concentration fluctuations in ppmv,
tke: the turbulent kinetic energy in m^2/s^2,
(uprim_cprim, vprim_cprim, wprim_cprim): the pollutant turbulent transport components
uncertainty.h5: lists of the estimated aleatory uncertainty induced by the internal variability of the LES (variability_#) [Lumet et al. 2024a] for each of the fields in ave_fields.h5. Also includes the stationary bootstrap [Politis and Romano. 1994] parameters (n_replicates, block_length) used to estimate the uncertainty for each field and each sample.
mesh.h5: the tetrahedral mesh on which the fields are discretized, composed of about 1.8 million nodes.
time_series.h5: HDF5 file consisting of 200 groups (Sample_NNN) each containing the time series of the pollutant concentration (c) and wind velocity components (u, v, w) predicted by the LES sample #NNN at 93 locations.
probe_network.dat: provides the location of each of the 93 probes corresponding to the positions of the experimental campaign sensors [Biltoft. 2001].
Code examples
A) Dataset reading
```python
import h5py
import numpy as np

# Load the 200 input parameter samples (alpha_inlet, ustar)
inputf = h5py.File('PPMLES/input_parameters.h5', 'r')
input_parameters = np.array((inputf['alpha_inlet'], inputf['friction_velocity'])).T

# Load the domain mesh node coordinates
meshf = h5py.File('PPMLES/mesh.h5', 'r')
mesh_nodes = np.array((meshf['Nodes']['x'], meshf['Nodes']['y'], meshf['Nodes']['z'])).T

# Load one field statistic and its estimated uncertainty for all 200 samples
var = 'c'  # Can be: 'c', 'u', 'v', 'w', 'crms', 'tke', 'uprim_cprim', 'vprim_cprim', or 'wprim_cprim'
fieldsf = h5py.File('PPMLES/ave_fields.h5', 'r')
fields_list = fieldsf[var]
uncertaintyf = h5py.File('PPMLES/uncertainty_ave_fields.h5', 'r')
uncertainty_list = uncertaintyf[var]

# Load the time series recorded at one probe for every sample
timeseriesf = h5py.File('PPMLES/time_series.h5', 'r')
var = 'c'   # Can be: 'c', 'u', 'v', or 'w'
probe = 32  # Integer between 0 and 92, see probe_network.dat
time_list = []
time_series_list = []
for i in range(200):
    time_list.append(np.array(timeseriesf[f'Sample_{i+1:03}']['time']))
    time_series_list.append(np.array(timeseriesf[f'Sample_{i+1:03}'][var][probe]))
```
B) Interpolation of one-field from the unstructured grid to a new structured grid
```python
import h5py
import numpy as np
from scipy.interpolate import griddata

# Load the time-averaged concentration field of sample #27
fieldsf = h5py.File('PPMLES/ave_fields.h5', 'r')
c = fieldsf['c'][27]

# Load the unstructured mesh node coordinates
meshf = h5py.File('PPMLES/mesh.h5', 'r')
unstructured_nodes = np.array((meshf['Nodes']['x'], meshf['Nodes']['y'], meshf['Nodes']['z'])).T

# Define the target structured grid
x0, y0, z0 = -16.9, -115.7, 0.
lx, ly, lz = 205.5, 232.1, 20.
resolution = 0.75
x_grid, y_grid, z_grid = np.meshgrid(np.linspace(x0, x0 + lx, int(lx/resolution)),
                                     np.linspace(y0, y0 + ly, int(ly/resolution)),
                                     np.linspace(z0, z0 + lz, int(lz/resolution)),
                                     indexing='ij')

# Nearest-neighbour interpolation from the unstructured mesh to the structured grid
c_interpolated = griddata(unstructured_nodes, c,
                          (x_grid.flatten(), y_grid.flatten(), z_grid.flatten()),
                          method='nearest')
```
C) Expression of all time series over the same time window with the same time discretization
```python
import h5py
import numpy as np
from scipy.interpolate import griddata

# Common 200-s time window discretized with a 0.05-s step
common_time = np.arange(0., 200., 0.05)
u_series_list = np.zeros((200, np.shape(common_time)[0]))

timeseriesf = h5py.File('PPMLES/time_series.h5', 'r')

for i in range(200):
    # Offset the spinup time
    sample_time = (np.array(timeseriesf[f'Sample_{i+1:03}']['time'])
                   - np.array(timeseriesf[f'Sample_{i+1:03}']['Parameters']['t_spinup']))
    # Linearly interpolate the u velocity at probe #9 onto the common time vector
    u_series_list[i] = griddata(sample_time, timeseriesf[f'Sample_{i+1:03}']['u'][9],
                                common_time, method='linear')
```
D) Surrogate model construction example
The training and validation of a POD-GPR surrogate model [Marrel et al. 2015] learning from the PPMLES dataset is given in the following GitHub repository. This surrogate model was successfully used by Lumet et al. 2024b to emulate the LES mean concentration prediction for varying meteorological forcing parameters.
Acknowledgments
This work was granted access to the HPC resources from GENCI-TGCC/CINES (A0062A10822, project 2020-2022). The authors would like to thank Olivier Vermorel for the preliminary development of the LES model, and Simon Lacroix for his proofreading.
This data set contains radar echograms acquired by the University of Alaska Fairbanks High-Frequency Radar Sounder over select glaciers in Alaska. The data are provided in HDF5 formatted files, which include important metadata for interpreting the data. Browse images are also available.
This dataset consists of baseband in-phase/quadrature (I/Q) radio frequency recordings of Wi-Fi and Bluetooth radiated emissions in the 2.4 GHz and 5 GHz unlicensed bands, collected with low-cost software defined radios. A NIST technical note describing the data collection methods is pending publication. All I/Q captures are one second in duration, with a sampling rate of 30 megasamples per second (MS/s) and a center frequency of 2437 MHz for the 2.4 GHz band captures and 5825 MHz for the 5 GHz band captures. In total, the data consist of 900 one-second captures, organized into five Hierarchical Data Format 5 (HDF5) files, where each HDF5 file has a size of 20.1 GB and consists of 180 one-second captures. There is a metadata file associated with each data file in comma-separated values (CSV) format that contains relevant parameters such as center frequency, bandwidth, sampling rate, bit depth, receive gain, and antenna and hardware information. There are two additional CSV files containing estimated gain calibration and noise floor values.
This dataset contains turbine- and plant-level power outputs for 252,500 cases of diverse wind plant layouts operating under a wide range of yawing and atmospheric conditions. The power outputs were computed using the Gaussian wake model in NREL's FLOw Redirection and Induction in Steady State (FLORIS) model, version 2.3.0. The 252,500 cases include 500 unique wind plants generated randomly by a specialized Plant Layout Generator (PLayGen) that samples randomized realizations of wind plant layouts from one of four canonical configurations: (i) cluster, (ii) single string, (iii) multiple string, (iv) parallel string. Other wind plant layout parameters were also randomly sampled, including the number of turbines (25-200) and the mean turbine spacing (3D-10D, where D denotes the turbine rotor diameter). For each layout, 500 different sets of atmospheric conditions were randomly sampled. These include wind speed in 0-25 m/s, wind direction in 0 deg.-360 deg., and turbulence intensity chosen from low (6%), medium (8%), and high (10%). For each atmospheric inflow scenario, the individual turbine yaw angles were randomly sampled from a one-sided truncated Gaussian on the interval 0 deg.-30 deg. oriented relative to the wind inflow direction. This random data is supplemented with a collection of yaw-optimized samples where FLORIS was used to determine turbine yaw angles that maximize power production for the entire plant. To generate this data, a subset of cases was selected (50 atmospheric conditions from each of 50 layouts, for a total of 2,500 additional cases) for which FLORIS was re-run with wake steering control optimization. The IEA onshore reference turbine, which has a 130 m rotor diameter, a 110 m hub height, and a rated power capacity of 3.4 MW, was used as the turbine for all simulations. The simulations were performed using NREL's Eagle high performance computing system in February 2021 as part of the Spatial Analysis for Wind Technology Development project funded by the U.S. Department of Energy Wind Energy Technologies Office. The data was collected, reformatted, and preprocessed for this OEDI submission in May 2023 under the Foundational AI for Wind Energy project funded by the U.S. Department of Energy Wind Energy Technologies Office. This dataset is intended to serve as a benchmark against which new artificial intelligence (AI) or machine learning (ML) tools may be tested. Baseline AI/ML methods for analyzing this dataset have been implemented, and a link to the repository containing those models has been provided. The .h5 data file structure can be found in the GitHub repository under explore_wind_plant_data_h5.ipynb.