2 datasets found
  1. STEAD subsample 4 CDiffSD

    • zenodo.org
    bin
    Updated Apr 30, 2024
    Cite
    Daniele Trappolini (2024). STEAD subsample 4 CDiffSD [Dataset]. http://doi.org/10.5281/zenodo.11094536
    Available download formats: bin
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniele Trappolini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 15, 2024
    Description

    STEAD Subsample Dataset for CDiffSD Training

    Overview

    This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, which can be opened in Python with the `h5py` library.

    Dataset Files

    The dataset includes the following files:

    • train: Used for the training and validation phases (with a train/validation split). Contains earthquake ground-truth traces.
    • noise_train: Used for the training and validation phases. Contains the noise used to contaminate the traces (see the sketch after this list).
    • test: Used for the testing phase; structured like train.
    • noise_test: Used for the testing phase; contains the noise data for testing.

    Each file is structured to support the training and evaluation of seismic denoising models.
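
    As a rough illustration of how the earthquake and noise files fit together, the sketch below mixes one clean trace with one noise trace. The file and dataset names follow the description in this entry, but the file names, the additive mixing, and the amplitude range are assumptions for illustration only, not the CDiffSD training procedure:

    ```python
    import h5py
    import numpy as np

    # Sketch only: contaminate one clean trace with one noise trace.
    # The file names, additive mixing, and amplitude range are illustrative
    # assumptions, not the CDiffSD training recipe.
    with h5py.File('train.hdf5', 'r') as eq, h5py.File('noise_train.hdf5', 'r') as nz:
        clean = eq['traces'][0]  # one earthquake trace (6000 samples, 3 channels)
        noise = nz['traces'][0]  # one noise trace with the same layout

    scale = np.random.uniform(0.1, 1.0)  # arbitrary contamination level
    noisy = clean + scale * noise        # model input; `clean` is the target
    ```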

    Data

    The noise files (noise_train and noise_test) contain two main datasets:

    • traces: Contains N events, each a trace 6000 samples long, organized into three channels in the following order: E (East-West), N (North-South), Z (Vertical).
    • metadata: This dataset contains the names of the traces for each event.

    Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:

    • p_arrival: Contains the P-wave arrival times, expressed as sample indices into the trace.
    • s_arrival: Contains the S-wave arrival times, also expressed as sample indices (see the sketch below).
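
    To make the arrival fields concrete, the following sketch cuts a window around the P arrival of the first event. The dataset names follow the description above, but the file name and the exact channel-axis layout are assumptions, so check `traces.shape` on your copy:

    ```python
    import h5py

    # Sketch only: read one event and slice a window around its P-wave arrival.
    # Dataset names follow the description above; the window size is arbitrary.
    with h5py.File('train.hdf5', 'r') as f:
        trace = f['traces'][0]          # first event: 6000 samples, 3 channels (E, N, Z)
        p_idx = int(f['p_arrival'][0])  # P-wave arrival as a sample index
        s_idx = int(f['s_arrival'][0])  # S-wave arrival as a sample index

    window = trace[max(0, p_idx - 100):p_idx + 400]  # 500 samples around the P pick
    print(f"P at sample {p_idx}, S at sample {s_idx}, window shape: {window.shape}")
    ```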


    Usage

    To load these files in a Python environment, use the following approach:

    ```python

    import h5py
    import numpy as np

    # Open the HDF5 file in read mode
    with h5py.File('train_noise.hdf5', 'r') as file:
        # Print all the main keys in the file
        print("Keys in the HDF5 file:", list(file.keys()))

        if 'traces' in file:
            # Access the dataset
            data = file['traces'][:10]  # Load the first 10 traces

        if 'metadata' in file:
            # Access the dataset
            trace_name = file['metadata'][:10]  # Load the first 10 metadata entries
    ```

    Ensure that the path to the file is correctly specified relative to your Python script.

    Requirements

    To use this dataset, ensure you have Python installed along with the NumPy and h5py libraries, which can be installed via pip if not already available:

    ```bash
    pip install numpy
    pip install h5py
    ```

  2. Curated list of HAR datasets

    • zenodo.org
    • data.niaid.nih.gov
    bin, text/x-python
    Updated Apr 24, 2025
    Cite
    Matej Králik (2025). Curated list of HAR datasets [Dataset]. http://doi.org/10.5281/zenodo.3831958
    Available download formats: bin, text/x-python
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matej Králik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A curated list of preprocessed Human Activity Recognition datasets, each ready to use in under a minute.

    All the datasets are preprocessed into HDF5 format, created using the h5py Python library. The scripts used for preprocessing are provided as well (Load.ipynb and load_jordao.py).

    Each HDF5 file contains at least the keys:

    • x: a single array of shape [sample count, temporal length, sensor channel count] containing the actual sensor data. Its metadata holds the names of the individual sensor channels. All samples are zero-padded to a constant length within the file; the original lengths before padding are available under the meta keys.
    • y: a single array of shape [sample count] with zero-based integer values for the target classes. Its metadata holds the names of the target classes.
    • meta: various metadata, depending on the dataset (original length before padding, subject number, trial number, etc.)

    Usage example

    ```python
    import h5py

    with h5py.File('data/waveglove_multi.h5', 'r') as h5f:
        x = h5f['x']
        y = h5f['y']['class']
        print(f'WaveGlove-multi: {x.shape[0]} samples')
        print(f'Sensor channels: {h5f["x"].attrs["channels"]}')
        print(f'Target classes: {h5f["y"].attrs["labels"]}')
        first_sample = x[0]

    # Output:
    # WaveGlove-multi: 10044 samples
    # Sensor channels: ['acc1-x' 'acc1-y' 'acc1-z' 'gyro1-x' 'gyro1-y' 'gyro1-z' 'acc2-x'
    #  'acc2-y' 'acc2-z' 'gyro2-x' 'gyro2-y' 'gyro2-z' 'acc3-x' 'acc3-y'
    #  'acc3-z' 'gyro3-x' 'gyro3-y' 'gyro3-z' 'acc4-x' 'acc4-y' 'acc4-z'
    #  'gyro4-x' 'gyro4-y' 'gyro4-z' 'acc5-x' 'acc5-y' 'acc5-z' 'gyro5-x'
    #  'gyro5-y' 'gyro5-z']
    # Target classes: ['null' 'hand swipe left' 'hand swipe right' 'pinch in' 'pinch out'
    #  'thumb double tap' 'grab' 'ungrab' 'page flip' 'peace' 'metal']
    ```

    Current list of datasets:

    • WaveGlove-single (waveglove_single.h5)
    • WaveGlove-multi (waveglove_multi.h5)
    • uWave (uwave.h5)
    • OPPORTUNITY (opportunity.h5)
    • PAMAP2 (pamap2.h5)
    • SKODA (skoda.h5)
    • MHEALTH (non-overlapping windows) (mhealth.h5)
    • Six datasets with all four predefined train/test folds,
      as preprocessed by Jordao et al., originally in WearableSensorData
      (FNOW-, LOSO-, LOTO-, and SNOW-prefixed .h5 files); see the sketch below
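
    Since every file in the list shares the x/y/meta layout described above, the same loading pattern applies to each of them. The sketch below opens one of the Jordao et al. fold files; the file name is a hypothetical example, so list the archive contents for the actual prefixed names:

    ```python
    import h5py

    # Sketch only: 'LOSO_mhealth.h5' is a hypothetical file name; check the
    # archive for the actual FNOW/LOSO/LOTO/SNOW prefixed files.
    with h5py.File('data/LOSO_mhealth.h5', 'r') as h5f:
        print('Keys:', list(h5f.keys()))
        x = h5f['x']  # [sample count, temporal length, sensor channel count]
        print(f'{x.shape[0]} samples, {x.shape[2]} sensor channels')
    ```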