License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A curated list of preprocessed Human Activity Recognition datasets, ready to use in under a minute.
All the datasets are preprocessed into HDF5 format, created using the h5py Python library. The scripts used for data preprocessing are provided as well (Load.ipynb and load_jordao.py).
Each HDF5 file contains at least the keys:
x: a single array of size [sample count, temporal length, sensor channel count] containing the actual sensor data. Its metadata contains the names of the individual sensor channels. All samples are zero-padded to a constant length within the file; the original lengths before padding are available under the meta keys.
y: a single array of size [sample count] with zero-based integer values for the target classes. Its metadata contains the names of the target classes.
meta: contains various metadata, depending on the dataset (original length before padding, subject no., trial no., etc.)
Usage example
import h5py
with h5py.File('data/waveglove_multi.h5', 'r') as h5f:
    x = h5f['x']
    y = h5f['y']['class']
    print(f'WaveGlove-multi: {x.shape[0]} samples')
    print(f'Sensor channels: {h5f["x"].attrs["channels"]}')
    print(f'Target classes: {h5f["y"].attrs["labels"]}')
    first_sample = x[0]
Current list of datasets:
WaveGlove-single (waveglove_single.h5)
WaveGlove-multi (waveglove_multi.h5)
uWave (uwave.h5)
OPPORTUNITY (opportunity.h5)
PAMAP2 (pamap2.h5)
SKODA (skoda.h5)
MHEALTH (non-overlapping windows) (mhealth.h5)
Six datasets with all four predefined train/test folds, as preprocessed by Jordao et al. in the original WearableSensorData repository (FNOW-, LOSO-, LOTO- and SNOW-prefixed .h5 files).
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Study information

Design ideation study (N = 24) using eye-tracking technology. Participants solved a total of twelve design problems while receiving inspirational stimuli on a monitor. Their task was to generate as many solutions to each problem as possible and to explain each solution briefly by thinking aloud. The study allows for further insight into how inspirational stimuli improve idea fluency during design ideation. This dataset features processed data from the experiment. Eye-tracking data includes gaze data, fixation data, blink data, and pupillometry data for all participants.

The study is based on the following research paper and follows the same experimental setup: Goucher-Lambert, K., Moss, J., & Cagan, J. (2019). A neuroimaging investigation of design ideation with and without inspirational stimuli—understanding the meaning of near and far stimuli. Design Studies, 60, 1-38. DOI

Dataset

Most files in the dataset are saved as CSV files or other human-readable file formats. Large files are saved in Hierarchical Data Format (HDF5/H5) to allow for smaller file sizes and higher compression. All data is described thoroughly in 00_ReadMe.txt. The following processed data is included in the dataset:

Concatenated annotations file of experimental flow for all participants (CSV).
All eye-tracking raw data in concatenated files, annotated with only participant ID (CSV/HDF5).
Annotated eye-tracking data for ideation routines only; a subset of the files above (CSV/HDF5).
Audio transcriptions from the Google Cloud Speech-to-Text API of each recording, with annotations (CSV).
Raw API responses for each transcription; these files include the time offset for each word in a recording (JSON).
Data for questionnaire feedback and ideas generated during the experiment (CSV).
Data for the post-experiment survey, including demographic information (TSV).

Python code used for the open-source experimental setup and dataset construction is hosted on GitHub. The repository also includes code showing how the dataset has been further processed.
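As a rough starting point for the HDF5 eye-tracking files, they could be opened along these lines (a hedged sketch only; the file name and the assumption that the files were written via pandas are mine, not the dataset's, so consult 00_ReadMe.txt for the real layout):

import pandas as pd

# Hypothetical file name; if the HDF5 files were not written with pandas.HDFStore,
# fall back to h5py to inspect the raw groups instead.
gaze = pd.read_hdf('eyetracking_concatenated.h5')
print(gaze.columns)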
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is derived from the ISIC Archive with the following changes:
If the "benign_malignant" column is null and the "diagnosis" column is "vascular lesion", the target is set to null.
DISCLAIMER: I'm not a dermatologist and I'm not affiliated with ISIC in any way. I don't know if my approach to setting the target value is acceptable by the ISIC competition. Use at your own risk.
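For illustration, the target rule above could be applied to the metadata along these lines (a hedged sketch; the CSV name follows the file layout used further below, and the exact column handling is an assumption):

import numpy as np
import pandas as pd

df = pd.read_csv('train-metadata.csv')
# Rows with a null "benign_malignant" and diagnosis "vascular lesion" get a null target
mask = df['benign_malignant'].isna() & (df['diagnosis'] == 'vascular lesion')
df.loc[mask, 'target'] = np.nan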
import os
import multiprocessing as mp
from PIL import Image, ImageOps
import glob
from functools import partial
def list_jpg_files(folder_path):
    # Ensure the folder path ends with a slash
    if not folder_path.endswith('/'):
        folder_path += '/'
    # Use glob to find all .jpg files in the specified folder (non-recursive)
    jpg_files = glob.glob(folder_path + '*.jpg')
    return jpg_files
def resize_image(image_path, destination_folder):
    # Open the image file
    with Image.open(image_path) as img:
        # Get the original dimensions
        original_width, original_height = img.size
        # Calculate the aspect ratio
        aspect_ratio = original_width / original_height
        # Determine the new dimensions based on the aspect ratio
        if aspect_ratio > 1:
            # Width is larger, so we will crop the width
            new_width = int(256 * aspect_ratio)
            new_height = 256
        else:
            # Height is larger, so we will crop the height
            new_width = 256
            new_height = int(256 / aspect_ratio)
        # Resize the image while maintaining the aspect ratio
        img = img.resize((new_width, new_height))
        # Calculate the crop box to center the image
        left = (new_width - 256) / 2
        top = (new_height - 256) / 2
        right = (new_width + 256) / 2
        bottom = (new_height + 256) / 2
        # Crop the image if it results in shrinking
        if new_width > 256 or new_height > 256:
            img = img.crop((left, top, right, bottom))
        else:
            # Add black edges if it results in scaling up
            img = ImageOps.expand(img, border=(int(left), int(top), int(left), int(top)), fill='black')
        # Resize the image to the final dimensions
        img = img.resize((256, 256))
        img.save(os.path.join(destination_folder, os.path.basename(image_path)))
source_folder = ""
destination_folder = ""
images = list_jpg_files(source_folder)
with mp.Pool(processes=12) as pool:
    images = pool.map(partial(resize_image, destination_folder=destination_folder), images)
print("All images resized")
This code will shrink (down-sample) the image if it is larger than 256x256. But if the image is smaller than 256x256, it will add either vertical or horizontal black edges after scaling up the image. In both scenarios, it will keep the center of the input image in the center of the output image.
The HDF5 file is created using the following code:
import os
import pandas as pd
from PIL import Image
import h5py
import io
import numpy as np
# File paths
base_folder = "./isic-2020-256x256"
csv_file_path = 'train-metadata.csv'
image_folder_path = 'train-image/image'
hdf5_file_path = 'train-image.hdf5'
# Read the CSV file
df = pd.read_csv(os.path.join(base_folder, csv_file_path))
# Open an HDF5 file
with h5py.File(os.path.join(base_folder, hdf5_file_path), 'w') as hdf5_file:
    for index, row in df.iterrows():
        isic_id = row['isic_id']
        image_file_path = os.path.join(base_folder, image_folder_path, f'{isic_id}.jpg')
        if os.path.exists(image_file_path):
            # Open the image file
            with Image.open(image_file_path) as img:
                # Convert the image to a byte buffer
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format=img.format)
                img_byte_arr = img_byte_arr.getvalue()
                hdf5_file.create_dataset(isic_id, data=np.void(img_byte_arr))
        else:
            print(f"Image file for {isic_id} not found.")
print("HDF5 file created successfully.")
To read the hdf5 file, use the following code:
import h5py
from PIL import Image
with h...
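Since the snippet above is truncated, here is a hedged reading sketch based on the writer code earlier in this entry (one byte-string dataset per isic_id; the file name matches hdf5_file_path above):

import io
import h5py
from PIL import Image

with h5py.File('train-image.hdf5', 'r') as hdf5_file:
    isic_id = next(iter(hdf5_file))                # first image ID in the file
    img_bytes = hdf5_file[isic_id][()].tobytes()   # np.void -> raw JPEG bytes
    img = Image.open(io.BytesIO(img_bytes))
    print(isic_id, img.size, img.format)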
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data repository provides the underlying data and neural network training scripts associated with the manuscript titled "A Transformer Network for High-Throughput Materials Characterization with X-ray Photoelectron Spectroscopy" by Simperl and Werner, published in the Journal of Applied Physics (https://doi.org/10.1063/5.0296600) (2025).
All data files are released under the Creative Commons Attribution 4.0 International (CC-BY) license, while all code files are distributed under the MIT license.
The repository contains simulated X-ray photoelectron spectroscopy (XPS) spectra stored as hdf5 files in the zipped (h5_files.zip) folder, which was generated using the software developed by the authors. The NIST Standard Reference Database 100 – Simulation of Electron Spectra for Surface Analysis (SESSA) is freely available at https://www.nist.gov/srd/nist-standard-reference-database-100.
The neural network architecture is implemented using the PyTorch Lightning framework and is fully available within the attached materials as Transformer_SimulatedSpectra.py contained in the python_scripts.zip.
The trained model and the list of materials for the train, test and validation sets are contained in the models.zip folder.
The repository contains all the data necessary to replot the figures from the manuscript. These data are available in the form of .csv files or .h5 files for the spectra. In addition, the repository also contains a Python script (Plot_Data_Manuscript.ipynb) which is contained in the python_scripts.zip file.
The dataset and accompanying Python code files included in this repository were used to train a transformer-based neural network capable of directly inferring chemical concentrations from simulated survey X-ray photoelectron spectroscopy (XPS) spectra of bulk compounds.
The spectral dataset provided here represents the raw output from the SESSA software (version 2.2.2), prior to the normalization procedure described in the associated manuscript. This step of normalisation is of paramount importance for the effective training of the neural network.
The repository contains the Python scripts utilised to execute the spectral simulations and the neural network training on the Vienna Scientific Cluster (VSC5) which is part of the Austrian Scientific Computing Infrastructure (ASC). In order to obtain guidance on the proper configuration of the Command Line Interface (CLI) tools required for SESSA, users are advised to consult the official SESSA manual, which is available at the following address: https://nvlpubs.nist.gov/nistpubs/NSRDS/NIST.NSRDS.100-2024.pdf.
To run the neural network training we provided the requirements_nn_training.txt file that contains all the necessary python packages and version numbers. All other python scripts can be run locally with the python libraries listed in requirements_data_analysis.txt.
HDF5 (in zip folder): As described in the manuscript, we simulate X-ray photoelectron spectra for each of the 7,587 inorganic [1] and organic [2] materials in our dataset. To reflect realistic experimental conditions, each simulated spectrum was augmented by systematically varying parameters such as peak width, peak shift, and peak type—all configurable within the SESSA software—as well as by applying statistical Poisson noise to simulate varying signal-to-noise ratios. These modifications account for experimentally observed and material-specific spectral broadening, peak shifts, and detector-induced noise. Each material is represented by an individual HDF5 (.h5) file, named according to its chemical formula and mass density (in g/cm³). For example, the file for SiO2 with a density of 2.196 g/cm³ is named SiO2_2.196.h5. For more complex chemical formulas, such as Co(ClO4)2 with a density of 3.33 g/cm³, the file is named Co_ClO4_2_3.33.h5. Within each HDF5 file, the metadata for each spectrum is stored alongside a fixed energy axis and the corresponding intensity values. The spectral data are organized hierarchically by augmentation parameters in the following directory structure, e.g. for Ac_10.0.h5 we have SNR_0/WIDTH_0.3/SHIFT_-3.0/PEAK_gauss/Ac_10.0/. These files can be easily inspected with H5Web in Visual Studio Code, with h5py in Python, or with any other HDF5-capable program.
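For quick inspection in Python, such a file could be opened along these lines (a hedged sketch; the path follows the example group structure above, while the names of the objects inside the group are left to the reader to discover):

import h5py

with h5py.File('Ac_10.0.h5', 'r') as f:
    grp = f['SNR_0/WIDTH_0.3/SHIFT_-3.0/PEAK_gauss/Ac_10.0']
    # Print every object stored for this spectrum (energy axis, intensities, metadata)
    grp.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))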
Session Files: The .ses files are SESSA-specific input files that can be directly loaded into SESSA to specify certain input parameters for the initialization (ini), the geometry (geo) and the simulation parameters (sim_para), and are required by the python script Simulation_Script_VSC_json.py to run the simulation on the cluster.
Json Files: The two json files (MaterialsListVSC_gauss.json, MaterialsListVSC_lorentz.json) are used as the input files to the Python script Simulation_Script_VSC_json.py. These files contain all the material specific information for the SESSA simulation.
csv files: The csv files are used to generate the plots from the manuscript described in the section "Plotting Scripts".
npz files: The two .npz files (element_counts.npz, single_elements.npz) are python arrays that are needed by the Transformer_SimulatedSpectra.py script and contain the number of each single element in the dataset and an array of each single element present, respectively.
There is one python file that sets the communication with SESSA:
Simulation_Script_VSC_json.py: This script uses the functions of the VSC_function.py script (therefore needs to be placed in the same directory as this script) and can be called with the following command:
python3 Simulation_Script_VSC_json.py MaterialsListVSC_gauss.json 0
It simulates the spectrum for the material at index 0 in the .json file and with the corresponding parameters specified in the .json file.
Before running this script, the following paths need to be specified:
To run SESSA on a computing cluster it is important to have a working Xvfb (virtual frame buffer) or a similar tool available to which any graphical output from SESSA can be written to.
Before running the training script it is important to normalize the data such that the squared integral of the spectrum is 1 (as described in the manuscript) and shown in the code: normalize_spectra.py
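Equivalently, each spectrum is divided by the square root of the integral of its square; a minimal sketch of that step (the authoritative implementation is normalize_spectra.py):

import numpy as np

def normalize(intensity, energy):
    # Scale the spectrum so that the integral of its square equals 1
    norm = np.sqrt(np.trapz(intensity**2, x=energy))
    return intensity / norm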
For the neural network training we use the Transformer_SimulatedSpectra.py where the external functions used are specified in external_functions.py. This script contains the full description of the neural network architecture, the hyperparameter tuning and the Wandb logging.
In the models.zip folder the fully trained network final_trained_model.ckpt presented in the manuscript is available as well as the list of training, validation and testing materials (test_materials_list.pt, train_materials_list.pt, val_materials_list.pt) where the corresponding spectra are extracted from the hdf5 files. The file types .ckpt and .pt can be read in by using the pytorch specific load functions in Python, e.g.
torch.load('train_materials_list.pt')
normalize_spectra.py: To run this script properly it is important to set up a python environment with the necessary libraries specified in the requirements_data_analysis.txt file. Then it can be called with
python3 normalize_spectra.py
where it is important to specify the path to the .h5 files containing the unnormalized spectra.
Transformer_SimulatedSpectra.py: To run this script properly on the cluster it is important to set up a python environment with the necessary libraries specified in the requirements_nn_training.txt file. This script also relies on external_functions.py, single_elements.npz and element_counts.npz, which should be placed in the same directory as the python script. This is important for creating the datasets for training, validation and testing, and ensures that all the single elements appear in the testing set. You can call this script (on the cluster) within a slurm script to start the GPU training.
python3 Transformer_SimulatedSpectra.py
Before running this script, the following paths need to be specified:
The goal of introducing the Rescaled CIFAR-10 dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.
The Rescaled CIFAR-10 dataset was introduced in the paper:
[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.
with a pre-print available at arXiv:
[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.
Importantly, the Rescaled CIFAR-10 dataset contains substantially more natural textures and patterns than the MNIST Large Scale dataset, introduced in:
[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2
and is therefore significantly more challenging.
The Rescaled CIFAR-10 dataset is provided on the condition that you provide proper citation for the original CIFAR-10 dataset:
[4] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., University of Toronto.
and also for this new rescaled version, using the reference [1] above.
The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.
The Rescaled CIFAR-10 dataset is generated by rescaling 32×32 RGB images of animals and vehicles from the original CIFAR-10 dataset [4]. The scale variations are up to a factor of 4. To ensure that all test images have the same resolution, mirror extension is used to extend the images to size 64x64. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].
There are 10 distinct classes in the dataset: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. In the dataset, these are represented by integer labels in the range [0, 9].
The dataset is split into 40 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 40 000 samples from the original CIFAR-10 training set. The validation dataset, on the other hand, is formed from the final 10 000 image batch of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original CIFAR-10 test set.
The training dataset file (~5.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:
cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5
Additionally, for the Rescaled CIFAR-10 dataset, there are 9 datasets (~1 GB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^(k/4), with k being integers in the range [-4, 4]:
cifar10_with_scale_variations_te10000_outsize64-64_scte0p500.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p595.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p707.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p841.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p189.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p414.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p682.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte2p000.h5
These dataset files were used for the experiments presented in Figures 9, 10, 15, 16, 20 and 24 in [1].
The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.
The training dataset can be loaded in Python as:
import h5py
import numpy as np

with h5py.File('cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5', 'r') as f:
    x_train = np.array(f["/x_train"], dtype=np.float32)
    x_val = np.array(f["/x_val"], dtype=np.float32)
    x_test = np.array(f["/x_test"], dtype=np.float32)
    y_train = np.array(f["/y_train"], dtype=np.int32)
    y_val = np.array(f["/y_val"], dtype=np.int32)
    y_test = np.array(f["/y_test"], dtype=np.int32)
We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:
x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))
The test datasets can be loaded in Python as:
with h5py.File('cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5', 'r') as f:  # or any other test file listed above
    x_test = np.array(f["/x_test"], dtype=np.float32)
    y_test = np.array(f["/y_test"], dtype=np.int32)
The test datasets can be loaded in Matlab as:
x_test = h5read('cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5', '/x_test');
The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset brings you the Iris Dataset in several data formats (see more details in the next sections).
You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read all these formats:
Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.
Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
The file downloaded is iris.data and is formatted as a comma delimited file.
This small data collection was created to help you test your skills with ingesting various data formats.
This file was processed to convert the data in the following formats:
* csv - comma separated values format
* tsv - tab separated values format
* parquet - parquet format
* feather - feather format
* parquet.gzip - compressed parquet format
* h5 - hdf5 format
* pickle - Python binary object file - pickle format
* xlsx - Excel format
* npy - Numpy (Python library) binary format
* npz - Numpy (Python library) binary compressed format
* rds - Rds (R specific data format) binary format
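As a quick illustration, several of these formats can be ingested with pandas and NumPy along these lines (a hedged sketch; the iris.* file names are assumptions about how the converted files are named):

import numpy as np
import pandas as pd

df_csv = pd.read_csv('iris.csv')
df_tsv = pd.read_csv('iris.tsv', sep='\t')
df_parquet = pd.read_parquet('iris.parquet')
df_feather = pd.read_feather('iris.feather')
df_h5 = pd.read_hdf('iris.h5')      # needs a key argument if the store holds several objects
df_pickle = pd.read_pickle('iris.pickle')
df_xlsx = pd.read_excel('iris.xlsx')
arr = np.load('iris.npy')           # .npz analogously via np.load('iris.npz')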
I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.
Use these data formats to test your skills in ingesting data in various formats.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Abstract: This code has been used for the numerical experiments in the thesis "On dynamical low-rank integrators for matrix differential equations" by Stefan Schrammer, see https://www.doi.org/10.5445/IR/1000148853.

TechnicalRemarks:

#### Instructions

The scripts inside the subfolders are intended to reproduce the figures from the thesis "On dynamical low-rank integrators for matrix differential equations" by Stefan Schrammer.

We provide two different versions of the code:
- Code_Prom_wo_ref.zip provides the scripts for computing and plotting the data for all numerical experiments.
- Code_Prom_incl_ref.zip additionally provides the reference solutions to all considered problems as hdf5 files.

Requirements: The codes are tested with Ubuntu 20.04.2 LTS and Python 3.8.5 and the following versions of its modules: numpy 1.19.2, scipy 1.5.2, numba 0.51.2, colorama 0.4.4, h5py 2.10.0, matplotlib 3.3.2, tikzplotlib 0.9.6.

Generation of figures (tex files containing the data are also created):

In the folder fracginz, open a console and run the commands
1. to create the data for Figures (7.1) and (7.2): python3 fgl.py
2. to create Figures (7.1) and (7.2): python3 fgl_results.py

In the folder fracschr, open a console and run the commands
3. to create the data for Figure (7.3): python3 fsr.py
4. to create Figure (7.3): python3 fsr_results.py

In the folder laserplasma, open a console and run the commands
5. to create the data for Figure (7.4): python3 lpi_hom.py
6. to create Figure (7.4): python3 lpi_hom_plots.py
7. to create the data for Figures (7.5), (7.6), and (7.7): python3 lpi.py
8. to create Figures (7.5), (7.6), and (7.7): python3 lpi_globalerr.py, python3 lpi_1d_plot, python3 lpi_svals_maxint.py

In the folder sinegordon, open a console and run the commands
9. to create the data for Figures (7.8) and (7.9): python3 sineg.py
10. to create Figures (7.8) and (7.9): python3 sineg_globalerr_ranks.py

If the reference solutions shall be recomputed, uncomment the line
The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.
Accurate 3D point clouds can (easily and cheaply) be acquired nowadays from different sources:
However there is a lack of large 3D datasets (you can find a good one here based on triangular meshes); it's especially hard to find datasets based on point clouds (which is the raw output from every 3D sensing device).
This dataset contains 3D point clouds generated from the original images of the MNIST dataset, to bring a familiar introduction to 3D to people used to working with 2D datasets (images).
In the 3D_from_2D notebook you can find the code used to generate the dataset.
You can use the code in the notebook to generate a bigger 3D dataset from the original.
The entire dataset is stored as 4096-D vectors obtained from the voxelization (x:16, y:16, z:16) of all the 3D point clouds.
In addition to the original point clouds, it contains randomly rotated copies with noise.
The full dataset is split into arrays:
Example python code reading the full dataset:
import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]
5000 (train), and 1000 (test) 3D point clouds stored in HDF5 file format. The point clouds have zero mean and a maximum dimension range of 1.
Each file is divided into HDF5 groups
Each group is named as its corresponding array index in the original mnist dataset and it contains:
x, y, z coordinates of each 3D point in the point cloud.
nx, ny, nz components of the unit normal associated with each point.
Example python code reading 2 digits and storing some of the group content in tuples:
with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    a = hf["0"]
    b = hf["1"]
    digit_a = (a["img"][:], a["points"][:], a.attrs["label"])
    digit_b = (b["img"][:], b["points"][:], b.attrs["label"])
Simple Python class that generates a grid of voxels from the 3D point cloud. Check kernel for use.
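As a rough illustration of the idea (a sketch, not the dataset's own class), a 16x16x16 occupancy grid and the corresponding flattened 4096-D vector can be computed with numpy:

import numpy as np

def voxelize(points, n=16):
    # points: [num_points, 3]; the clouds are zero-mean with a maximum range of 1 (see above)
    edges = np.linspace(-0.5, 0.5, n + 1)
    hist, _ = np.histogramdd(points, bins=(edges, edges, edges))
    return hist.reshape(-1)  # flattened 16*16*16 = 4096-D vector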
Module with functions to plot point clouds and voxelgrids inside a Jupyter notebook. You have to run this locally due to the Kaggle notebook's lack of support for rendering IFrames. See the GitHub issue here
Functions included:
array_to_color
Converts a 1D array to RGB values, for use as the color kwarg in plot_points()
plot_points(xyz, colors=None, size=0.1, axis=False)
plot_voxelgrid(v_grid, cmap="Oranges", axis=False)
To better understand the heat production, electricity generation performance, and economic viability of closed-loop geothermal systems in hot-dry rock, the Closed-Loop Geothermal Working Group -- a consortium of several national labs and academic institutions -- has tabulated time-dependent numerical solutions and levelized cost results of two popular closed-loop heat exchanger designs (u-tube and co-axial). The heat exchanger designs were evaluated for two working fluids (water and supercritical CO2) while varying seven continuous independent parameters of interest (mass flow rate, vertical depth, horizontal extent, borehole diameter, formation gradient, formation conductivity, and injection temperature). The corresponding numerical solutions (approximately 1.2 million per heat exchanger design) are stored as multi-dimensional HDF5 datasets and can be queried at off-grid points using multi-dimensional linear interpolation. A Python script was developed to query this database and estimate time-dependent electricity generation using an organic Rankine cycle (for water) or direct turbine expansion cycle (for CO2) and perform a cost assessment. This document aims to give an overview of the HDF5 database file and highlights how to read, visualize, and query quantities of interest (e.g., levelized cost of electricity, levelized cost of heat) using the accompanying Python scripts. Details regarding the capital, operation, and maintenance costs and the levelized cost calculation using the techno-economic analysis script are provided.
This data submission will contain results from the Closed Loop Geothermal Working Group study that are within the public domain, including publications, simulation results, databases, and computer codes.
GeoCLUSTER is a Python-based web application created using Dash, an open-source framework built on top of Flask that streamlines the building of data dashboards. GeoCLUSTER provides users with a collection of interactive methods for streamlining the exploration and visualization of an HDF5 dataset. The GeoCLUSTER app and database are contained in the compressed file geocluster_vx.zip, where the "x" refers to the version number. For example, geocluster_v1.zip is Version 1 of the app. This zip file also contains installation instructions.
To use the GeoCLUSTER app in the cloud, click the link to "GeoCLUSTER on AWS" in the Resources section below. To use the GeoCLUSTER app locally, download the geocluster_vx.zip to your computer and uncompress this file. When uncompressed, this file comprises two directories and the geocluster_installation.pdf file. The geo-data directory contains the HDF5 database in condensed format, and the GeoCLUSTER directory contains the GeoCLUSTER app in the subdirectory dash_app, as app.py. The geocluster_installation.pdf file provides instructions on installing Python and the needed Python modules, and then executing the app.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is derived from the ISIC Archive with the following changes:
If the "benign_malignant" column is null and the "diagnosis" column is "vascular lesion", the target is set to null.
DISCLAIMER: I'm not a dermatologist and I'm not affiliated with ISIC in any way. I don't know if my approach to setting the target value is acceptable by the ISIC competition. Use at your own risk.
import os
import multiprocessing as mp
from PIL import Image, ImageOps
import glob
from functools import partial
def list_jpg_files(folder_path):
    # Ensure the folder path ends with a slash
    if not folder_path.endswith('/'):
        folder_path += '/'
    # Use glob to find all .jpg files in the specified folder (non-recursive)
    jpg_files = glob.glob(folder_path + '*.jpg')
    return jpg_files
def resize_image(image_path, destination_folder):
    # Open the image file
    with Image.open(image_path) as img:
        # Get the original dimensions
        original_width, original_height = img.size
        # Calculate the aspect ratio
        aspect_ratio = original_width / original_height
        # Determine the new dimensions based on the aspect ratio
        if aspect_ratio > 1:
            # Width is larger, so we will crop the width
            new_width = int(256 * aspect_ratio)
            new_height = 256
        else:
            # Height is larger, so we will crop the height
            new_width = 256
            new_height = int(256 / aspect_ratio)
        # Resize the image while maintaining the aspect ratio
        img = img.resize((new_width, new_height))
        # Calculate the crop box to center the image
        left = (new_width - 256) / 2
        top = (new_height - 256) / 2
        right = (new_width + 256) / 2
        bottom = (new_height + 256) / 2
        # Crop the image if it results in shrinking
        if new_width > 256 or new_height > 256:
            img = img.crop((left, top, right, bottom))
        else:
            # Add black edges if it results in scaling up
            img = ImageOps.expand(img, border=(int(left), int(top), int(left), int(top)), fill='black')
        # Resize the image to the final dimensions
        img = img.resize((256, 256))
        img.save(os.path.join(destination_folder, os.path.basename(image_path)))
source_folder = ""
destination_folder = ""
images = list_jpg_files(source_folder)
with mp.Pool(processes=12) as pool:
    images = pool.map(partial(resize_image, destination_folder=destination_folder), images)
print("All images resized")
This code will shrink (down-sample) the image if it is larger than 256x256. But if the image is smaller than 256x256, it will add either vertical or horizontal black edges after scaling up the image. In both scenarios, it will keep the center of the input image in the center of the output image.
The HDF5 file is created using the following code:
import os
import pandas as pd
from PIL import Image
import h5py
import io
import numpy as np
# File paths
base_folder = "./isic-2018-task-3-256x256"
csv_file_path = 'train-metadata.csv'
image_folder_path = 'train-image/image'
hdf5_file_path = 'train-image.hdf5'
# Read the CSV file
df = pd.read_csv(os.path.join(base_folder, csv_file_path))
# Open an HDF5 file
with h5py.File(os.path.join(base_folder, hdf5_file_path), 'w') as hdf5_file:
    for index, row in df.iterrows():
        isic_id = row['isic_id']
        image_file_path = os.path.join(base_folder, image_folder_path, f'{isic_id}.jpg')
        if os.path.exists(image_file_path):
            # Open the image file
            with Image.open(image_file_path) as img:
                # Convert the image to a byte buffer
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format=img.format)
                img_byte_arr = img_byte_arr.getvalue()
                hdf5_file.create_dataset(isic_id, data=np.void(img_byte_arr))
        else:
            print(f"Image file for {isic_id} not found.")
print("HDF5 file created successfully.")
To read the hdf5 file, use the following code:
import h5py
from PIL import Image
...
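The truncated snippet follows the same pattern as the reading sketch given for the ISIC 2020 file earlier in this document: open the HDF5 file with h5py, read each byte-string dataset, and decode it with PIL via io.BytesIO.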
The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.
The Rescaled Fashion-MNIST dataset was introduced in the paper:
[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.
with a pre-print available at arXiv:
[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.
Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:
[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.
The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:
[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747
and also for this new rescaled version, using the reference [1] above.
The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.
The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].
There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].
The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.
The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:
fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5
Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^(k/4), with k being integers in the range [-4, 4]:
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5
These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].
The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.
The training dataset can be loaded in Python as:
import h5py
import numpy as np

with h5py.File('fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5', 'r') as f:
    x_train = np.array(f["/x_train"], dtype=np.float32)
    x_val = np.array(f["/x_val"], dtype=np.float32)
    x_test = np.array(f["/x_test"], dtype=np.float32)
    y_train = np.array(f["/y_train"], dtype=np.int32)
    y_val = np.array(f["/y_val"], dtype=np.int32)
    y_test = np.array(f["/y_test"], dtype=np.int32)
We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:
x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))
The test datasets can be loaded in Python as:
with h5py.File('fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5', 'r') as f:  # or any other test file listed above
    x_test = np.array(f["/x_test"], dtype=np.float32)
    y_test = np.array(f["/y_test"], dtype=np.int32)
The test datasets can be loaded in Matlab as:
x_test = h5read('fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5', '/x_test');
The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains a zip file which includes the h5 file needed to reproduce the figures for arXiv:2507.02458, sample Python codes to generate the figures, and signal injection codes to generate the TD signal and PSD.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contents:
bee_yazeed-20231001T170032.h5 - SXCT scan of a wasp performed at beamline ID10-BEATS of SESAME.
SESAME_wasp_yazeed.avi - 3D video rendering of the phase-contrast CT reconstruction of bee_yazeed-20231001T170032. The dataset was reconstructed using alrecon. The video was created using ORS Dragonfly.

H5 dataset information: Raw experimental data (sinogram, flat fields and dark fields) and metadata are stored in a common .H5 file. The HDF5 file is organized hierarchically following the Scientific Data Exchange (DXfile) community standard.

How to reconstruct:
You can use Silx to read and explore the .H5 dataset. The file can be read within Python using the DXChange package. See the ID10-BEATS beamline user guide for a detailed description on how to process and reconstruct the scan.
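For a first look in Python, the standard Data Exchange layout suggests something along these lines (a hedged sketch; the /exchange/* paths follow the DXfile convention referenced above, not a verified listing of this particular file):

import h5py

with h5py.File('bee_yazeed-20231001T170032.h5', 'r') as f:
    proj = f['/exchange/data']        # projections
    flat = f['/exchange/data_white']  # flat fields
    dark = f['/exchange/data_dark']   # dark fields
    print(proj.shape, flat.shape, dark.shape)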
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Abstract: This dataset contains the DMFT/QMC results for the example of the two-orbital Hubbard model shown in the article "Thermodynamic Stability at the Two-Particle Level". It contains parameters, one-particle Green's functions, and observables in w2dynamics output format, as well as patches for relevant functionality not contained in current versions of w2dynamics at the time of publication, and scripts used for post-processing of the data and the creation of some of the graphs. For size reasons, the data files containing the corresponding two-particle Green's functions are split into multiple subdatasets whose identifiers are listed above.

TechnicalRemarks: The data files are contained in directories named beta35 and beta50 for the inverse temperature used in the respective calculations, with files containing the two-particle Green's functions contained in the subdatasets listed above and indicated by names containing 'G2'. All calculations were performed for two-orbital Hubbard models on a Bethe lattice with density-density interaction with fixed ratios between the interaction coefficients. The individual file names contain the inverse temperature, e.g. 'b35' for beta=35, the Hubbard-U interaction strength, e.g. 'U1.44' for U=1.44, and usually the chemical potential μ, e.g. 'mu1.26000' for μ=1.26. The file name segment 'ma...' present in some file names redundantly gives the difference of the used chemical potential from that necessary for half-filling.

In the coexistence region, the phase of the solution depends on the procedure, which is indicated by the name segment 'upward' / 'downward' / 'instable' (also sometimes shortened to just the initial letter), indicating the insulating or strongly correlated metallic phase, the weakly correlated metallic phase, and the unstable phase, respectively. For some of the files containing unstable solutions, the targeted value of the quasiparticle weight Z, calculated from the self-energy value at the first Matsubara frequency, is given in the 'Ztarget...' segment instead of an approximate value of the chemical potential (which is not preset as a fixed parameter for calculating unstable solutions). File names of files containing two-particle Green's functions additionally contain 's...', indicating separate calculations differing only in the used PRNG seed, which allow further statistical post-processing beyond that done automatically by w2dynamics.

n(mu) plots as shown in Figs. 2 and 3 of the article can be created using the script 'kappa_2band_create_mu_n_plot.py' by calling it with the appropriate arguments, e.g. using a command like

python kappa_2band_create_mu_n_plot.py -r "kappa_2band_bethe_dens_b35_U([0-9.]*)_([muZtarget0-9.]*).*hdf.*" --axisgroup 1 -k '$U/D = {grp[0]}$' --imsiwsort --nmin 2.0 --nmax 2.08 --mumin 0.0 --mumax 0.15 --nmu --onecolsize *.hdf5.zst

in the beta35 directory to create a plot like in Fig. 2, and

python kappa_2band_create_mu_n_plot.py -r "kappa_2band_bethe_dens_b50_U([0-9.]*)_([muZtarget0-9.]*).*hdf.*" --axisgroup 1 -k '$U/D = {grp[0]}$' --imsiwsort --nmin 2.0 --nmax 2.14 --mumin 0.0 --mumax 0.22 --nmu --onecolsize *.hdf5.zst

in the beta50 directory to create a plot like in Fig. 3.

The script 'chi_d_orblt_diagonalize.py' can be used to compute and diagonalize the generalized susceptibility by passing a data file with the one-particle Green's function as the argument after '--onepfile' and one with the corresponding two-particle Green's function after '--twopfile'. From the created .npz files, a plot like in Fig. 1 of the supplemental material can be created using the script 'chi_eigenbasis_multi_barcontribs.py' by calling it with the appropriate arguments, e.g.

python chi_eigenbasis_multi_barcontribs.py --force-centrosymm-contribs --onecolsize --bargraph 2 --beta 50 --hopping 0.5 --contrib real --barorder contrib kappa_2band_bethe_dens_b50_U1.4910_mu1.4924_u_chi_orblt.npz kappa_2band_bethe_dens_b50_U1.4915_mu1.4937_u_chi_orblt.npz kappa_2band_bethe_dens_b50_U1.4920_mu1.49510_u_chi_orblt.npz kappa_2band_bethe_dens_b50_U1.4930_mu1.49780_u_chi_orblt.npz kappa_2band_bethe_dens_b50_U1.50_mu1.51740_u_chi_orblt.npz --tickstrings '$U/D = 1.4910$' '$U/D = 1.4915$' '$U/D = 1.4920$' '$U/D = 1.4930$' '$U/D = 1.5000$'

to create a similar plot showing the same data, after the listed .npz files with the generalized susceptibility data have been created.

Patches in the patch directory can be applied to w2dynamics 1.1.5 as published on GitHub to add functionality that allows performing calculations converging toward unstable solutions like those contained in this data set. This information is also contained in the markdown-formatted file README.md contained in the datasets.

Other: We are grateful for funding support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy through the Würzburg-Dresden Cluster of Excellence on Complexity and Topology in Quantum Matter ct.qmat (EXC 2147, Project ID 390858490), as well as through the Collaborative Research Center SFB 1170 ToCoTronics (Project ID 258499086).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposition contains the results from a simulation of reconstructions of undersampled atomic force microscopy (AFM) images. The reconstructions were obtained using weighted iterative thresholding compressed sensing algorithms.
The deposition consists of:
The HDF5 database is licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/) . Since the CC BY 4.0 license is not well suited for source code, the Python script is licensed under the BSD 2-Clause license (http://opensource.org/licenses/BSD-2-Clause) .
The files are provided as-is with no warranty, as detailed in the above-mentioned licenses.
The database is split into ten parts:
These ten parts must be concatenated before the database can be extracted from the tar.xz archive. On Unix-like systems this may be done using:
$ cat weighted_it_reconstructions.hdf5.tar.xz.part-* > weighted_it_reconstructions.hdf5.tar.xz
after which the archive may be extracted, e.g., using:
$ tar xfJ weighted_it_reconstructions.hdf5.tar.xz
WARNING: The extracted HDF5 database has a size of 114 GiB.
The simulation results in the database are based on "Atomic Force Microscopy Images of Cell Specimens" and "Atomic Force Microscopy Images of Various Specimens" by Christian Rankl licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). The original images are available at http://dx.doi.org/10.5281/zenodo.17573 and http://dx.doi.org/10.5281/zenodo.60434. The original images are provided as-is without warranty of any kind. Both the original images as well as adapted images are part of the dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This kit includes an additional revised master file, lyso009a_0087.JF07T32V01_master_rev.h5 that provides compliance with the October 2019 NXmx specification as proposed in https://github.com/HDRMX/definitions.git
To create a new NeXus master file, assuming DIALS is installed in the folder $DIALS, use this command:
libtbx.python $DIALS/modules/cctbx_project/xfel/swissfel/jf16m_cxigeom2nexus.py unassembled_file=lyso009a_0087.JF07T32V01.h5 geom_file=16M_bernina_backview_optimized_adu_quads.geom wavelength=1.368479 detector_distance=97.830 mask_file=lyso009a_0087.JF07T32V01.mask.h5
The geometry file is in CrystFEL format but has been realigned to group the modules hierarchically into quadrants.
View the data using DIALS: dials.image_viewer lyso009a_0087.JF07T32V01_master.h5
Process the data using DIALS, treating the images as stills, assuming 64 cores available on the system: dials.stills_process mp.nproc=64 lyso009a_0087.JF07T32V01_master.h5 dispersion.gain=10 known_symmetry.space_group=P43212 known_symmetry.unit_cell=77,77,37,90,90,90 refinement_protocol.d_min_start=2.5
Download DIALS at dials.github.io.
After the DIALS run, for full NXmx compliance you will need the jungfrau portions of the script that was used to generate lyso009a_0087.JF07T32V01_master_rev.h5
cp Therm_6_2.nxs Therm_6_2_rev.nxs
cp Therm_6_2_master.h5 Therm_6_2_master_rev.h5
cp jungfrau/lyso009a_0087.JF07T32V01_master.h5 jungfrau/lyso009a_0087.JF07T32V01_master_rev.h5
export curdat=$(date +%FT%T.%3N)
export LD_LIBRARY_PATH=$HOME/lib
export HDF5_PLUGIN_PATH=$HOME/lib
export PATH=$HOME/bin:$PATH
h5copy -i Therm_6_2_rev.nxs -o Therm_6_2_master_rev.h5 -s /entry/instrument/name -d /entry/instrument/name -f ref
h5copy -i Therm_6_2_rev.nxs -o Therm_6_2_master_rev.h5 -s /entry/instrument/source -d /entry/source -f ref
h5copy -i Therm_6_2_rev.nxs -o Therm_6_2_rev.nxs -s /entry/instrument/source -d /entry/source -f ref
h5copy -i jungfrau/lyso009a_0087.JF07T32V01_master.h5 -o jungfrau/lyso009a_0087.JF07T32V01_master_rev.h5 -s /entry/sample/beam -d /entry/instrument/beam -f ref
export end_time=$(h5dump -d "/entry/end_time" Therm_6_2_master.h5 | grep ":" | sed 's/^.........//' | sed 's/.$//')
echo "end_time: $end_time"
python << 'EOL'
import h5py as h5
import numpy as np
import os
end_time=os.environ['end_time']
curdat=os.environ['curdat']
fvds = h5.File('Therm_6_2_rev.nxs','r+')
fmaster = h5.File('Therm_6_2_master_rev.h5','r+')
jungfrau= h5.File('jungfrau/lyso009a_0087.JF07T32V01_master_rev.h5','r+')
fvds_keys=fvds.keys()
fmaster_keys=fmaster.keys()
jungfrau_keys=jungfrau.keys()
fvds_entry=fvds['entry']
fmaster_entry=fmaster['entry']
jungfrau_entry=jungfrau['entry']
fvds_entry_keys=fvds_entry.keys()
fmaster_entry_keys=fmaster_entry.keys()
jungfrau_entry_keys=jungfrau_entry.keys()
fvds_entry_instrument=fvds['entry']['instrument']
fmaster_entry_instrument=fmaster['entry']['instrument']
jungfrau_entry_instrument=jungfrau['entry']['instrument']
fvds_entry_instrument_keys=fvds_entry_instrument.keys()
fmaster_entry_instrument_keys=fmaster_entry_instrument.keys()
jungfrau_entry_instrument_keys=jungfrau_entry_instrument.keys()
fvds_entry_instrument_name=(fvds['entry']['instrument']['name'])
fmaster_entry_instrument_name=(fmaster['entry']['instrument']['name'])
jungfrau['entry']['instrument'].create_dataset("name", data=np.string_("Paul Scherrer Institute SwissFEL Aramis 1 (Alvra)"))
jungfrau_entry_instrument_name=(jungfrau['entry']['instrument']['name'])
fvds_entry_instrument_short_name=fvds_entry_instrument.attrs['short_name']
fmaster_entry_instrument_short_name=fmaster_entry_instrument.attrs['short_name']
jungfrau_entry_instrument_name.attrs.modify('short_name',np.string_("Alvra"))
jungfrau_entry_instrument_short_name=jungfrau_entry_instrument_name.attrs['short_name']
zero_offset=fmaster_entry_instrument['detector']['module']['fast_pixel_direction'].attrs['offset']
fmaster_det_z=fmaster_entry_instrument['transformations']['det_z']
fvds_det_z=fvds_entry_instrument['transformations']['det_z']
print('fvds_keys: ',fvds_keys)
print('fmaster_keys: ',fmaster_keys)
print('jungfrau_keys: ',jungfrau_keys)
print('fvds_entry_keys: ',fvds_entry_keys)
print('fmaster_entry_keys: ',fmaster_entry_keys)
print('jungfrau_entry_keys: ',jungfrau_entry_keys)
print('fvds_entry_instrument_keys: ',fvds_entry_instrument_keys)
print('fmaster_entry_instrument_keys: ',fmaster_entry_instrument_keys)
print('jungfrau_entry_instrument_keys: ',jungfrau_entry_instrument_keys)
print('fvds_entry_instrument_name: ',fvds_entry_instrument_name)
print('fmaster_entry_instrument_name: ',fmaster_entry_instrument_name)
print('jungfrau_entry_instrument_name: ',jungfrau_entry_instrument_name)
print('fvds_entry_instrument_short_name: ',fvds_entry_instrument_short_name)
print('fmaster_entry_instrument_short_name: ',fmaster_entry_instrument_short_name)
print('jungfrau_entry_instrument_short_name: ',jungfrau_entry_instrument_short_name)
print('fmaster_entry_instrument_detector_module_fast_pixel_direction_offset: ',zero_offset)
print('fmaster_entry_instrument_detector_detector_z_det_z: ',fmaster_det_z)
print('fmaster_entry_end_time: ',end_time)
fmaster.attrs.modify('file_time',np.string_(end_time))
fmaster.attrs.modify('file_name',np.string_('Therm_6_2_master_rev.h5'))
fmaster.attrs.modify('HDF5_Version',np.string_('hdf5-1.8.18'))
fvds.attrs.modify('file_time',np.string_(end_time))
fvds.attrs.modify('file_name',np.string_('Therm_6_2_master_rev.h5'))
fvds.attrs.modify('HDF5_Version',np.string_('hdf5-1.10.5'))
jungfrau.attrs.modify('file_time',np.string_(curdat))
jungfrau.attrs.modify('file_name',np.string_('lyso009a_0087.JF07T32V01_master.h5'))
jungfrau.attrs.modify('HDF5_Version',np.string_('hdf5-1.10.5'))
fvds_entry_instrument_name.attrs.modify('short_name',np.string_(fvds_entry_instrument.attrs['short_name']))
fmaster_entry_instrument_name.attrs.modify('short_name',np.string_(fmaster_entry_instrument.attrs['short_name']))
fmaster_entry_instrument['attenuator']['attenuator_transmission'].attrs.modify('units',np.string_(""))
fmaster_entry_instrument['detector']['count_time'].attrs.modify('units',np.string_("s"))
fvds_entry_instrument_name.attrs.modify('short_name',np.string_(fvds_entry_instrument.attrs['short_name']))
fvds_entry_instrument['attenuator']['attenuator_transmission'].attrs.modify('units',np.string_(""))
fvds_entry_instrument['detector']['count_time'].attrs.modify('units',np.string_("s"))
fmaster_det_z.attrs.modify('offset',zero_offset)
fvds_det_z.attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['phi'].attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['chi'].attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['sam_x'].attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['sam_y'].attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['sam_z'].attrs.modify('offset',zero_offset)
fmaster_entry['sample']['transformations']['omega'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['phi'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['chi'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['sam_x'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['sam_y'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['sam_z'].attrs.modify('offset',zero_offset)
fvds_entry['sample']['transformations']['omega'].attrs.modify('offset',zero_offset)
print(fmaster['entry']['instrument']['name'].attrs['short_name'])
print(fmaster['entry']['instrument']['name'].attrs['short_name'].shape)
print(fmaster['entry']['instrument']['name'].attrs['short_name'].dtype)
print("/entry/instrument/ELE_D0/pixel_mask_applied :",jungfrau_entry_instrument['ELE_D0']['pixel_mask_applied'])
del jungfrau_entry_instrument['ELE_D0']['pixel_mask_applied']
jungfrau_entry_instrument['ELE_D0'].create_dataset("pixel_mask_applied",dtype='int8', data=1)
print("/entry/instrument/ELE_D0/pixel_mask_applied :",jungfrau_entry_instrument['ELE_D0']['pixel_mask_applied'])
jungfrau_entry_source=jungfrau_entry.create_group('source')
jungfrau_entry_source=jungfrau_entry['source']
jungfrau_entry_source.attrs.modify('NX_class',np.string_("NXsource"))
jungfrau_entry_source.create_dataset("name",data=np.string_("Paul Scherrer Institute SwissFEL"))
jungfrau_entry_source['name'].attrs.modify('short_name',np.string_("SwissFEL"))
jungfrau_entry_instrument['beam'].create_dataset('total_flux',dtype='float64',data=1000000000000.)
jungfrau_entry_instrument['beam']['total_flux'].attrs.modify('units',np.string_('/pulse'))
del jungfrau_entry['sample']['beam']
del fvds_entry_instrument.attrs['short_name']
del fmaster_entry_instrument.attrs['short_name']
del fmaster_entry_instrument['source']
fvds.close()
fmaster.close()
jungfrau.close()
quit()
EOL
$HOME/bin/nxvalidate -a NXmx -l /home/yaya/hdrmx_rev_29Sep19/hdrmx/definitions Therm_6_2_master_rev.h5
$HOME/bin/nxvalidate -a NXmx -l /home/yaya/hdrmx_rev_29Sep19/hdrmx/definitions Therm_6_2_rev.nxs
$HOME/bin/nxvalidate -a NXmx -l /home/yaya/hdrmx_rev_29Sep19/hdrmx/definitions jungfrau/lyso009a_0087.JF07T32V01_master_rev.h5
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- UED_RAW_sorted: During the experiment, we scanned along the delay stage to vary the time delay between the pump and the probe. Each h5 file contains two diffraction patterns (pump on and pump off) of one second of exposure at a given delay. We merged the h5 files with the same delay with the home-made software 'data_explorer' in https://github.com/remiclaude/UED_interface and extracted a pickle file containing the diffraction patterns along the delay for imgON (with the pump) and imgOFF (without the pump), together with the metadata.
- UED_PROCESSED: The Jupyter notebook 'treat_pickle.ipynb' in https://github.com/remiclaude/UED_processing uses the pickle files in the folder 'RAW_sorted' and processes them: it removes hot pixels, shifts images to keep the unscattered beam at the same position, and averages the diffraction map along the symmetry axis.
The Python files used to process the data and create the figures shown in the publication are listed in the section "Related work".
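A merged pickle file can then be loaded roughly as follows (a hedged sketch; the file name is hypothetical, and the imgON/imgOFF keys follow the description above rather than a verified schema):

import pickle

with open('delay_scan.pickle', 'rb') as f:  # hypothetical file name
    data = pickle.load(f)
print(type(data))  # expected to hold imgON/imgOFF diffraction patterns per delay plus metadata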