The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.
Accurate 3D point clouds can nowadays be acquired easily and cheaply from different sources:
However, there is a lack of large 3D datasets (you can find a good one here, based on triangular meshes); it is especially hard to find datasets based on point clouds (which are the raw output of every 3D sensing device).
This dataset contains 3D point clouds generated from the original images of the MNIST dataset to give a familiar introduction to 3D to people used to working with 2D datasets (images).
In the 3D_from_2D notebook you can find the code used to generate the dataset.
You can use the code in the notebook to generate a bigger 3D dataset from the original.
The entire dataset is stored as 4096-D vectors obtained from the voxelization (x:16, y:16, z:16) of all the 3D point clouds.
In addition to the original point clouds, it contains randomly rotated copies with noise.
The full dataset is split into the following arrays:
Example Python code reading the full dataset:
import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]
5000 (train) and 1000 (test) 3D point clouds are stored in HDF5 file format. The point clouds have zero mean and a maximum dimension range of 1.
Each file is divided into HDF5 groups. Each group is named after its corresponding array index in the original MNIST dataset and contains:

x, y, z - coordinates of each 3D point in the point cloud.
nx, ny, nz - components of the unit normal associated with each point.

Example Python code reading 2 digits and storing some of the group content in tuples:
import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    a = hf["0"]
    b = hf["1"]
    digit_a = (a["img"][:], a["points"][:], a.attrs["label"])
    digit_b = (b["img"][:], b["points"][:], b.attrs["label"])
A simple Python class that generates a grid of voxels from the 3D point cloud. Check the kernel for usage.
A module with functions to plot point clouds and voxel grids inside a Jupyter notebook. You have to run this locally because the Kaggle notebook does not support rendering IFrames. See the GitHub issue here.
Functions included:
array_to_color - converts a 1D array to RGB values, to be used as the color kwarg in plot_points().
plot_points(xyz, colors=None, size=0.1, axis=False)
plot_voxelgrid(v_grid, cmap="Oranges", axis=False)
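A minimal usage sketch of these helpers follows, assuming the plotting module is importable locally (the module name plot3D is an assumption) and that each digit group's points dataset stores x, y, z, nx, ny, nz columns as described above:

import h5py
from plot3D import array_to_color, plot_points  # module name is an assumption

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    points = hf["0"]["points"][:]        # assumed column order: x, y, z, nx, ny, nz
xyz = points[:, :3]
colors = array_to_color(xyz[:, 2])       # color each point by its z coordinate
plot_points(xyz, colors=colors, size=0.1, axis=False)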
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset comprises 43 instances of temporal bone volume CT scans. The scans were performed on human cadaveric specimens with a resulting isotropic voxel size of 99 × 99 × 99 µm³. Voxel-wise image labels of the fluid space of the bony labyrinth, subdivided into the three semantic classes cochlear volume, vestibular volume and semicircular canal volume, are provided. In addition, each dataset contains JSON-like descriptor data defining the voxel coordinates of the anatomical landmarks: (1) apex of the cochlea, (2) oval window and (3) round window. The dataset can be used to train and evaluate machine learning models for automated inner ear analysis in the context of the supervised learning paradigm.
Usage Notes
The datasets are formatted in the HDF5 format developed by The HDF Group. We used and therefore recommend the Python bindings h5py to handle the datasets.
The flat-panel volume CT raw data, labels and landmarks are saved in the HDF5-internal file structure using the respective group and datasets:
raw/raw-0
label/label-0
landmark/landmark-0
landmark/landmark-1
landmark/landmark-2
Array raw and label data can be read from the file by indexing into an opened h5py file handle, for example as numpy.ndarray. Further metadata is contained in the attribute dictionaries of the raw and label datasets.
Landmark coordinate data is available as an attribute dict and contains the coordinate system (LPS or RAS), IJK voxel coordinates and label information. The helicotrema or cochlea top is saved in landmark 0, the oval window in landmark 1 and the round window in landmark 2. Read as a Python dictionary, exemplary landmark information for a dataset may read as follows:
{'coordsys': 'LPS', 'id': 1, 'ijk_position': array([181, 188, 100]), 'label': 'CochleaTop', 'orientation': array([-1., -0., -0., -0., -1., -0., 0., 0., 1.]), 'xyz_position': array([ 44.21109689, -139.38058589, -183.48249736])}
{'coordsys': 'LPS', 'id': 2, 'ijk_position': array([222, 182, 145]), 'label': 'OvalWindow', 'orientation': array([-1., -0., -0., -0., -1., -0., 0., 0., 1.]), 'xyz_position': array([ 48.27890112, -139.95991131, -179.04103763])}
{'coordsys': 'LPS', 'id': 3, 'ijk_position': array([223, 209, 147]), 'label': 'RoundWindow', 'orientation': array([-1., -0., -0., -0., -1., -0., 0., 0., 1.]), 'xyz_position': array([ 48.33120126, -137.27135678, -178.8665465 ])}
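A minimal h5py reading sketch based on the group names listed above (the file name is an assumption):

import h5py
import numpy as np

with h5py.File("temporal_bone_scan.h5", "r") as f:        # file name is an assumption
    raw = np.asarray(f["raw/raw-0"])                       # CT volume as numpy.ndarray
    label = np.asarray(f["label/label-0"])                 # voxel-wise semantic labels
    raw_meta = dict(f["raw/raw-0"].attrs)                  # further metadata of the raw dataset
    # Landmark 0: cochlea top, landmark 1: oval window, landmark 2: round window
    landmarks = [dict(f[f"landmark/landmark-{i}"].attrs) for i in range(3)]

print(raw.shape, label.shape, landmarks[0]["label"])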
To better understand the heat production, electricity generation performance, and economic viability of closed-loop geothermal systems in hot dry rock, the Closed-Loop Geothermal Working Group (a consortium of several national labs and academic institutions) has tabulated time-dependent numerical solutions and levelized cost results for two popular closed-loop heat exchanger designs (u-tube and co-axial). The heat exchanger designs were evaluated for two working fluids (water and supercritical CO2) while varying seven continuous independent parameters of interest (mass flow rate, vertical depth, horizontal extent, borehole diameter, formation gradient, formation conductivity, and injection temperature). The corresponding numerical solutions (approximately 1.2 million per heat exchanger design) are stored as multi-dimensional HDF5 datasets and can be queried at off-grid points using multi-dimensional linear interpolation. A Python script was developed to query this database, estimate time-dependent electricity generation using an organic Rankine cycle (for water) or a direct turbine expansion cycle (for CO2), and perform a cost assessment.

This document aims to give an overview of the HDF5 database file and highlights how to read, visualize, and query quantities of interest (e.g., levelized cost of electricity, levelized cost of heat) using the accompanying Python scripts. Details regarding the capital, operation, and maintenance costs and the levelized cost calculation using the techno-economic analysis script are provided. This data submission contains results from the Closed-Loop Geothermal Working Group study that are within the public domain, including publications, simulation results, databases, and computer codes.

GeoCLUSTER is a Python-based web application created using Dash, an open-source framework built on top of Flask that streamlines the building of data dashboards. GeoCLUSTER provides users with a collection of interactive methods for streamlining the exploration and visualization of an HDF5 dataset. The GeoCLUSTER app and database are contained in the compressed file geocluster_vx.zip, where the "x" refers to the version number. For example, geocluster_v1.zip is Version 1 of the app. This zip file also contains installation instructions. To use the GeoCLUSTER app in the cloud, click the link to "GeoCLUSTER on AWS" in the Resources section below. To use the GeoCLUSTER app locally, download geocluster_vx.zip to your computer and uncompress this file. When uncompressed, it comprises two directories and the geocluster_installation.pdf file. The geo-data directory contains the HDF5 database in condensed format, and the GeoCLUSTER directory contains the GeoCLUSTER app in the subdirectory dash_app, as app.py. The geocluster_installation.pdf file provides instructions on installing Python and the needed Python modules, and then executing the app.
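The following is an illustrative sketch, not the actual GeoCLUSTER schema (dataset and axis names are assumptions), showing the general pattern of reading a gridded HDF5 solution with h5py and querying it at off-grid points with multi-dimensional linear interpolation:

import h5py
from scipy.interpolate import RegularGridInterpolator

with h5py.File("closed_loop_database.h5", "r") as f:    # file name is an assumption
    mdot = f["grid/mass_flow_rate"][:]                   # axis grids (names assumed)
    depth = f["grid/vertical_depth"][:]
    t_out = f["utube/outlet_temperature"][:]             # gridded solution, shape (len(mdot), len(depth))

interp = RegularGridInterpolator((mdot, depth), t_out)
print(interp([[42.0, 3500.0]]))                          # query an off-grid (mass flow rate, depth) point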
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of Room Impulse Responses measured at the Acoustic Technology group facilities, DTU Electro. The measurements were carried out in building 355, room 008, otherwise known as the "sound field control" room.
The dataset consists of measurements of the acoustic impulse response on a regular square grid of 80 x 80 cm². The measurements consist of 30 x 30 room impulse responses (i.e., 900 RIRs), where the microphone, a 1/2-inch free-field condenser measurement microphone (Brüel & Kjær, Nærum, Denmark), was automatically positioned using a UR5 (Universal Robots, Odense, Denmark) robotic scanning arm. A BM6 loudspeaker (Dynaudio, Skanderborg, Denmark) placed in a room corner was used to excite the room with 5 s wideband logarithmic sweeps.
The data are available as H5 (Hierarchical Data Format) files with the following structure:
Dataset "Sound field control room - HDF5 datastructure"
Dataset.attributes
├── calibration (1) '94.33 dB @ 1.14 kHz'
├── fs (1) 48000 Hz
├── grid_bottom (3x900) [[2.76, 2.32, 2.73, ... 2.738 2.76..., ...]] m
├── loudspeaker_position (3) [5.779 0.263 0.904] m
├── room_dimensions (3) [6.124 5.771 3.073] m
├── sweep_amplitude (1) 0.4
├── sweep_duration (1) 5 s
├── sweep_range (2) [32. 12000.] Hz
├── temperature (1) 20.3 °C
└── true_calibration (1) '94.19 dB @ 1.17 kHz'
Dataset.groups
├── Noise_recs_bottom (12x3x144000) float64
├── RIRs_bottom (900x25439) float64 (i.e., 900 positions)
└── sweep_signal (288000) float64
Use with Matlab: https://mathworks.com/help/matlab/hdf5-files.html Use with Python: https://docs.h5py.org/en/stable/
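A short Python reading sketch following the structure above (the file name is an assumption, and whether RIRs_bottom is exposed as a group or a dataset may differ):

import h5py

with h5py.File("sound_field_control_room.h5", "r") as f:   # file name is an assumption
    fs = f.attrs["fs"]                                      # sampling rate, 48000 Hz
    grid = f.attrs["grid_bottom"]                           # 3 x 900 microphone positions, in m
    rirs = f["RIRs_bottom"][:]                              # 900 x 25439 room impulse responses
    sweep = f["sweep_signal"][:]                            # excitation sweep signal

print(fs, rirs.shape, sweep.shape)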
This dataset is part of the Danish Sound Cluster project titled "Physics-informed Neural Networks for Sound Field Reconstruction" [url]. Find more datasets here: Project Page
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains fluoroscopy images extracted from four videos of cannulation experiments with an aorta phantom and six videos of in-vivo catheterisation procedures: four Transcatheter Aortic Valve Implantations (TAVI) and two diagnostic catheterisation procedures. Please refer to the README.docx.

The Phantom.hdf5 file contains the 2000 images (Dataset-2 in the paper) extracted from the four fluoroscopy videos from catheterisation experiments carried out on a silicon aorta phantom in an angiography suite. The T1T2.hdf5 and T3-T6.hdf5 files contain images extracted from the six fluoroscopy videos acquired during in-vivo endovascular operations (Dataset-3 in the paper). Specifically, 836 frames were extracted from TAVI (data groups T1, T2, T3 and T4) and 371 from diagnostic catheterisation (data groups T5 and T6). Each data group contains the following number of images: T1 – 286, T2 – 150, T3 – 200, T4 – 200, T5 – 143, T6 – 228.

Binary segmentation masks of the interventional catheter are provided as ground truth. A semi-automated tracking method with manual initialisation (http://ieeexplore.ieee.org/document/7381624/) was employed to obtain the catheter annotations as the 2D coordinates of the catheter restricted to a manually selected region of interest (ROI). The method employs a b-spline tube model as a prior for the catheter shape to restrict the search space and deal with potential missing measurements. This is combined with a probabilistic framework that estimates the pixel-wise posteriors between the foreground (catheter) and background delimited by the b-spline tube contour. The output of the algorithm was manually checked and corrected to provide the final catheter segmentation.

The annotations are provided in the files "Phantom_label.hdf5", "T1T2_label.hdf5" and "T3-T6_label.hdf5". All annotations consist of full-scale (256x256 px) binary masks where background pixels have a value of "0", while a value of "1" denotes the catheter pixels. Example Python code (MAIN.py) is provided to access the data and the labels and visualize them.

Citing the dataset: The dataset should be cited using its DOI whenever research making use of this dataset is reported in any academic publication or research report. Please also cite the following publication: Marta Gherardini, Evangelos Mazomenos, Arianna Menciassi, Danail Stoyanov, "Catheter segmentation in X-ray fluoroscopy using synthetic data and transfer learning with light U-nets", Computer Methods and Programs in Biomedicine, Volume 192, Aug 2020, 105420, doi:10.1016/j.cmpb.2020.105420.

To find out more about our research team, visit the Surgical Robot Vision and Wellcome/EPSRC Centre for Interventional and Surgical Science websites.
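A minimal exploration sketch with h5py (the internal dataset names are not documented here, so this only lists the keys; the provided MAIN.py is the authoritative access example):

import h5py

with h5py.File("Phantom.hdf5", "r") as images, h5py.File("Phantom_label.hdf5", "r") as labels:
    print(list(images.keys()))   # inspect the available image groups/datasets
    print(list(labels.keys()))   # inspect the corresponding binary masks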
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EyeFi Dataset
This dataset was collected as part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground-truth location information captured through a camera. This dataset was used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop describing the details of data collection. Please check it out for more information on the dataset.
Clarification/Bug report: Please note that the order of antennas and subcarriers in the .h5 files is not clearly documented in the README.md file. The order of antennas and subcarriers for the 90 csi_real and csi_imag values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, ... subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. Please see the description below. The newer version of the dataset contains this information in README.md. We are sorry for the inconvenience.
Data Collection Setup
In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x, y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 panoramic camera mounted on the ceiling, and Angles of Arrival (AoAs) are derived from the (x, y) coordinates. Both the WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85 m above the ground and the WiFi antennas are around 1.12 m above the ground.
The data collection environment consists of two areas: the first is a rectangular space measuring 11.8 m x 8.74 m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74 m and 14.24 m between two walls. The kitchen also has numerous obstacles and different materials that pose different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.
To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone were used in both the lab and kitchen areas.
List of Files: Here is a list of files included in the dataset:
|- 1_person
   |- 1_person_1.h5
   |- 1_person_2.h5
|- 2_people
   |- 2_people_1.h5
   |- 2_people_2.h5
   |- 2_people_3.h5
|- 3_people
   |- 3_people_1.h5
   |- 3_people_2.h5
   |- 3_people_3.h5
|- 5_people
   |- 5_people_1.h5
   |- 5_people_2.h5
   |- 5_people_3.h5
   |- 5_people_4.h5
|- 10_people
   |- 10_people_1.h5
   |- 10_people_2.h5
   |- 10_people_3.h5
|- Kitchen
   |- 1_person
      |- kitchen_1_person_1.h5
      |- kitchen_1_person_2.h5
      |- kitchen_1_person_3.h5
   |- 3_people
      |- kitchen_3_people_1.h5
|- training
   |- shuffuled_train.h5
   |- shuffuled_valid.h5
   |- shuffuled_test.h5
|- View-Dataset-Example.ipynb
|- README.md
In this dataset, the folders 1_person/, 2_people/, 3_people/, 5_people/, and 10_people/ contain data collected from the lab area, whereas the Kitchen/ folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.

The training folder contains the training dataset we used to train the neural network discussed in our paper. It was generated by shuffling all the data from the 1_person/ folder collected in the lab area (1_person_1.h5 and 1_person_2.h5).
Why multiple files in one folder?
Each folder contains multiple files. For example, the 1_person folder has two files: 1_person_1.h5 and 1_person_2.h5. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person who is holding the phone can be different. Also, the data could be collected on different days, and/or the data collection system may have needed to be rebooted due to stability issues. As a result, we provide different files (like 1_person_1.h5, 1_person_2.h5) to distinguish the different person holding the phone and possible system reboots that introduce different phase offsets (see below) into the system.
Special note:
1_person_1.h5 was generated with the same person holding the phone throughout, while 1_person_2.h5 contains different people holding the phone, but with only one person present in the area at a time. Both files were also collected on different days.
Access the data: To access the data, an HDF5 library is needed to open the dataset. There are free HDF5 viewers available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide example Python code, View-Dataset-Example.ipynb, to demonstrate how to access the data.
Each file is structured as follows (except the files under the training/ folder):
|- csi_imag
|- csi_real
|- nPaths_1
   |- offset_00
      |- spotfi_aoa
   |- offset_11
      |- spotfi_aoa
   |- offset_12
      |- spotfi_aoa
   |- offset_21
      |- spotfi_aoa
   |- offset_22
      |- spotfi_aoa
|- nPaths_2
   |- (same offset_00/11/12/21/22 subgroups, each with spotfi_aoa)
|- nPaths_3
   |- (same offset_00/11/12/21/22 subgroups, each with spotfi_aoa)
|- nPaths_4
   |- (same offset_00/11/12/21/22 subgroups, each with spotfi_aoa)
|- num_obj
|- obj_0
   |- cam_aoa
   |- coordinates
|- obj_1
   |- cam_aoa
   |- coordinates
...
|- timestamp
The csi_real and csi_imag are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers for the 90 csi_real and csi_imag values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, ... subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The nPaths_x groups are SpotFi [2] calculated WiFi Angle of Arrival (AoA) values, with x being the number of multipath components specified during the calculation. Under each nPaths_x group are offset_xx subgroups, where xx stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:
Antennas | Offset 1 (rad) | Offset 2 (rad) |
---|---|---|
1 & 2 | 1.1899 | -2.0071 |
1 & 3 | 1.3883 | -1.8129 |
The measurement is based on the work in [3], where the authors state that there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combinations of the offsets are used for the offset_xx naming. For example, offset_12 means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.
The num_obj field stores the number of human subjects present in the scene. obj_0 is always the subject who is holding the phone. In each file, there are num_obj obj_x groups. For each obj_x, we have the coordinates reported by the camera and cam_aoa, which is the AoA estimated from the camera-reported coordinates. The (x, y) coordinates and AoA listed here are chronologically ordered (except for the files in the training folder). They reflect the way the person carrying the phone moved in the space (for obj_0) and how everyone else walked (for the other obj_y, where y > 0). The timestamp is provided as a time reference for each WiFi packet.
To access the data (Python):
import h5py

data = h5py.File('3_people_3.h5', 'r')
csi_real = data['csi_real'][()]
csi_imag = data['csi_imag'][()]
cam_aoa = data['obj_0/cam_aoa'][()]
cam_loc = data['obj_0/coordinates'][()]
Files inside the training/ folder have a different data structure:
|- nPath-1
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-2
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-3
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
|- nPath-4
   |- aoa
   |- csi_imag
   |- csi_real
   |- spotfi
The group nPath-x denotes the number of multipath components specified during the SpotFi calculation. aoa is the camera-generated angle of arrival (AoA) (which can be considered ground truth), csi_imag and csi_real are the imaginary and real components of the CSI values, and spotfi is the SpotFi-calculated AoA values. The SpotFi values were chosen based on the lowest median and mean error across 1_person_1.h5 and 1_person_2.h5. All the rows under the same nPath-x group are aligned (i.e., the first row of aoa corresponds to the first row of csi_imag, csi_real, and spotfi). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows are randomly shuffled from the 1_person_1.h5 and 1_person_2.h5 files.
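A short reading sketch for the training files, following the structure above:

import h5py

with h5py.File("shuffuled_train.h5", "r") as f:
    aoa = f["nPath-1/aoa"][()]            # camera-derived AoA (ground truth)
    csi_real = f["nPath-1/csi_real"][()]  # real part of the CSI values
    csi_imag = f["nPath-1/csi_imag"][()]  # imaginary part of the CSI values
    spotfi = f["nPath-1/spotfi"][()]      # SpotFi-calculated AoA
# Rows are aligned across the four datasets but randomly shuffled (no timestamps).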
Citation If you use the dataset, please cite our paper:
@inproceedings{eyefi2020,
  title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
  author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
  booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
  year={2020}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Information
This dataset presents long-term indoor solar harvesting traces jointly monitored with the ambient conditions. The data was recorded at 6 indoor positions with diverse characteristics at our institute at ETH Zurich in Zurich, Switzerland.
The data is collected with a measurement platform [3] consisting of a solar panel (AM-5412) connected to a bq25505 energy harvesting chip that stores the harvested energy in a virtual battery circuit. Two TSL45315 light sensors placed on opposite sides of the solar panel monitor the illuminance level and a BME280 sensor logs ambient conditions like temperature, humidity and air pressure.
The dataset contains measurements of the energy flow at the input and the output of the bq25505 harvesting circuit, as well as the illuminance, temperature, humidity and air pressure measurements of the ambient sensors. The following timestamped data columns are available in the raw measurement format, as well as in the preprocessed and filtered HDF5 datasets:
V_in - Converter input / solar panel output voltage, in volt
I_in - Converter input / solar panel output current, in ampere
V_bat - Battery voltage (emulated through circuit), in volt
I_bat - Net battery current, in-/out-flowing current, in ampere
Ev_left - Illuminance left of solar panel, in lux
Ev_right - Illuminance right of solar panel, in lux
P_amb - Ambient air pressure, in pascal
RH_amb - Ambient relative humidity, unit-less between 0 and 1
T_amb - Ambient temperature, in degree Celsius

The following publication presents an overview of the dataset and more details on the deployment used for data collection. A copy of the abstract is included in this dataset; see the file abstract.pdf.
L. Sigrist, A. Gomez, and L. Thiele. Dataset: Tracing Indoor Solar Harvesting. In Proceedings of the 2nd Workshop on Data Acquisition To Analysis (DATA '19), 2019. [under submission]
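A minimal sketch for inspecting one of the processed HDF5 files described above (the file name is an assumption; the processing_python.ipynb notebook listed below is the authoritative example for working with the processed dataset):

import h5py

with h5py.File("processed/pos01.h5", "r") as f:   # file name is an assumption
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))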
Folder Structure and Files
processed/ - This folder holds the imported, merged and filtered datasets of the power and sensor measurements. The datasets are stored in HDF5 format and split by measurement position posXX and by power and ambient sensor measurements. The files belonging to this folder are contained in archives named yyyy_mm_processed.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
raw/ - This folder holds the raw measurement files recorded with the RocketLogger [1, 2] and using the measurement platform available at [3]. The files belonging to this folder are contained in archives named yyyy_mm_raw.tar, where yyyy and mm represent the year and month the data was published. A separate file lists the exact content of each archive (see below).
LICENSE - License information for the dataset.
README.md - The README file containing this information.
abstract.pdf - A copy of the above-mentioned abstract submitted to the DATA '19 Workshop, introducing this dataset and the deployment used to collect it.
raw_import.ipynb [open in nbviewer] - Jupyter Python notebook to import, merge, and filter the raw dataset from the raw/ folder. This is the exact code used to generate the processed dataset and store it in the HDF5 format in the processed/ folder.
raw_preview.ipynb [open in nbviewer] - Jupyter Python notebook that imports the raw dataset directly and plots a preview of the full power trace for all measurement positions.
processing_python.ipynb [open in nbviewer] - Jupyter Python notebook demonstrating the import and use of the processed dataset in Python. Calculates column-wise statistics and includes more detailed power plots and the simple energy predictor performance comparison included in the abstract.
processing_r.ipynb [open in nbviewer] - Jupyter R notebook demonstrating the import and use of the processed dataset in R. Calculates column-wise statistics and extracts and plots the energy harvesting conversion efficiency included in the abstract. Furthermore, the harvested power is analyzed as a function of the ambient light level.

Dataset File Lists
Processed Dataset Files
The list of the processed datasets included in the yyyy_mm_processed.tar archive is provided in yyyy_mm_processed.files.md. The markdown-formatted table lists the names of all files, their sizes in bytes, as well as their SHA-256 sums.
Raw Dataset Files
A list of the raw measurement files included in the yyyy_mm_raw.tar archive(s) is provided in yyyy_mm_raw.files.md. The markdown-formatted table lists the names of all files, their sizes in bytes, as well as their SHA-256 sums.
Dataset Revisions
v1.0 (2019-08-03)
Initial release.
Includes the data collected from 2017-07-27 to 2019-08-01. The dataset archive files related to this revision are 2019_08_raw.tar and 2019_08_processed.tar.
For position pos06, the measurements from 2018-01-06 00:00:00 to 2018-01-10 00:00:00 are filtered (data inconsistency in file indoor1_p27.rld).
Dataset Authors, Copyright and License
References
[1] L. Sigrist, A. Gomez, R. Lim, S. Lippuner, M. Leubin, and L. Thiele. Measurement and validation of energy harvesting IoT devices. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[2] ETH Zurich, Computer Engineering Group. RocketLogger Project Website, https://rocketlogger.ethz.ch/.
[3] L. Sigrist. Solar Harvesting and Ambient Tracing Platform, 2019. https://gitlab.ethz.ch/tec/public/employees/sigristl/harvesting_tracing
LSD4WSD V2.0 Learning SAR Dataset for Wet Snow Detection - Full Analysis Version. The aim of this dataset is to provide a basis for automatic learning to detect wet snow. It is based on Sentinel-1 SAR GRD satellite images acquired between August 2020 and August 2021 over the French Alps. The new version of this dataset is no longer restricted to a classification task, and provides a set of metadata for each sample.

Modifications and improvements in version 2.0.0:

Massifs: added 7 new massifs to cover all the Sentinel-1 images (cf. info.pdf).
Acquisition: added images of the descending pass in addition to those originally used in the ascending pass.
Samples: reduced the size of the samples to 15 by 15 to facilitate evaluation at the central pixel.
Samples: increased the density of extracted windows, with a distance of approximately 500 meters between the centers of the windows.
Samples: removed the pre-processing involving the use of logarithms.
Samples: removed the pre-processing involving normalisation.
Labels: new structure for the labels part: a dictionary with the keys topography, metadata and physics.
Labels: physics: added direct information from the CROCUS model for 3 simulations: Liquid Water Content, snow height and minimum snowpack temperature.
Labels: topography: information on the slope, altitude and average orientation of the sample.
Labels: metadata: information on the date of the sample, the mountain massif and the run (ascending or descending).
Dataset: removed the train/test split. We leave it up to the user to use the Group K-fold method to validate the models using the alpine massif information.

Finally, the dataset consists of 2,467,516 samples of size 15 by 15 by 9. For each sample, the 9 metadata are provided, using in particular the Crocus physical model:

topography: elevation (meters, average), orientation (degrees, average), slope (degrees, average)
metadata: name of the alpine massif, date of acquisition, type of acquisition (ascending/descending)
physics: Liquid Water Content (kg/m2), snow height (m), minimum snowpack temperature (degrees Celsius)

The 9 channels are in the following order:

Sentinel-1 polarimetric channels: VV, VH and the combination C: VV/VH in linear scale
Topographical features: altitude, orientation, slope
Polarimetric ratios with a reference summer image: VV/VVref, VH/VHref, C/Cref**

** The reference image selected is that of August 9th, 2020, as a reference image without snow (cf. Nagler et al.).

An overview of the distribution and a summary of the sample statistics can be found in the file info.pdf. The data is stored in .hdf5 format with gzip compression. We provide a Python script, dataset_load.py, to read and query the data. It is based on the h5py, numpy and pandas libraries. It allows selecting part of, or the whole, dataset using requests on the metadata. The script is documented and can be used as described in the README.md file. The processing chain is available at the following Github address.

The authors would like to acknowledge the support from the National Centre for Space Studies (CNES) in providing computing facilities and access to SAR images via the PEPS platform. The authors would like to deeply thank Mathieu Fructus for running the Crocus simulations.

Erratum: In the dataloader file, the name of the "aquisition" column must be added twice; see the correction below:

dtst_ld = Dataset_loader(path_dataset, shuffle=False, descrp=["date", "massif", "aquisition", "aquisition", "elevation", "slope", "orientation", "tmin", "hsnow", "tel"],)

If you have any comments, questions or suggestions, please contact the authors: matthieu.gallet@univ-smb.fr fatima.karbou@meteo.fr abdourrahmane.atto@univ-smb.fr emmanuel.trouve@univ-smb.fr
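A quick inspection sketch with h5py (the file name is an assumption; for metadata-based selection, use the provided dataset_load.py script):

import h5py

with h5py.File("lsd4wsd_v2.hdf5", "r") as f:   # file name is an assumption
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))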
The data provided for this challenge was measured using the Nanoscale-Ordered Materials Diffractometer (NOMAD) at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory. The data is stored in an HDF5 file following the NeXus standard and can be read with tools built for either. While the NeXus format is self-describing, there is benefit in explaining some details. The data is stored in 4 NXentries in the file. The NXentries that begin with "amorphous_SiO2" are for the amorphous data, and the NXentries that begin with "crystalbolite_SiO2" are for the crystalline material. Solutions that were produced by the scientist are in the entries that end with "_byhand". Each of the NXdata groups is the plottable data, with the "signal", "axes", and (in the case of by-hand components) "auxiliary_signals" attributes describing which fields should be used. The by-hand component ranges are listed in a "component" attribute of the various signals. The filtered Sr data is the Fourier transform of the combined components. The data can be quickly viewed using tools such as NeXpy or HDFView. Most languages have libraries that can work with HDF5 (e.g., h5py for Python); a partial list is provided at https://manual.nexusformat.org/utilities.html

https://neutrons.ornl.gov/nomad https://www.hdfgroup.org/solutions/hdf5 https://www.nexusformat.org/
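A sketch for walking the NXentries with h5py, following the NeXus conventions described above (the file name is an assumption, and attribute encodings may vary between files):

import h5py

def decode(value):
    # NeXus attributes may be stored as bytes or str depending on the writer.
    return value.decode() if isinstance(value, bytes) else value

with h5py.File("nomad_sio2_challenge.h5", "r") as f:   # file name is an assumption
    for entry_name, entry in f.items():                # e.g. amorphous_SiO2..., ..._byhand
        for name, group in entry.items():
            if isinstance(group, h5py.Group) and decode(group.attrs.get("NX_class", "")) == "NXdata":
                signal = decode(group.attrs["signal"])  # field holding the plottable data
                axes = group.attrs.get("axes")
                print(entry_name, name, signal, axes)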