43 datasets found
  1. Apple CT Data: Ground truth reconstructions - 3 of 6

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 4, 2021
    Cite
    Coban, Sophia Bethany (2021). Apple CT Data: Ground truth reconstructions - 3 of 6 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4576077
    Dataset updated
    Mar 4, 2021
    Dataset provided by
    Andriiashen, Vladyslav
    Ganguly, Poulami Somanya
    Coban, Sophia Bethany
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary This submission is supplementary material to the article [Coban 2020b]. As part of the manuscript, we release three simulated parallel-beam tomographic datasets of 94 apples with internal defects, the ground truth reconstructions and two defect label files.

    Description This Zenodo upload contains the ground truth reconstructed slices for each apple. In total, there are 72192 reconstructed slices, which have been divided into 6 separate submissions:

    ground_truths_1.zip (1 of 6): 10.5281/zenodo.4550729

    ground_truths_2.zip (2 of 6): 10.5281/zenodo.4575904

    ground_truths_3.zip (3 of 6): 10.5281/zenodo.4576078 (this upload)

    ground_truths_4.zip (4 of 6): 10.5281/zenodo.4576122

    ground_truths_5.zip (5 of 6): 10.5281/zenodo.4576202

    ground_truths_6.zip (6 of 6): 10.5281/zenodo.4576260

    The simulated parallel-beam datasets and defect label files are also available through this project, via a separate Zenodo upload: 10.5281/zenodo.4212301.

    Apparatus The dataset was acquired using the custom-built and highly flexible CT scanner FleX-ray Laboratory, developed by TESCAN-XRE and located at CWI in Amsterdam. This apparatus consists of a cone-beam microfocus X-ray point source that projects polychromatic X-rays onto a 1944-by-1536 pixel, 14-bit flat detector panel. Full details can be found in [Coban 2020a].

    Ground Truth Generation

    We reconstructed the raw tomographic data, which was captured at a sample resolution of 54.2µm over 360 degrees of continuous circular motion in a cone-beam setup. A total of 1200 projections were collected, distributed evenly over the full circle. The raw tomographic data is available upon request.

    The ground truth reconstructed slices were generated based on Conjugate Gradient Least Squares (CGLS) reconstruction of each apple. The voxel grid in the reconstruction was 972px x 972px x 768px. The resolution in the ground truth reconstructions remained unchanged.

    All ground truth reconstructed slices are in .tif format. Each file is named "appleNo_sliceNo.tif".
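
    For quick inspection, a single slice can be read with a few lines of Python (a minimal sketch; the file name below is hypothetical and the tifffile package is assumed to be available):

    import tifffile

    # Hypothetical slice following the "appleNo_sliceNo.tif" naming scheme.
    slice_img = tifffile.imread("ground_truths_3/31101_00384.tif")
    print(slice_img.shape, slice_img.dtype)  # expected to be 972 x 972, per the voxel grid above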

    List of Contents The contents of the submission are given below.

    ground_truths_3: This folder contains reconstructed slices of 16 apples

    Additional Links These datasets are produced by the Computational Imaging group at Centrum Wiskunde & Informatica (CI-CWI). For any relevant Python/MATLAB scripts for the FleX-ray datasets, we refer the reader to our group's GitHub page.

    Contact Details For more information or guidance in using these datasets, please get in touch with:

    s.b.coban [at] cwi.nl

    vladyslav.andriiashen [at] cwi.nl

    poulami.ganguly [at] cwi.nl

    Acknowledgments We acknowledge GREEFA for supplying the apples and further discussions.

  2. CrossDomainTypes4Py: A Python Dataset for Cross-Domain Evaluation of Type Inference Systems

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jan 28, 2022
    Cite
    Bernd Gruner; Thomas Heinze; Clemens-Alexander Brust (2022). CrossDomainTypes4Py: A Python Dataset for Cross-Domain Evaluation of Type Inference Systems [Dataset]. http://doi.org/10.5281/zenodo.5747024
    Available download formats: bin
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bernd Gruner; Thomas Heinze; Clemens-Alexander Brust
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Python repositories mined from GitHub on January 20, 2021. It allows cross-domain evaluation of type inference systems. For this purpose, it consists of two sub-datasets containing only projects from the web domain or the scientific calculation domain, respectively; we therefore searched for projects with a dependency on either Flask or NumPy. Furthermore, only projects with a dependency on mypy were considered, because this should ensure that at least parts of the projects have type annotations, which can later be used as ground truth. Further details about the dataset will be described in an upcoming paper; as soon as it is published, it will be linked here.
    The dataset consists of two files, one for each sub-dataset. The web domain dataset contains 3129 repositories and the scientific calculation domain dataset contains 4783 repositories. Each file has two columns: the URL of the GitHub repository and the commit hash that was used. Thus, it is possible to download the dataset using shell or Python scripts; for example, the pipeline provided by ManyTypes4Py can be used, or a minimal script like the one sketched below.
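
    As a rough illustration (not part of this submission; the file name and delimiter below are assumptions), a script along these lines clones each repository and checks out the pinned commit:

    import csv
    import subprocess

    # Hypothetical file name; use the actual sub-dataset file from this upload.
    with open("web_domain_repositories.csv", newline="") as fh:
        for url, commit in csv.reader(fh):
            name = url.rstrip("/").split("/")[-1]
            subprocess.run(["git", "clone", url, name], check=True)              # clone the repository
            subprocess.run(["git", "-C", name, "checkout", commit], check=True)  # pin to the listed commit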
    If repositories no longer exist or have become private, you can contact us via the following email address: bernd.gruner@dlr.de. We have a backup of all repositories and will be happy to help you.

  3. oxford_iiit_pet

    • tensorflow.org
    • opendatalab.com
    Cite
    oxford_iiit_pet [Dataset]. https://www.tensorflow.org/datasets/catalog/oxford_iiit_pet
    Description

    The Oxford-IIIT pet dataset is a 37-category pet image dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed and species. Additionally, head bounding boxes are provided for the training split, allowing this dataset to also be used for simple object detection tasks. In the test split, the bounding boxes are empty.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('oxford_iiit_pet', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  4. RD Dataset

    • figshare.com
    Updated Sep 16, 2022
    Cite
    Seung Seog Han (2022). RD Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.15170853.v5
    Available download formats: zip
    Dataset updated
    Sep 16, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Seung Seog Han
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ** RD DATASET ** The RD dataset was created from the images posted to the melanoma community on Reddit (https://reddit.com/r/melanoma). Consecutive images were included using a Python library (https://github.com/aliparlakci/bulk-downloader-for-reddit) from Jan 25, 2020 to July 30, 2021. The ground truth was voted on by four dermatologists and one plastic surgeon while referring to the chief complaint and brief history. A total of 1,282 images (1,201 cases) were finally included. Because some cases were deleted by their users, the links of only 860 cases were still valid as of July 2021.

    1. RD_RAW.xlsx The download links and ground truth of the RD dataset are included in this Excel file. In addition, the raw data of the AI (Model Dermatology Build2021 - https://modelderm.com) and of 32 laypersons are included (a short loading sketch is given after this list).

    2. v1_public.zip "v1_public.zip" includes the 1,282 lesional images (full-size). The 24 images that were excluded from the study are also available.

    3. v1_private.zip is not available here. Wide-field images are not available here. If the archive is needed for research purposes, please email Dr. Han Seung Seog (whria78@gmail.com) or Dr. Cristian Navarrete-Dechent (ctnavarr@gmail.com).
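
    As a usage sketch for item 1 above (the sheet layout and column names should be checked against the actual file), the spreadsheet can be inspected with pandas:

    import pandas as pd

    # Load the annotation spreadsheet and inspect its layout before further processing.
    df = pd.read_excel("RD_RAW.xlsx")
    print(df.columns.tolist())
    print(df.head())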

    References - The Degradation of Performance of a State-of-the-art Skin Image Classifier When Applied to Patient-driven Internet Search - Scientific Reports (in press)

    ** Background normal test with the ISIC images ** The ISIC dataset (https://www.isic-archive.com; Gallery -> 2018 JID Editorial images; 99 images; ISIC_0024262 and ISIC_0024261 are identical images, so ISIC_0024262 was skipped) was used for the background normal test. We defined a 10%-area rectangle crop as a “specialist-size crop” and a 5%-area rectangle crop as a “layperson-size crop”.
    a) S-crops.zip: specialist-size crops. Format: CROPNO_AGE(0~99)_GENDER(1=male,0=female)[m]_FILENAME.png
    b) L-crops.zip: layperson-size crops. Format: CROPNO_AGE(0~99)_GENDER(1=male,0=female)[m]_FILENAME.png
    c) result_S.zip: background normal test results using the specialist-size crops
    d) result_L.zip: background normal test results using the layperson-size crops

    References
    - Automated Dermatological Diagnosis: Hype or Reality? - https://doi.org/10.1016/j.jid.2018.04.040
    - Multiclass Artificial Intelligence in Dermatology: Progress but Still Room for Improvement - https://doi.org/10.1016/j.jid.2020.06.040

  5. wider_face

    • tensorflow.org
    • opendatalab.com
    • +1more
    Updated Dec 6, 2022
    Cite
    (2022). wider_face [Dataset]. https://www.tensorflow.org/datasets/catalog/wider_face
    Dataset updated
    Dec 6, 2022
    Description

    WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('wider_face', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/wider_face-0.1.0.png

  6. Data from: PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards

    • rodare.hzdr.de
    Updated Jan 29, 2024
    Cite
    Arbash, Elias; Fuchs, Margret; Rasti, Behnood; Lorenz, Sandra; Ghamisi, Pedram; Gloaguen, Richard (2024). PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards [Dataset]. http://doi.org/10.14278/rodare.2704
    Available download formats: zip
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Helmholtz Institute Freiberg for Resource Technology
    Authors
    Arbash, Elias; Fuchs, Margret; Rasti, Behnood; Lorenz, Sandra; Ghamisi, Pedram; Gloaguen, Richard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PCB-Vision Dataset

    Description:

    The PCB-Vision dataset is a multiscene RGB-Hyperspectral benchmark dataset comprising 53 Printed Circuit Boards (PCBs). The RGB images are collected using a Teledyne Dalsa C4020 camera on a conveyor belt, while hyperspectral images (HSI) are acquired with a Specim FX10 spectrometer. The HSI data contains 224 bands in the VNIR range [400 - 1000]nm.

    Data Format

    • RGB Images: .png files
    • PCB Masks: .jpg files
    • HSI Data: Each hyperspectral data cube is accompanied by a data file and a .hdr file.

    Folder Organization

    • PCBVision
      • HSI/
        • 53 subfolders (one for each PCB)
        • 'General_masks' folder for 'General' segmentation ground truth
        • 'Monoseg_masks' folder for 'Monoseg' segmentation ground truth
        • 'PCB_Masks' folder for masks of the 53 PCBs in the hyperspectral cube
      • RGB/
        • 53 .jpg images
        • 'General' folder with the 'General' segmentation ground truth for the RGB images
        • 'Monoseg_masks' folder with the 'Monoseg' segmentation ground truth for the RGB images

    Data Classes in Masks

    • Masks (both 'General' and 'Monoseg') contain 1 to 4 segmentation classes:
      • 0: "Others"
      • 1: "IC"
      • 2: "Capacitors"
      • 3: "Connectors"

    Code Repository

    To facilitate reading and working with the data, Python codes are available on the GitHub repository:

    https://github.com/hifexplo/PCBVision
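
    Independently of that repository, a single cube can be read with the Spectral Python (SPy) package, for example (a minimal sketch; the paths below are assumptions based on the folder layout above):

    import numpy as np
    import spectral

    # Hypothetical paths; each HSI subfolder ships a data cube plus its .hdr header.
    cube = spectral.open_image("PCBVision/HSI/1/1.hdr")
    hsi = np.asarray(cube.load())   # expected shape: (rows, cols, 224) VNIR bands
    print(hsi.shape)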

    Citation

    If you use this dataset, please cite the following article:

    Word:

    Arbash, Elias, Fuchs, Margret, Rasti, Behnood, Lorenz, Sandra, Ghamisi, Pedram, & Gloaguen, Richard. (2024). PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards (Version 1) [Data set]. Rodare. http://doi.org/10.14278/rodare.2704

    Latex:

    @article{arbash2024pcb,
     title={PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit Boards},
     author={Arbash, Elias and Fuchs, Margret and Rasti, Behnood and Lorenz, Sandra and Ghamisi, Pedram and Gloaguen, Richard},
     journal={arXiv preprint arXiv:2401.06528},
     year={2024}
    }

    Contact

    For further information or inquiries, please visit our website:

    https://www.iexplo.space/

    Contact Email: e.arbash@hzdr.de

  7. vqa

    • huggingface.co
    Updated Feb 21, 2023
    Cite
    Korea Electronics Technology Institute Artificial Intelligence Research Center (2023). vqa [Dataset]. https://huggingface.co/datasets/KETI-AIR/vqa
    Dataset updated
    Feb 21, 2023
    Dataset authored and provided by
    Korea Electronics Technology Institute Artificial Intelligence Research Center
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    VQA

    What is VQA?

    VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
    • 265,016 images (COCO and abstract scenes)
    • At least 3 questions (5.4 questions on average) per image
    • 10 ground truth answers per question
    • 3 plausible (but likely incorrect) answers per question
    • Automatic evaluation metric

    Dataset

    Details on downloading the latest dataset may be found on the download webpage.

    Usage

    from datasets import load_dataset
    
    raw_datasets = load_dataset(
            "vqa.py", 
            "base",
            cache_dir="huggingface_datasets", 
            data_dir="data",
            ignore_verifications=True,
          )
    
    dataset_train = raw_datasets["train"]
    
    for item in dataset_train:
      print(item)
      exit()
    

    v2 = v2.real + v2.abstract (v2.abstract == v1.abstract)
    v1 = v1.real + v1.abstract
    v2.abstract.balanced.bin

  8. Data from: EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching

    • zenodo.org
    • data.niaid.nih.gov
    Updated Dec 5, 2022
    Cite
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon (2022). EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching [Dataset]. http://doi.org/10.5281/zenodo.7396485
    Available download formats: zip
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    EyeFi Dataset

    This dataset was collected as a part of the EyeFi project at the Bosch Research and Technology Center, Pittsburgh, PA, USA. The dataset contains WiFi CSI values of human motion trajectories along with ground truth location information captured through a camera. This dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published in the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop describing the details of data collection. Please check it out for more information on the dataset.

    Data Collection Setup

    In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC and the Linux CSI tools [1] to extract the WiFi CSI packets. The (x,y) coordinates of the subjects are collected from a Bosch Flexidome IP Panoramic 7000 panoramic camera mounted on the ceiling, and the Angles of Arrival (AoAs) are derived from the (x,y) coordinates. Both the WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is located around 2.85m above the ground and the WiFi antennas are around 1.12m above the ground.

    The data collection environment consists of two areas: the first is a rectangular space measuring 11.8m x 8.74m, and the second is an irregularly shaped kitchen area with maximum distances of 19.74m and 14.24m between two walls. The kitchen also has numerous obstacles and different materials that pose different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.

    To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and kitchen areas.

    List of Files
    Here is a list of files included in the dataset:

    |- 1_person
      |- 1_person_1.h5
      |- 1_person_2.h5
    |- 2_people
      |- 2_people_1.h5
      |- 2_people_2.h5
      |- 2_people_3.h5
    |- 3_people
      |- 3_people_1.h5
      |- 3_people_2.h5
      |- 3_people_3.h5
    |- 5_people
      |- 5_people_1.h5
      |- 5_people_2.h5
      |- 5_people_3.h5
      |- 5_people_4.h5
    |- 10_people
      |- 10_people_1.h5
      |- 10_people_2.h5
      |- 10_people_3.h5
    |- Kitchen
      |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
      |- 3_people
        |- kitchen_3_people_1.h5
    |- training
      |- shuffuled_train.h5
      |- shuffuled_valid.h5
      |- shuffuled_test.h5
    View-Dataset-Example.ipynb
    README.md
    
    

    In this dataset, the folders `1_person/`, `2_people/`, `3_people/`, `5_people/`, and `10_people/` contain data collected from the lab area, whereas the `Kitchen/` folder contains data collected from the kitchen area. To see how each file is structured, please see the section Access the data below.

    The training folder contains the training dataset we used to train the neural network discussed in our paper. They are generated by shuffling all the data from `1_person/` folder collected in the lab area (`1_person_1.h5` and `1_person_2.h5`).

    Why multiple files in one folder?

    Each folder contains multiple files. For example, the `1_person` folder has two files: `1_person_1.h5` and `1_person_2.h5`. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can be different. Also, the data could be collected on different days, and/or the data collection system sometimes needed to be rebooted due to stability issues. As a result, we provide different files (like `1_person_1.h5`, `1_person_2.h5`) to distinguish between different people holding the phone and possible system reboots that introduce different phase offsets (see below) into the system.

    Special note:

    `1_person_1.h5` was generated with the same person holding the phone throughout, whereas `1_person_2.h5` contains different people holding the phone, with only one person present in the area at a time. Both files were also collected on different days.


    Access the data
    To access the data, the hdf5 library is needed to open the dataset. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, to demonstrate how to access the data.

    Each file is structured as follows (except the files under the *"training/"* folder):

    |- csi_imag
    |- csi_real
    |- nPaths_1
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_2
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_3
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- nPaths_4
      |- offset_00
        |- spotfi_aoa
      |- offset_11
        |- spotfi_aoa
      |- offset_12
        |- spotfi_aoa
      |- offset_21
        |- spotfi_aoa
      |- offset_22
        |- spotfi_aoa
    |- num_obj
    |- obj_0
      |- cam_aoa
      |- coordinates
    |- obj_1
      |- cam_aoa
      |- coordinates
    ...
    |- timestamp
    

    The `csi_real` and `csi_imag` fields are the real and imaginary parts of the CSI measurements. The order of antennas and subcarriers for the 90 `csi_real` and `csi_imag` values is as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3,… subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The `nPaths_x` groups are the SpotFi [2] calculated WiFi Angles of Arrival (AoA) with `x` number of multiple paths specified during the calculation. Under each `nPaths_x` group are `offset_xx` subgroups, where `xx` stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:

    |Antennas | Offset 1 (rad) | Offset 2 (rad) |
    |:-------:|:---------------:|:-------------:|
    | 1 & 2 |   1.1899   |   -2.0071    |
    | 1 & 3 |   1.3883   |   -1.8129    |
    
    

    The measurement is based on the work in [3], where the authors state there are two possible offsets between two antennas, which we measured by booting the device multiple times. The combination of offsets used is encoded in the `offset_xx` name. For example, `offset_12` means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.

    The `num_obj` field stores the number of human subjects present in the scene. `obj_0` is always the subject who is holding the phone. In each file, there are `num_obj` `obj_x` groups. For each `obj_x`, we have the `coordinates` reported from the camera and `cam_aoa`, which is the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoA listed here are chronologically ordered (except for the files in the `training` folder). They reflect the way the person carrying the phone moved through the space (for `obj_0`) and how everyone else walked (for the other `obj_y`, where `y` > 0).

    The `timestamp` is provided here as a time reference for each WiFi packet.

    To access the data (Python):

    import h5py
    
    data = h5py.File('3_people_3.h5','r')
    
    csi_real = data['csi_real'][()]
    csi_imag = data['csi_imag'][()]
    
    cam_aoa = data['obj_0/cam_aoa'][()] 
    cam_loc = data['obj_0/coordinates'][()] 
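
    Building on the snippet above, the 90 values per packet can also be combined into a complex CSI array ordered as described earlier (a sketch, assuming `csi_real` and `csi_imag` have shape (num_packets, 90)):

    import numpy as np

    # Combine real and imaginary parts, then split the 90 values into
    # 30 subcarriers x 3 antennas following the documented ordering.
    csi = csi_real + 1j * csi_imag
    csi = csi.reshape(-1, 30, 3)
    print(csi.shape)   # (num_packets, 30, 3)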
    

    Files inside the `training/` folder have a different data structure:

    
    |- nPath-1
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-2
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-3
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    |- nPath-4
      |- aoa
      |- csi_imag
      |- csi_real
      |- spotfi
    


    The group `nPath-x` corresponds to the number of multiple paths specified during the SpotFi calculation. `aoa` is the camera-generated angle of arrival (AoA) (which can be considered as ground truth), and `csi_imag` and `csi_real` are the imaginary and real components of the CSI value. `spotfi` contains the SpotFi-calculated AoA values. The SpotFi values are chosen based on the lowest median and mean error across `1_person_1.h5` and `1_person_2.h5`. All the rows under the same `nPath-x` group are aligned (i.e., the first row of `aoa` corresponds to the first row of `csi_imag`, `csi_real`, and `spotfi`). There is no timestamp recorded, and the sequence of the data is not chronological, as the rows are randomly shuffled from the `1_person_1.h5` and `1_person_2.h5` files.

    Citation
    If you use the dataset, please cite our paper:

    @inproceedings{eyefi2020,
     title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
     author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
     booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
     year={2020},

  9. Wrist-mounted IMU data towards the investigation of free-living human eating behavior - the Free-living Food Intake Cycle (FreeFIC) dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jun 20, 2022
    Cite
    Delopoulos, Anastasios (2022). Wrist-mounted IMU data towards the investigation of free-living human eating behavior - the Free-living Food Intake Cycle (FreeFIC) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4420038
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    Kyritsis, Konstantinos
    Diou, Christos
    Delopoulos, Anastasios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The Free-living Food Intake Cycle (FreeFIC) dataset was created by the Multimedia Understanding Group towards the investigation of in-the-wild eating behavior. This is achieved by recording the subjects’ meals as a small part of their everyday, unscripted activities. The FreeFIC dataset contains the 3D acceleration and orientation velocity signals (6 DoF) from 22 in-the-wild sessions provided by 12 unique subjects. All sessions were recorded using a commercial smartwatch (6 with the Huawei Watch 2™ and the rest with the MobVoi TicWatch™) while the participants performed their everyday activities. In addition, FreeFIC also contains the start and end moments of each meal session as reported by the participants.

    Description

    FreeFIC includes 22 in-the-wild sessions that belong to 12 unique subjects. Participants were instructed to wear the smartwatch on the hand of their preference well ahead of any meal and to continue wearing it throughout the day until the battery was depleted. In addition, we followed a self-report labeling model, meaning that the ground truth is provided by each participant documenting the start and end moments of their meals to the best of their abilities, as well as the hand on which they wear the smartwatch. The total duration of the 22 recordings sums up to 112.71 hours, with a mean duration of 5.12 hours. Additional data statistics can be obtained by executing the provided Python script stats_dataset.py. Furthermore, the accompanying Python script viz_dataset.py will visualize the IMU signals and ground truth intervals for each of the recordings. Information on how to execute the Python scripts can be found below.

    The script(s) and the pickle file must be located in the same directory.

    Tested with Python 3.6.4

    Requirements: Numpy, Pickle and Matplotlib

    Calculate and echo dataset statistics

    $ python stats_dataset.py

    Visualize signals and ground truth

    $ python viz_dataset.py

    FreeFIC is also tightly related to Food Intake Cycle (FIC), a dataset we created in order to investigate the in-meal eating behavior. More information about FIC can be found here and here.

    Publications

    If you plan to use the FreeFIC dataset or any of the resources found in this page, please cite our work:

    @article{kyritsis2020data,
    title={A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches},
    author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
    journal={IEEE Journal of Biomedical and Health Informatics}, year={2020},
    publisher={IEEE}}

    @inproceedings{kyritsis2017automated,
    title={Detecting Meals In the Wild Using the Inertial Data of a Typical Smartwatch},
    author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
    booktitle={2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
    year={2019}, organization={IEEE}}

    Technical details

    We provide the FreeFIC dataset as a pickle. The file can be loaded using Python in the following way:

    import pickle as pkl
    import numpy as np

    with open('./FreeFIC_FreeFIC-heldout.pkl', 'rb') as fh:
        dataset = pkl.load(fh)

    The dataset variable in the snippet above is a dictionary with 5 keys. Namely:

    'subject_id'

    'session_id'

    'signals_raw'

    'signals_proc'

    'meal_gt'

    The contents under a specific key can be obtained by:

    sub = dataset['subject_id']     # subject id
    ses = dataset['session_id']     # session id
    raw = dataset['signals_raw']    # raw IMU signals
    proc = dataset['signals_proc']  # processed IMU signals
    gt = dataset['meal_gt']         # meal ground truth

    The sub, ses, raw, proc and gt variables in the snippet above are lists with a length equal to 22. Elements across all lists are aligned; e.g., the 3rd element of the list under the 'session_id' key corresponds to the 3rd element of the list under the 'signals_proc' key.

    sub: list Each element of the sub list is a scalar (integer) that corresponds to the unique identifier of the subject, which can take the following values: [1, 2, 3, 4, 13, 14, 15, 16, 17, 18, 19, 20]. It should be emphasized that the subjects with ids 15, 16, 17, 18, 19 and 20 belong to the held-out part of the FreeFIC dataset (more information can be found in the publication titled "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al.). Moreover, the subject identifier in FreeFIC is in line with the subject identifier in the FIC dataset (more info here and here); i.e., FIC’s subject with id equal to 2 is the same person as FreeFIC’s subject with id equal to 2.

    ses: list Each element of this list is a scalar (integer) that corresponds to the unique identifier of the session, which can range between 1 and 5. It should be noted that not all subjects have the same number of sessions.

    raw: list Each element of this list is a dictionary with the 'acc' and 'gyr' keys. The data under the 'acc' key is an N_acc x 4 numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw accelerometer measurements in g (second, third and fourth columns, representing the x, y and z axes, respectively). The data under the 'gyr' key is an N_gyr x 4 numpy.ndarray that contains the timestamps in seconds (first column) and the 3D raw gyroscope measurements in degrees/second (second, third and fourth columns, representing the x, y and z axes, respectively). All sensor streams are transformed in such a way that reflects all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is on par with the signals in the FIC dataset (more info here and here). Finally, the lengths of the raw accelerometer and gyroscope numpy.ndarrays are different (N_acc ≠ N_gyr). This behavior is expected and is caused by the Android platform.

    proc: list Each element of this list is an M x 7 numpy.ndarray that contains the timestamps and the 3D accelerometer and gyroscope measurements for each meal. Specifically, the first column contains the timestamps in seconds, the second, third and fourth columns contain the x, y and z accelerometer values in g, and the fifth, sixth and seventh columns contain the x, y and z gyroscope values in degrees/second. Unlike elements in the raw list, processed measurements (in the proc list) have a constant sampling rate of 100 Hz and the accelerometer/gyroscope measurements are aligned with each other. In addition, all sensor streams are transformed in such a way that reflects all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is on par with the signals in the FIC dataset (more info here and here). No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present in the processed acceleration measurements. The potential researcher can consult the article "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth and remove the gravitational component).

    meal_gt: list Each element of this list is a K x 2 matrix. Each row represents a meal interval for the specific in-the-wild session. The first column contains the timestamps of the meal start moments, whereas the second one contains the timestamps of the meal end moments. All timestamps are in seconds. The number of meals K varies across recordings (e.g., there is a recording where a participant consumed two meals).
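
    For instance, building on the variables above, the meal ground truth can be summarized per session (a small sketch, assuming the lists are aligned as described):

    # Print the number of meals and the total eating time for every session.
    for s_id, sess_id, meals in zip(sub, ses, gt):
        durations = [end - start for start, end in meals]
        print(f"subject {s_id}, session {sess_id}: "
              f"{len(meals)} meal(s), {sum(durations) / 60:.1f} min of eating")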

    Ethics and funding

    Informed consent, including permission for third-party access to anonymised data, was obtained from all subjects prior to their engagement in the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 727688 - BigO: Big data against childhood obesity.

    Contact

    Any inquiries regarding the FreeFIC dataset should be addressed to:

    Dr. Konstantinos KYRITSIS

    Multimedia Understanding Group (MUG) Department of Electrical & Computer Engineering Aristotle University of Thessaloniki University Campus, Building C, 3rd floor Thessaloniki, Greece, GR54124

    Tel: +30 2310 996359, 996365 Fax: +30 2310 996398 E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr

  10. i_naturalist2021

    • tensorflow.org
    Updated Sep 9, 2023
    Cite
    (2023). i_naturalist2021 [Dataset]. https://www.tensorflow.org/datasets/catalog/i_naturalist2021
    Dataset updated
    Sep 9, 2023
    Description

    The iNaturalist dataset 2021 contains a total of 10,000 species. The full training dataset contains nearly 2.7M images. To make the dataset more accessible we have also created a "mini" training dataset with 50 examples per species, for a total of 500K images. The full train split overlaps with the mini split. The val set contains 10 validation images for each species (100K in total). There are a total of 500,000 test images in the public_test split (without ground-truth labels).

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('i_naturalist2021', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/i_naturalist2021-2.0.1.png

  11. BeetleBox

    • huggingface.co
    Cite
    BLAZE, BeetleBox [Dataset]. https://huggingface.co/datasets/bug-localization/BeetleBox
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset authored and provided by
    BLAZE
    Description

    Dataset Card for Dataset Name

    BeetleBox

      Dataset Details
    

    The BeetleBox dataset is a comprehensive multi-language, multi-project dataset designed for bug localization research. It includes 26,321 bugs from 29 projects, covering five major programming languages: Java, Python, C++, JavaScript, and Go. The dataset was meticulously curated to ensure accuracy, with a manual analysis revealing an incorrect ground truth rate of only 0.06%.

      Dataset Description
    

    The… See the full description on the dataset page: https://huggingface.co/datasets/bug-localization/BeetleBox.
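
    Assuming the standard Hugging Face datasets API, the corpus can be loaded roughly as follows (a sketch; the available splits and columns are defined on the dataset page):

    from datasets import load_dataset

    # Load BeetleBox from the Hugging Face Hub and list its splits and columns.
    ds = load_dataset("bug-localization/BeetleBox")
    print(ds)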

  12. LSMI Dataset

    • paperswithcode.com
    Updated Nov 15, 2022
    Cite
    Dongyoung Kim; Jinwoo Kim; Seonghyeon Nam; Dongwoo Lee; Yeonkyung Lee; Nahyup Kang; Hyong-Euk Lee; ByungIn Yoo; Jae-Joon Han; Seon Joo Kim (2022). LSMI Dataset [Dataset]. https://paperswithcode.com/dataset/lsmi
    Dataset updated
    Nov 15, 2022
    Authors
    Dongyoung Kim; Jinwoo Kim; Seonghyeon Nam; Dongwoo Lee; Yeonkyung Lee; Nahyup Kang; Hyong-Euk Lee; ByungIn Yoo; Jae-Joon Han; Seon Joo Kim
    Description

    Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination (ICCV 2021)

    Change Log (LSMI Dataset Version: 1.1)

    1.0 : LSMI dataset released. (Aug 05, 2021)

    1.1 : Add option for saving sub-pair images for 3-illuminant scene (ex. _1,_12,_13) & saving subtracted image (ex. _2,_3,_23) (Feb 20, 2022)

    About [Paper] [Project site] [Download Dataset] [Video]

    This is an official repository of "Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination", which is accepted as a poster in ICCV 2021.

    This repository provides:
    1. Preprocessing code for the "Large Scale Multi Illuminant (LSMI) Dataset"
    2. Code for the pixel-level illumination inference U-Net
    3. Pre-trained model parameters for testing the U-Net

    If you use our code or dataset, please cite our paper:

    @inproceedings{kim2021large,
     title={Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm Under Mixed Illumination},
     author={Kim, Dongyoung and Kim, Jinwoo and Nam, Seonghyeon and Lee, Dongwoo and Lee, Yeonkyung and Kang, Nahyup and Lee, Hyong-Euk and Yoo, ByungIn and Han, Jae-Joon and Kim, Seon Joo},
     booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
     pages={2410--2419},
     year={2021}
    }

    Requirements Our running environment is as follows:

    • Python 3.8.3
    • PyTorch 1.7.0
    • CUDA 11.2

    We provide a docker image, which supports all extra requirements (e.g., dcraw, rawpy, tensorboard, ...), including the specified versions of Python, PyTorch and CUDA above.

    You can download the docker image here.

    The following instructions are assumed to run in a docker container that uses the docker image we provided.

    Getting Started

    Clone this repo: in the docker container, clone this repository first.

    git clone https://github.com/DY112/LSMI-dataset.git

    Download the LSMI dataset: you should first download the LSMI dataset from here.

    The dataset is composed of 3 sub-folders named "galaxy", "nikon", and "sony".

    The folders, named after each camera, include several scenes, and each scene folder contains full-resolution RAW files and JPG files converted to the sRGB color space.

    Move all three folders to the root of cloned repository.

    In each sub-folder, we provide metadata (meta.json) and the train/val/test scene index (split.json).

    In meta.json, we provide the following information:

    • NumOfLights: number of illuminants in the scene
    • MCCCoord: locations of the Macbeth color chart
    • Light1,2,3: normalized chromaticities of each illuminant (calculated by running 1_make_mixture_map.py)

    Preprocess the LSMI dataset

    Convert raw images to tiff files

    To convert the original 1-channel Bayer-pattern images to 3-channel RGB tiff images, run the following code:

    python 0_cvt2tiff.py

    You should modify the SOURCE and EXT variables appropriately.

    The converted tiff files are generated at the same location as the source file.

    This process uses the DCRAW command with '-h -D -4 -T' as options.

    There is no black level subtraction, saturated pixel clipping, or other processing.

    You can change the parameters as appropriate for your purpose.

    Make mixture map

    python 1_make_mixture_map.py

    Change the CAMERA variable to the target directory you want.

    This code does the following operations for each scene:

    • Subtract the black level (no saturation clipping)
    • Using the achromatic patches of the Macbeth Color Chart, find each illuminant's chromaticities
    • Using the green-channel pixel values, calculate the pixel-level illuminant mixture map
    • Mask uncalculable pixel positions (which have 0 as value for all scene pairs) with ZERO_MASK

    After running this code, .npy mixture map data will be generated in each scene's directory.

    :warning: If you run this code with ZERO_MASK=-1, the full-resolution mixture map may contain -1 for uncalculable pixels. You MUST replace this value appropriately before resizing to prevent the negative value from being interpolated with other values.
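
    For example, the invalid pixels can be neutralized before any resize along these lines (a rough sketch with an assumed file name; OpenCV and NumPy are used here only for illustration):

    import cv2
    import numpy as np

    # Hypothetical path to a full-resolution mixture map produced by 1_make_mixture_map.py.
    mixmap = np.load("galaxy/scene_0001/mixture_map.npy")
    mixmap[mixmap == -1] = 0   # replace uncalculable pixels so they do not bleed into neighbours
    resized = cv2.resize(mixmap, (512, 512), interpolation=cv2.INTER_LINEAR)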

    Crop for train/test U-Net (Optional)

    python 2_preprocess_data.py

    This preprocessing code is written only for U-Net, so you can skip this step and freely process the full resolution LSMI set (tiff and npy files).

    The image and the mixture map are resized as a square with a length of the SIZE variable inside the code, and the ground-truth image is also generated.

    Note that the side of the image will be cropped to make the image shape square.

    If you don't want to crop the side of the image and just want to resize whole image anyway, use SQUARE_CROP=False

    We set the default test size to 256, and set train size to 512, and SQUARE_CROP=True.

    The new dataset is created in a folder with the name of the CAMERA_SIZE. (Ex. galaxy_512)

    Use U-Net for pixel-level AWB You can download the pre-trained model parameters here.

    The pre-trained model is trained on 512x512 data with random crop and random pixel-level relighting augmentation.

    Place the downloaded models folder into SVWB_Unet.

    Test U-Net

    cd SVWB_Unet
    sh test.sh

    Train U-Net

    cd SVWB_Unet
    sh train.sh

    Dataset License
    This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/).

  13. Data from: ALFA: A Dataset for UAV Fault and Anomaly Detection

    • kilthub.cmu.edu
    Updated Jul 31, 2020
    Cite
    Azarakhsh Keipour; Mohammadreza Mousaei; Sebastian Scherer (2020). ALFA: A Dataset for UAV Fault and Anomaly Detection [Dataset]. http://doi.org/10.1184/R1/12707963.v1
    Available download formats: zip
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    Carnegie Mellon University
    Authors
    Azarakhsh Keipour; Mohammadreza Mousaei; Sebastian Scherer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The recent growth in the use of Autonomous Aerial Vehicles (AAVs) has increased concerns about the safety of the autonomous vehicles, the people, and the properties around the flight path and onboard the vehicle. Much research is being done on new regulations, more robust systems are being designed to address these concerns, and new methods and algorithms are being introduced to detect potential hardware and software issues. This dataset presents several fault types in the control surfaces of a fixed-wing Unmanned Aerial Vehicle (UAV) for use in Fault Detection and Isolation (FDI) and Anomaly Detection (AD) research. Currently, the dataset includes processed data for 47 autonomous flights with 23 sudden full engine failure scenarios and 24 scenarios for seven other types of sudden control surface (actuator) faults, with a total of 66 minutes of flight in normal conditions and 13 minutes of post-fault flight time. It additionally includes many hours of raw data of fully-autonomous, autopilot-assisted and manual flights with tens of fault scenarios. The ground truth of the time and type of fault is provided for each scenario to enable the evaluation of new methods using the dataset. We have also provided helper tools in several programming languages to load and work with the data and to help evaluate a detection method using the dataset. A set of metrics is proposed to help compare different methods using the dataset. Most current fault detection methods are evaluated in simulation and, as far as we know, this dataset is the only one providing real flight data with faults in such capacity. We hope it will help advance the state of the art in Anomaly Detection and FDI research for Autonomous Aerial Vehicles and mobile robots, further enhancing the safety of autonomous and remote flight operations.

    Hardware: The platform used for collecting the dataset is a custom modification of the Carbon Z T-28 model plane. The plane has 2 meters of wingspan, a single electric engine in the front, ailerons, flaperons, an elevator, and a rudder. We equipped the aircraft with a Holybro PX4 2.4.6 autopilot, a Pitot tube, a GPS module, and an Nvidia Jetson TX2 onboard computer. In addition to the receiver, we also equipped it with a radio for communication with the ground station.

    Software: The Pixhawk autopilot uses a custom version of the Ardupilot/ArduPlane firmware to control the plane in both manual and autonomous modes and to create the simulations. The original firmware is modified from ArduPlane v3.9.0beta1 to allow disabling control surfaces during the flight. The onboard computer runs Robot Operating System (ROS) Kinetic Kame on Linux Ubuntu 16.04 (Xenial) to read the flight and state information from the Pixhawk using the MAVROS package (the MAVLink node for ROS).

    More Information and Supplemental Tools: Please visit http://theairlab.org/alfa-dataset for more information. It includes the description of each flight sequence, alternative download locations to view and download each individual flight sequence, correct citations to the relevant publications, supplemental code, and an open-source published method using the dataset. The corresponding paper explaining the dataset in more detail is currently under review in the International Journal of Robotics Research (IJRR). The pre-print (arXiv) of the paper can be accessed from our website at http://theairlab.org/alfa-dataset. The supplemental tools for reading and working with the dataset in C++, MATLAB and Python can be accessed from https://github.com/castacks/alfa-dataset. The repository also includes a C++ ROS-based tool for evaluating new methods and all the ROS message type definitions for working directly with the ROS bags.

    Citing the Work: Please refer to our website at http://theairlab.org/alfa-dataset to find the correct citation(s) if you are using this dataset.

  14. ALFI dataset (final)

    • springernature.figshare.com
    Updated Sep 21, 2023
    Cite
    Laura Antonelli; Federica Polverino; Alexandra Albu; Aroj Hada; Italia Asteriti; Francesca Degrassi; Giulia Guarguaglini; Lucia Maddalena; Mario Guarracino (2023). ALFI dataset (final) [Dataset]. http://doi.org/10.6084/m9.figshare.23798451.v1
    Available download formats: bin
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Laura Antonelli; Federica Polverino; Alexandra Albu; Aroj Hada; Italia Asteriti; Francesca Degrassi; Giulia Guarguaglini; Lucia Maddalena; Mario Guarracino
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data is divided into three folders:

    1) Folder "Data&Annotations" includes images and annotations for all the sequences.

    1.1) For sequences MI01 through MI08 (including annotations for both Task 1 and Task 2), the content is, e.g.,

    ├── Data&Annotations
    │   ├── MI01
    │   │   ├── Images
    │   │   │   ├── I_MI01_0001.png ...
    │   │   │   └── I_MI01_0069.png
    │   │   ├── MI01_DTLTruth.csv
    │   │   ├── MI01_PhenoTruth.csv
    │   │   └── Masks
    │   │       ├── M_MI01_0001.png ...
    │   │       └── M_MI01_0069.png

    For each j, I_MI01_j.png is the j-th image of video MI01 and M_MI01_j.png is the corresponding ground truth segmentation mask.

    The MI01_DTLTruth.csv file includes annotation information consisting of a) Number of sequence image (ImNo), b) Cell ID (ID), c) Cell class (Class, either Interphase or Mitosis), d) Bounding box, specified by its upper left corner (xmin, ymin) and its dimensions (width, height), and e) Parent cell ID (Parent).

    The MI01_PhenoTruth.csv file includes annotation information consisting of a) Number of sequence image (ImNo), b) Cell ID (ID), c) Cell class (Class, including EarlyMitosis, LateMitosis, CellDeath, and Multipolar), and d) Bounding box, specified by its upper left corner (xmin, ymin) and its dimensions (width, height).

    1.2) For sequences CD01 through CD09 and TP01 through TP12 (including only annotations for Task 2), the content is similar, but there is no "Masks" subdirectory nor "*PhenoTruth.csv" file.
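
    As a usage sketch (assuming the column names match the parenthesised labels above), the Task 1 annotations of a sequence can be loaded with pandas:

    import pandas as pd

    # Load the detection/tracking ground truth of sequence MI01 and list the mitotic cells.
    ann = pd.read_csv("Data&Annotations/MI01/MI01_DTLTruth.csv")
    mitoses = ann[ann["Class"] == "Mitosis"]
    print(mitoses[["ImNo", "ID", "xmin", "ymin", "width", "height", "Parent"]].head())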

    2) Folder "UseExamples" includes two usage examples and two directories including their output, in the form of Matlab/Python scripts. ├── UseExamples │ ├── UseExample1.m │ ├── UseExample1.py │ ├── UseExample1.py.readme.txt │ ├── UseExample1Out │ │ ├── TP06BBannots │ │ │ ├── TP06_py_WithBB_0001.png ... │ │ │ └── TP06_py_WithBB_0016.png │ │ │ ├── TP06_WithBB_0001.png ... │ │ │ └── TP06_WithBB_0016.png │ ├── UseExample2.m │ ├── UseExample2Out │ │ ├── MI01Lineage.png │ │ ├── MI01LineageCrops.png │ │ ├── MI02Lineage.png │ │ ├── MI02LineageCrops.png │ │ ├── MI03Lineage.png │ │ ├── MI03LineageCrops.png │ │ ├── MI04Lineage.png │ │ ├── MI04LineageCrops.png │ │ ├── MI05Lineage.png │ │ ├── MI05LineageCrops.png │ │ ├── MI06Lineage.png │ │ ├── MI06LineageCrops.png │ │ ├── MI07Lineage.png │ │ ├── MI07Lineage(Mitoses).png │ │ ├── MI07LineageCrops.png │ │ ├── MI08Lineage.png │ │ ├── MI08Lineage(Mitoses).png │ │ ├── MI08LineageCrops.png

    2.1) UseExample1.m is a Matlab script for plotting the bounding box annotations over the original sequence images; its output is included in the folder UseExample1Out/TP06BBannots, which contains the images TP06_WithBB_0001.png, ..., TP06_WithBB_0016.png. UseExample1.py is the analogous example written in Python and UseExample1.py.readme.txt is the related readme file; its output is also included in the folder UseExample1Out/TP06BBannots, which contains the images TP06_py_WithBB_0001.png, ..., TP06_py_WithBB_0016.png.

    2.2) UseExample2.m is a Matlab script for plotting the cell lineage for one of the sequences MI01-MI08; the output of this example is the lineage representation and image crops of mitotic cells given in the PNG images included in the folder UseExample2Out. For the user's convenience, we provide the output images for all MI* sequences, without the need to run the script. Please observe that, for the very crowded sequences MI07 and MI08, we also provide the lineage representation reduced only to Mitoses (e.g., MI07Lineage(Mitoses).png), obtained by setting the parameter OnlyMitoses = 1; (line 49 of the script).

    3) Folder "Videos" includes the 29 videos stored in ND2 format. └── Videos ├── CD01.nd2 ... └── TP10.nd2

  15. Data from: YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 27, 2020
    Cite
    Till Grenzdörffer; Martin Günther; Joachim Hertzberg (2020). YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation [Dataset]. http://doi.org/10.5281/zenodo.2579173
    Available download formats: application/gzip, mp4, bin
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Till Grenzdörffer; Martin Günther; Joachim Hertzberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While a great variety of 3D cameras have been introduced in recent years, most publicly available datasets for object recognition and pose estimation focus on one single camera. This dataset consists of 32 scenes that have been captured by 7 different 3D cameras, totaling 49,294 frames. This allows evaluating the sensitivity of pose estimation algorithms to the specifics of the used camera and the development of more robust algorithms that are more independent of the camera model. Vice versa, our dataset enables researchers to perform a quantitative comparison of the data from several different cameras and depth sensing technologies and evaluate their algorithms before selecting a camera for their specific task. The scenes in our dataset contain 20 different objects from the common benchmark YCB object and model set. We provide full ground truth 6DoF poses for each object, per-pixel segmentation, 2D and 3D bounding boxes and a measure of the amount of occlusion of each object.

    If you use this dataset in your research, please cite the following publication:

    T. Grenzdörffer, M. Günther, and J. Hertzberg, “YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation,” in 2020 IEEE International Conference on Robotics and Automation, ICRA 2020, Paris, France, May 31-June 4, 2020. IEEE, 2020.

    @InProceedings{Grenzdoerffer2020ycbm,
     title = {{YCB-M}: A Multi-Camera {RGB-D} Dataset for Object Recognition and {6DoF} Pose Estimation},
     author = {Grenzd{\"{o}}rffer, Till and G{\"{u}}nther, Martin and Hertzberg, Joachim},
     booktitle = {2020 {IEEE} International Conference on Robotics and Automation, {ICRA} 2020, Paris, France, May 31-June 4, 2020},
     year = {2020},
     publisher = {{IEEE}}
    }

    This paper is also available on arXiv: https://arxiv.org/abs/2004.11657

    To visualize the dataset, follow these instructions (tested on Ubuntu Xenial 16.04):

    # IMPORTANT: the ROS setup.bash must NOT be sourced, otherwise the following error occurs:
    # ImportError: /opt/ros/kinetic/lib/python2.7/dist-packages/cv2.so: undefined symbol: PyCObject_Type
    
    # nvdu requires Python 3.5 or 3.6
    sudo add-apt-repository -y ppa:deadsnakes/ppa  # to get python3.6 on Ubuntu Xenial
    sudo apt-get update
    sudo apt-get install -y python3.6 libsm6 libxext6 libxrender1 python-virtualenv python-pip
    
    # create a new virtual environment
    virtualenv -p python3.6 venv_nvdu
    cd venv_nvdu/
    source bin/activate
    
    # clone our fork of NVIDIA's Dataset Utilities that incorporates some essential fixes
    pip install -e 'git+https://github.com/mintar/Dataset_Utilities.git#egg=nvdu'
    
    # download and transform the meshes
    # (alternatively, unzip the meshes contained in the dataset
    # to 

    For further details, see README.md.

  16. web_graph

    • tensorflow.org
    Updated Nov 23, 2022
    Cite
    (2022). web_graph [Dataset]. http://identifiers.org/arxiv:2112.02194
    Explore at:
    Dataset updated
    Nov 23, 2022
    Description

    This dataset contains a sparse graph representing web link structure for a small subset of the Web.

    It is a processed version of a single crawl performed by CommonCrawl in 2021, where we strip everything and keep only the link->outlinks structure. The final dataset is essentially in int -> List[int] format, with each integer id representing a URL.

    Also, in order to increase the value of this resource, we created six different versions of WebGraph, each varying in sparsity pattern and locale. We took the following processing steps, in order:

    • We started with WAT files from June 2021 crawl.
    • Since the outlinks in HTTP-Response-Metadata are stored as relative paths, we convert them to absolute paths using urllib after validating each link.
    • To study locale-specific graphs, we further filter based on 2 top-level domains: ‘de’ and ‘in’, each producing a graph with an order of magnitude fewer nodes.
    • These graphs can still have arbitrary sparsity patterns and dangling links. Thus we further filter the nodes in each graph to have a minimum of K ∈ [10, 50] inlinks and outlinks. Note that we only do this processing once, so this is still an approximation, i.e. the resulting graph might have nodes with fewer than K links.
    • Using both locale and count filters, we finalize 6 versions of the WebGraph dataset, summarized in the following table.
    Version     Top-level domain   Min count   Num nodes   Num edges
    sparse      -                  10          365.4M      30B
    dense       -                  50          136.5M      22B
    de-sparse   de                 10          19.7M       1.19B
    de-dense    de                 50          5.7M        0.82B
    in-sparse   in                 10          1.5M        0.14B
    in-dense    in                 50          0.5M        0.12B

    All versions of the dataset have the following features:

    • "row_tag": a unique identifier of the row (source link).
    • "col_tag": a list of unique identifiers of non-zero columns (dest outlinks).
    • "gt_tag": a list of unique identifiers of non-zero columns used as ground truth (dest outlinks), empty for train/train_t splits.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('web_graph', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.
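
    The per-example features listed above can be read directly from the loaded dataset; a minimal sketch, assuming the feature names exactly as documented:

    import tensorflow_datasets as tfds

    ds = tfds.load('web_graph', split='train')
    for ex in ds.take(1):
        src = ex['row_tag'].numpy()       # source link id
        outlinks = ex['col_tag'].numpy()  # destination link ids (non-zero columns)
        gt = ex['gt_tag'].numpy()         # held-out outlinks (empty for train splits)
        print(src, outlinks[:5], gt[:5])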

  17. Bluesky Social Dataset

    • paperswithcode.com
    Updated Apr 28, 2024
    + more versions
    Cite
    (2024). Bluesky Social Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/bluesky-social-dataset
    Explore at:
    Dataset updated
    Apr 28, 2024
    Description

    Bluesky Social Dataset Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.

    This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.

    Dataset Here is a description of the dataset files.

    • followers.csv.gz: the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v).
    • posts.tar.gz: data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line.
    • interactions.csv.gz: the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers and represents a comment, repost, or quote interaction. The integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    • graphs.tar.gz: edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    • feed_posts.tar.gz: posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Posts are stored as JSON-formatted lines. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
    • feed_bookmarks.csv: users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp.
    • feed_post_likes.tar.gz: data on likes to posts appearing in the feeds, one file per feed. Each record contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
    • scripts.tar.gz: a collection of Python scripts, including the ones originally used to crawl the data and to perform experiments. These scripts are detailed in a document released within the folder.
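
    As a minimal sketch of loading the edge lists with pandas (assuming the files carry no header row and use the column order documented above):

    import pandas as pd

    # directed follower edges: user u follows user v
    followers = pd.read_csv("followers.csv.gz", header=None, names=["u", "v"])

    # comment / repost / quote interactions, six integer fields per row
    interactions = pd.read_csv(
        "interactions.csv.gz",
        header=None,
        names=["user_id", "replied_author", "thread_root_author",
               "reposted_author", "quoted_author", "date"],
    )

    print(followers.head())
    print(interactions["user_id"].nunique(), "distinct interacting users")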

    Citation If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data. (2024) arXiv:2404.18984

    Acknowledgments: This work is supported by :

    the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu); SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021; EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

  18. Data from: Dataset for Vehicle Indoor Positioning in Industrial Environments with Wi-Fi, inertial, and odometry data

    • data.niaid.nih.gov
    • producciocientifica.uv.es
    • +1more
    Updated Feb 12, 2024
    Cite
    Ivo Silva (2024). Dataset for Vehicle Indoor Positioning in Industrial Environments with Wi-Fi, inertial, and odometry data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7826539
    Explore at:
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Adriano Moreira
    Joaquín Torres-Sospedra
    Cristiano Pendão
    Ivo Silva
    Description

    Dataset collected in an indoor industrial environment using a mobile unit (manually pushed trolley) that resembles an industrial vehicle equipped with several sensors, namely, Wi-Fi, wheel encoder (displacement), and Inertial Measurement Unit (IMU).

    The sensors were connected to a Raspberry Pi (RPi 3B+), which collected their data. Ground truth information was obtained with a video camera pointed towards the floor, registering the times when the trolley passed by reference tags.

    List of sensors:

    4x Wi-Fi interfaces: Edimax EW7811-Un

    2x IMUs: Adafruit BNO055

    1x Absolute Encoder: US Digital A2 (attached to a wheel with a diameter of 125 mm)

    This dataset includes:

    1x Wi-Fi radio map that can be used for Wi-Fi fingerprinting.

    6x Trajectories: including sensor data + ground truth.

    APs Information: list of APs in the building, including their position and transmission channel.

    Floor plan: image of the building's floor plan with obstacles and non-navigable areas.

    Python package provided for:

    parsing the dataset into a data structure (Pandas dataframes).

    performing statistical analysis on the data (number of samples, time difference between consecutive samples, etc.).

    computing a Dead Reckoning trajectory from a provided initial position (see the sketch after this list).

    computing Wi-Fi fingerprinting position estimates.

    determining positioning error in Dead Reckoning and Wi-Fi fingerprinting.

    generating plots including the floor plan of the building, dead reckoning trajectories, and CDFs.
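
    As a rough illustration of the dead-reckoning step above, here is a minimal sketch under assumed inputs (per-sample displacements from the wheel encoder and headings from the IMU); it is not the package's actual API:

    import numpy as np

    def dead_reckoning(displacements, headings_rad, x0=0.0, y0=0.0):
        # integrate wheel-encoder displacements along the IMU heading
        x, y = [x0], [y0]
        for d, theta in zip(displacements, headings_rad):
            x.append(x[-1] + d * np.cos(theta))
            y.append(y[-1] + d * np.sin(theta))
        return np.array(x), np.array(y)

    # toy example: four 0.5 m steps while turning from 0 to 90 degrees
    xs, ys = dead_reckoning([0.5] * 4, np.radians([0, 30, 60, 90]))
    print(np.c_[xs, ys])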

    When using this dataset, please cite its data description paper:

    Silva, I.; Pendão, C.; Torres-Sospedra, J.; Moreira, A. Industrial Environment Multi-Sensor Dataset for Vehicle Indoor Tracking with Wi-Fi, Inertial and Odometry Data. Data 2023, 8, 157. https://doi.org/10.3390/data8100157

  19. DustNet - structured data and Python code to reproduce the model, statistical analysis and figures

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 7, 2024
    Cite
    DustNet - structured data and Python code to reproduce the model, statistical analysis and figures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10631953
    Explore at:
    Dataset updated
    Jul 7, 2024
    Dataset provided by
    Simmons, Benno I.
    Augousti, Andy T.
    Nowak, T. E.
    Siegert, Stefan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and Python code used for AOD prediction with DustNet, a Machine Learning/AI based forecasting model.

    Model input data and code

    Processed MODIS AOD data (from Aqua and Terra) and selected ERA5 variables*, ready to reproduce the DustNet model results or for similar forecasting with Machine Learning. These long-term daily timeseries (2003-2022) are provided as n-dimensional NumPy arrays. The Python code to handle the data and run the DustNet model** is included as the Jupyter Notebook ‘DustNet_model_code.ipynb’. A subfolder with the data normalised and split into training/validation/testing sets is also provided, together with Python code for two additional ML based models** used for comparison (U-NET and Conv2D). Pre-trained models are also archived here as TensorFlow files.

    Model output data and code

    This dataset was constructed by running ‘DustNet_model_code.ipynb’ (see above). It consists of 1095 days of forecast AOD data (2020-2022) produced by CAMS, the DustNet model, a naïve prediction (persistence) and a gridded climatology. The ground truth raw AOD data from MODIS is provided for comparison and statistical analysis of the predictions. It is intended for a quick reproduction of the figures and statistical analysis presented in the paper introducing DustNet.

    *datasets are NumPy arrays (v1.23) created in Python v3.8.18.

    **all ML models were created with Keras in Python v3.10.10.
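
    A minimal sketch of loading this kind of archive in Python (the file and folder names below are hypothetical placeholders; ‘DustNet_model_code.ipynb’ contains the authors' own loading code):

    import numpy as np
    import tensorflow as tf

    # daily AOD timeseries stored as an n-dimensional NumPy array (file name assumed)
    aod = np.load("modis_aod_2003_2022.npy")
    print(aod.shape, aod.dtype)

    # restore one of the archived pre-trained models (folder name assumed)
    model = tf.keras.models.load_model("dustnet_pretrained")
    model.summary()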

  20. WEISS Catheter Segmentation in Fluoroscopy Dataset

    • rdr.ucl.ac.uk
    png
    Updated Nov 27, 2023
    + more versions
    Cite
    Evangelos Mazomenos; Danail Stoyanov; Marta Gherardini (2023). WEISS Catheter Segmentation in Fluoroscopy Dataset [Dataset]. http://doi.org/10.5522/04/24624243.v1
    Explore at:
    Available download formats: png
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    University College London
    Authors
    Evangelos Mazomenos; Danail Stoyanov; Marta Gherardini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains fluoroscopy images extracted from four videos of canulation experiments with an aorta phantom and six videos of in-vivo catheterisation procedures: four Transcatheter Aortic Valve Implantations (TAVI) and two diagnostic catheterisation procedures. Please refer to the README.docx.

    The Phantom.hdf5 file contains the 2000 images (Dataset-2 in the paper) extracted from the four fluoroscopy videos of catheterization experiments carried out on a silicon aorta phantom in an angiography suite. The T1T2.hdf5 and T3-T6.hdf5 files contain images extracted from the six fluoroscopy videos recorded during in-vivo endovascular operations (Dataset-3 in the paper). Specifically, 836 frames were extracted from TAVI (data groups T1, T2, T3 and T4) and 371 from diagnostic catheterization (data groups T5 and T6). Each data group contains the following number of images: T1 – 286, T2 – 150, T3 – 200, T4 – 200, T5 – 143, T6 – 228.

    Binary segmentation masks of the interventional catheter are provided as ground truth. A semiautomated tracking method with manual initialisation (http://ieeexplore.ieee.org/document/7381624/) was employed to obtain the catheter annotations as the 2D coordinates of the catheter restricted to a manually selected region of interest (ROI). The method employs a b-spline tube model as a prior for the catheter shape to restrict the search space and deal with potential missing measurements. This is combined with a probabilistic framework that estimates the pixel-wise posteriors between the foreground (catheter) and background delimited by the b-spline tube contour. The output of the algorithm was manually checked and corrected to provide the final catheter segmentation.

    The annotations are provided in the files “Phantom_label.hdf5”, “T1T2_label.hdf5” and “T3-T6_label.hdf5”. All annotations consist of full-scale (256x256 px) binary masks where background pixels have a value of “0”, while a value of “1” denotes catheter pixels. Example Python code (MAIN.py) is provided to access the data and the labels and visualize them.

    Citing the dataset The dataset should be cited using its DOI whenever research making use of this dataset is reported in any academic publication or research report. Please also cite the following publication: Marta Gherardini, Evangelos Mazomenos, Arianna Menciassi, Danail Stoyanov, “Catheter segmentation in X-ray fluoroscopy using synthetic data and transfer learning with light U-nets”, Computer Methods and Programs in Biomedicine, Volume 192, Aug 2020, 105420, doi:10.1016/j.cmpb.2020.105420.

    To find out more about our research team, visit the Surgical Robot Vision and Wellcome/EPSRC Centre for Interventional and Surgical Science websites.
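
    If you prefer not to run MAIN.py, the HDF5 files can also be inspected directly with h5py; a minimal sketch follows (treating the first top-level key of each file as the image/mask stack is an assumption; the authoritative access pattern is in MAIN.py and the README):

    import h5py
    import matplotlib.pyplot as plt

    with h5py.File("Phantom.hdf5", "r") as f_img, h5py.File("Phantom_label.hdf5", "r") as f_lab:
        print("image keys:", list(f_img.keys()), "label keys:", list(f_lab.keys()))
        images = f_img[list(f_img.keys())[0]][...]  # fluoroscopy frames
        masks = f_lab[list(f_lab.keys())[0]][...]   # 256x256 binary masks (0 = background, 1 = catheter)

    # overlay the first mask on the first frame
    plt.imshow(images[0], cmap="gray")
    plt.imshow(masks[0], alpha=0.4)
    plt.axis("off")
    plt.show()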
