89 datasets found

Datasets for Evaluation of Multimodal Image Registration
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 11, 2021
Cite
Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5557568
Dataset updated
Oct 11, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

Aerial data

The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch. Then one 300x300 px image patch is extracted from the centre of each region. This 3-fold grouping, followed by the splitting, results in 72 test samples in each evaluation fold.
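The region-and-centre-patch rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name `centre_patch_boxes` is hypothetical, and the rounding of region borders is a guess; the dataset's own scripts may handle it differently.

```python
def centre_patch_boxes(height, width, patch=300):
    """Return (top, left, bottom, right) boxes, one centred patch per region.

    Assumption: regions form a regular grid with as many rows and columns
    as still leave room for one patch each.
    """
    n_rows, n_cols = height // patch, width // patch
    if n_rows == 0 or n_cols == 0:
        return []  # image too small to hold a single patch
    region_h, region_w = height / n_rows, width / n_cols  # equal-sized regions
    boxes = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Offset the patch so that it sits in the centre of its region.
            top = int(r * region_h + (region_h - patch) / 2)
            left = int(c * region_w + (region_w - patch) / 2)
            boxes.append((top, left, top + patch, left + patch))
    return boxes


boxes = centre_patch_boxes(1000, 700)  # hypothetical image size
```

For a 1000x700 px image this yields a 3x2 grid of regions, hence six patches.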

Modality A: Near-Infrared (NIR)

Modality B: three colour channels (in B-G-R order)

Cytological data

The Cytological data contains images from 3 different cell lines; all images from one cell line are treated as one fold in 3-fold cross-validation. Each 600x600 px image in the dataset is subdivided into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.

Modality A: Fluorescence Images

Modality B: Quantitative Phase Images (QPI)

Histological dataset

For the Histological data, to prevent trivially easy registration that relies on the circular border of the TMA cores, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.

Modality A: Second Harmonic Generation (SHG)

Modality B: Bright-Field (BF)

The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of (rigid) transformations of increasing displacement. The transformation level is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level, so that all levels have the same number of samples.

Radiological data

The Radiological dataset is divided into 3 sub-groups by patient IDs: {109, 106, 003, 006}, {108, 105, 007, 001}, {107, 102, 005, 009}. Since the Radiological dataset is non-isotropic (and also of varying resolution), it is resampled using B-spline interpolation to 1 mm³ cubic voxels, taking explicit care to not resample twice; displaced volumes are transformed and resampled in one step.

Modality A: T1-weighted MRI

Modality B: T2-weighted MRI

(Run make_rire_patches.py to generate the sub-volumes.)

Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. As with the aforementioned 2D datasets, random (uniformly distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions and translation tz ∈ [-6.5, 6.5] voxels in the z direction. 40 rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes, of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.
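As a rough illustration of the parameter ranges quoted above, the sketch below draws one set of rigid 3D parameters uniformly from them. The names `RANGES` and `sample_rigid_params` are hypothetical, and the sketch ignores how the dataset orders its transformations by increasing displacement.

```python
import random

# Ranges quoted in the text: rotations in degrees, translations in voxels.
RANGES = {
    "theta_x": (-4.0, 4.0),
    "theta_y": (-4.0, 4.0),
    "theta_z": (-20.0, 20.0),
    "t_x": (-19.6, 19.6),
    "t_y": (-19.6, 19.6),
    "t_z": (-6.5, 6.5),
}


def sample_rigid_params(rng):
    """Draw each parameter independently and uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


rng = random.Random(0)  # fixed seed so the draw is repeatable
params = sample_rigid_params(rng)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state.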

In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.
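The totals above can be cross-checked against the per-fold figures given earlier. The sketch assumes that each 2D total is simply samples per fold times 3 folds times 4 transformation levels, and that the radiological total is the 12 listed patient IDs times 40 transformations; the description does not state this breakdown explicitly.

```python
FOLDS, LEVELS = 3, 4  # 3-fold grouping, 4 transformation levels

aerial = 72 * FOLDS * LEVELS         # 72 test samples per fold
cytological = 420 * FOLDS * LEVELS   # 420 test samples per fold
histological = 134 * LEVELS          # 134 TMA image pairs, no fold split mentioned
radiological = 12 * 40               # 12 patient IDs, 40 transformations each

totals = {
    "aerial": aerial,
    "cytological": cytological,
    "histological": histological,
    "radiological": radiological,
}
```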

Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

Metadata

In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follows:

Filename: identifier(ID) of the image pair

X1_Ref: x-coordinate of the upper-left corner of reference patch I_Ref

Y1_Ref: y-coordinate of the upper-left corner of reference patch I_Ref

X2_Ref: x-coordinate of the lower-left corner of reference patch I_Ref

Y2_Ref: y-coordinate of the lower-left corner of reference patch I_Ref

X3_Ref: x-coordinate of the lower-right corner of reference patch I_Ref

Y3_Ref: y-coordinate of the lower-right corner of reference patch I_Ref

X4_Ref: x-coordinate of the upper-right corner of reference patch I_Ref

Y4_Ref: y-coordinate of the upper-right corner of reference patch I_Ref

X1_Trans: x-coordinate of the upper-left corner of transformed patch I_Init

Y1_Trans: y-coordinate of the upper-left corner of transformed patch I_Init

X2_Trans: x-coordinate of the lower-left corner of transformed patch I_Init

Y2_Trans: y-coordinate of the lower-left corner of transformed patch I_Init

X3_Trans: x-coordinate of the lower-right corner of transformed patch I_Init

Y3_Trans: y-coordinate of the lower-right corner of transformed patch I_Init

X4_Trans: x-coordinate of the upper-right corner of transformed patch I_Init

Y4_Trans: y-coordinate of the upper-right corner of transformed patch I_Init

Displacement: mean Euclidean distance between reference corner points and transformed corner points

RelativeDisplacement: the ratio of displacement to the width/height of image patch

Tx: randomly generated translation in the x-direction to synthesise the transformed patch I_Init

Ty: randomly generated translation in the y-direction to synthesise the transformed patch I_Init

AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch I_Init

AngleRad: randomly generated rotation in radians to synthesise the transformed patch I_Init
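The Displacement and RelativeDisplacement columns described above can be recomputed from the corner coordinates of one row. This is a sketch only: the function name and the example row are hypothetical, and dividing by a 300 px patch width is an assumption that fits the aerial and cytological patches.

```python
import math


def mean_corner_displacement(row):
    """Mean Euclidean distance between the four reference and transformed corners."""
    total = 0.0
    for i in range(1, 5):
        dx = float(row[f"X{i}_Trans"]) - float(row[f"X{i}_Ref"])
        dy = float(row[f"Y{i}_Trans"]) - float(row[f"Y{i}_Ref"])
        total += math.hypot(dx, dy)
    return total / 4


# Hypothetical row: a 300x300 px patch shifted by (3, 4) px without rotation.
ref = {"X1_Ref": 0, "Y1_Ref": 0, "X2_Ref": 0, "Y2_Ref": 300,
       "X3_Ref": 300, "Y3_Ref": 300, "X4_Ref": 300, "Y4_Ref": 0}
trans = {k.replace("_Ref", "_Trans"): v + (3 if k[0] == "X" else 4)
         for k, v in ref.items()}
row = {**ref, **trans}

displacement = mean_corner_displacement(row)
relative_displacement = displacement / 300  # ratio to the patch width
```

The `float(...)` conversions allow the same function to be applied to rows read with `csv.DictReader`, which returns strings.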

In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

Z1_Ref: z-coordinate of the upper-left corner of reference patch I_Ref

Z2_Ref: z-coordinate of the lower-left corner of reference patch I_Ref

Z3_Ref: z-coordinate of the lower-right corner of reference patch I_Ref

Z4_Ref: z-coordinate of the upper-right corner of reference patch I_Ref

Z1_Trans: z-coordinate of the upper-left corner of transformed patch I_Init

Z2_Trans: z-coordinate of the lower-left corner of transformed patch I_Init

Z3_Trans: z-coordinate of the lower-right corner of transformed patch I_Init

Z4_Trans: z-coordinate of the upper-right corner of transformed patch I_Init

(...and similarly, coordinates of the 5th-8th corners)

Tz: randomly generated translation in z-direction to synthesise the transformed patch I_Init

AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch I_Init

AngleRadX: randomly generated rotation around X-axis in radians to synthesise the transformed patch I_Init

AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch I_Init

AngleRadY: randomly generated rotation around Y-axis in radians to synthesise the transformed patch I_Init

AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch I_Init

AngleRadZ: randomly generated rotation around Z-axis in radians to synthesise the transformed patch I_Init

Naming convention

Aerial Data

zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
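A filename following the aerial convention above could be parsed as sketched here. The regular expression and the function `parse_aerial_name` are illustrative, and the sketch assumes that `R` and `T` are the only values of the last field.

```python
import re

# Pattern for zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
AERIAL_NAME = re.compile(
    r"^zh(?P<id>\d+)_(?P<row>\d+)_(?P<col>\d+)_(?P<kind>[RT])\.png$"
)


def parse_aerial_name(filename):
    """Split an aerial patch filename into its numeric fields."""
    match = AERIAL_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an aerial patch filename: {filename!r}")
    return {
        "id": int(match["id"]),
        "row": int(match["row"]),
        "col": int(match["col"]),
        "transformed": match["kind"] == "T",
    }


info = parse_aerial_name("zh5_03_02_R.png")
```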

Cytological data

{{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed
School Learning Modalities, 2021-2022
splitgraph.com
healthdata.gov
Updated Jun 28, 2024
Cite
datahub-hhs-gov (2024). School Learning Modalities, 2021-2022 [Dataset]. https://www.splitgraph.com/datahub-hhs-gov/school-learning-modalities-20212022-aitj-yx37/
Explore at:
Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
Dataset updated
Jun 28, 2024
Authors
datahub-hhs-gov
Description
The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022.

These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.

School learning modality types are defined as follows:

Data Information

“BI” in the state column refers to school districts funded by the Bureau of Indian Education.

Technical Notes

Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted. During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week.
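The two rules in the technical notes above (dropping inferences with probability below 0.6, and overriding the modality when a closure is reported) can be sketched as two small functions. The function names, the "In Person" label and the five-day school week are assumptions made for illustration; how the providers actually apply or combine these rules is not stated.

```python
INCONCLUSIVE_BELOW = 0.6   # threshold quoted in the text
SCHOOL_DAYS_PER_WEEK = 5   # assumption: a five-day school week


def drop_inconclusive(modality, probability):
    """Return None for inferences whose probability is below the threshold."""
    return modality if probability >= INCONCLUSIVE_BELOW else None


def apply_closure(modality, closure_days):
    """Override the modality when a closure was reported for the week."""
    if closure_days >= SCHOOL_DAYS_PER_WEEK:
        return "Remote"   # closure spanned the entire week
    if closure_days >= 1:
        return "Hybrid"   # closure spanned 1-4 days
    return modality


examples = [apply_closure("In Person", days) for days in (0, 3, 5)]
```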

Data from August

Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:

See the Splitgraph documentation for more information.
Multimodal Vision-Audio-Language Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin (2024). Multimodal Vision-Audio-Language Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10060784
Explore at:
Dataset updated
Jul 11, 2024
Dataset provided by
Goethe University Frankfurt
Authors
Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation and test sets follows the split of the original datasets.

Installation

pip install pandas pyarrow

Example

import pandas as pd
df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])

dataset              AudioSet
filename             train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory    [a man speaks and dishes clank.]
tags                 [Speech]

Description

The annotation file consists of the following fields:

filename: Name of the corresponding file (video or audio file)

dataset: Source dataset associated with the data point

captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content

captions_auditory: A list of captions related to the auditory content of the video

tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided

Data files

The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
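Because captions_visual and tags can be NaN, code that iterates over these fields may want to normalise them first. The helper below is a minimal sketch with a hypothetical name (`as_caption_list`); it assumes a missing value arrives as a float NaN or as None, which depends on how the Parquet file is read.

```python
import math


def as_caption_list(value):
    """Return a plain list, treating NaN or None (missing field) as empty."""
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    return list(value)  # also accepts array-like sequences of strings


visual = as_caption_list(["a man in a black hat and glasses."])
missing = as_caption_list(float("nan"))
```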
HA4M - Human Action Multi-Modal Monitoring in Manufacturing
scidb.cn
resodate.org
Updated Jul 6, 2022
Cite
Roberto Marani; Laura Romeo; Grazia Cicirelli; Tiziana D'Orazio (2022). HA4M - Human Action Multi-Modal Monitoring in Manufacturing [Dataset]. http://doi.org/10.57760/sciencedb.01872
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.01872
Dataset updated
Jul 6, 2022
Dataset provided by
Science Data Bank
Authors
Roberto Marani; Laura Romeo; Grazia Cicirelli; Tiziana D'Orazio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

The HA4M dataset is a collection of multi-modal data relative to actions performed by different subjects in an assembly scenario for manufacturing. It has been collected to provide a good test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly.

The HA4M dataset provides a considerable variety of multi-modal data compared to existing datasets. Six types of simultaneous data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data.

These data allow the scientific community to make consistent comparisons among processing approaches or machine learning approaches by using one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use/reuse the data for different investigations in different application domains such as motion analysis, human-robot cooperation, action recognition, and so on.

Dataset details

The dataset includes 12 assembly actions performed by 41 subjects for building an Epicyclic Gear Train (EGT).

The assembly task involves three phases: first, the assembly of Block 1 and Block 2 separately, and then the final setting up of both Blocks to build the EGT. The EGT is made up of a total of 12 components divided into two sets: the first eight components for building Block 1 and the remaining four components for Block 2. Finally, two screws are fixed with an Allen Key to assemble the two blocks and thus obtain the EGT.

Acquisition setup

The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera acquires videos during the execution of the assembly task. It is placed in front of the operator and the table over which the components are spread. The camera is placed on a tripod at a height h of 1.54 m and a distance of 1.78 m. The camera is down-tilted by an angle of 17 degrees.

Technical information

The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males). Their ages ranged from 23 to 60. All the subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions at their own convenience (e.g. with both hands), independently of their dominant hand. The HA4M project is a growing project, so new acquisitions, planned for the near future, will expand the current dataset.

Actions

Twelve actions are considered in HA4M. Actions 1 to 4 are needed to build Block 1, then actions 5 to 8 to build Block 2 and, finally, actions 9 to 12 to complete the EGT. Actions are listed below:

1. Pick up/Place Carrier
2. Pick up/Place Gear Bearings (x3)
3. Pick up/Place Planet Gears (x3)
4. Pick up/Place Carrier Shaft
5. Pick up/Place Sun Shaft
6. Pick up/Place Sun Gear
7. Pick up/Place Sun Gear Bearing
8. Pick up/Place Ring Bear
9. Pick up Block 2 and place it on Block 1
10. Pick up/Place Cover
11. Pick up/Place Screws (x2)
12. Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT

Annotation

Data annotation concerns the labeling of the different actions in the video sequences.

The annotation of the actions has been done manually by observing the RGB videos, frame by frame. The start frame of each action is identified as the moment the subject starts to move the arm towards the component to be grasped. The end frame, instead, is recorded when the subject releases the component, so the next frame becomes the start frame of the subsequent action.

The total number of actions annotated in this study is 4123, including the "don't care" action (ID=0) and the action repetitions in the case of actions 2, 3 and 11.

Available code

The dataset has been acquired using the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, based on the Azure Kinect Sensor SDK v1.4.1 and Azure Kinect Body Tracking SDK v1.1.2.

The software records device data to a Matroska (.mkv) file, containing video tracks, IMU samples, and device calibration. In this work, IMU samples are not considered.

The same Multiple Azure Kinect GUI software processes the Matroska file and returns the different types of data provided with our dataset: RGB images, RGB-depth-Aligned (RGB-A) images, Depth images, IR images, Point Cloud and Skeleton data.
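The annotation rule described above, where the frame after one action's end frame is the start frame of the next action, means a video can be represented by its start frames alone. The sketch below turns such a list into (action, start, end) segments; the function name and the example frame numbers are hypothetical and do not reflect the dataset's actual annotation file format.

```python
def to_segments(starts, last_frame):
    """Convert (action_id, start_frame) pairs, in temporal order, into segments.

    Each action ends one frame before the next one starts; the final action
    runs until last_frame.
    """
    segments = []
    for i, (action_id, start) in enumerate(starts):
        if i + 1 < len(starts):
            end = starts[i + 1][1] - 1
        else:
            end = last_frame
        segments.append((action_id, start, end))
    return segments


# Hypothetical annotation: action 1, then action 2, then "don't care" (ID=0).
segments = to_segments([(1, 0), (2, 50), (0, 120)], last_frame=199)
```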
Multi-modality medical image dataset for medical image processing in Python...
zenodo.org
zip
Updated Aug 12, 2024
Cite
Candace Moore; Giulia Crocioni (2024). Multi-modality medical image dataset for medical image processing in Python lesson [Dataset]. http://doi.org/10.5281/zenodo.13305760
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.13305760
Dataset updated
Aug 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Candace Moore; Giulia Crocioni
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a collection of medical imaging files for use in the "Medical Image Processing with Python" lesson, developed by the Netherlands eScience Center.

The dataset includes:

SimpleITK compatible files: MRI T1 and CT scans (training_001_mr_T1.mha, training_001_ct.mha), digital X-ray (digital_xray.dcm in DICOM format), neuroimaging data (A1_grayT1.nrrd, A1_grayT2.nrrd). Data have been downloaded from here.

MRI data: a T2-weighted image (OBJECT_phantom_T2W_TSE_Cor_14_1.nii in NIfTI-1 format). Data have been downloaded from here.

Example images for the machine learning lesson: chest X-rays (rotatechest.png, other_op.png), cardiomegaly example (cardiomegaly_cc0.png).

Additional anonymized data: TBA

These files represent various medical imaging modalities and formats commonly used in clinical research and practice. They are intended for educational purposes, allowing students to practice image processing techniques, machine learning applications, and statistical analysis of medical images using Python libraries such as scikit-image, pydicom, and SimpleITK.
Replication Data for: When modality and tense meet. The future marker budet...
dataverse.azure.uit.no
dataverse.no
Updated Nov 22, 2023
Cite
Elmira Zhamaletdinova (2023). Replication Data for: When modality and tense meet. The future marker budet ‘will’ in impersonal constructions with the modal adverb možno ‘be possible’ [Dataset]. http://doi.org/10.18710/MOJBDK
Explore at:
Available download formats: text/comma-separated-values (657010), txt (10575), text/comma-separated-values (54088)
Unique identifier
https://doi.org/10.18710/MOJBDK
Dataset updated
Nov 22, 2023
Dataset provided by
DataverseNO
Authors
Elmira Zhamaletdinova
License
https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/MOJBDK
Time period covered
1826 - 2015
Area covered
Russian Federation
Description
Dataset description: This is a study of examples of Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. The data was collected in 2020-2021 from the old version of the Russian National Corpus (ruscorpora.ru).

In the spreadsheet 01DataMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following construction types: možno + budet + INF.PFV, možno + budet + INF.IPFV, možno + INF.PFV and možno + INF.IPFV. The results for each search were downloaded, pseudorandomized, and the first 200 examples were manually annotated, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main.

In the spreadsheet 01DataZavtraMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following structure: zavtra + možno + budet + INF.PFV, zavtra + možno + budet + INF.IPFV, zavtra + možno + INF.PFV and zavtra + možno + INF.IPFV. All of the examples (103 sentences) were imported to a spreadsheet and annotated manually, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main.

Article abstract: This paper examines Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. My contribution can be summarized as follows. First, corpus-based evidence reveals that možno + INF constructions are vastly more frequent than constructions with copula. Second, the meaning of constructions without the future copula is more flexible: while the possibility is typically located in the present, the situation denoted by the infinitive may be located in the present or the future. Third, I show that the možno + INF construction is more ambiguous and can denote present, gnomic or future situations. Fourth, I identify a number of contextual factors that unambiguously locate the situation in the future. I demonstrate that such factors are more frequently used with the future copula, and thus motivate the choice between the two constructions. Finally, I illustrate the interpretations in a straightforward manner by means of schemas of the type used in cognitive linguistics.
MELD Preprocessed
kaggle.com
zip
Updated Mar 1, 2025
Cite
Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
Explore at:
Available download formats: zip (3527202381 bytes)
Dataset updated
Mar 1, 2025
Authors
Argish Abhangi
Description
The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

Data Sources

Audio: Waveforms extracted from the original video files.

Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.

Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.

Emotion Labels: Each sample is associated with an emotion label.

Preprocessing Pipeline

The preprocessing script performs several key steps:

Text Cleaning:

fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.

replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).

Audio Processing:

Extracts raw audio waveform from each sample.

Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).

Converts the spectrogram to a logarithmic scale for numerical stability.

Video Processing:

Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.

For each video, samples frames evenly based on the original video's FPS.

Applies Haar Cascade face detection on the frames to extract the first detected face.

Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.
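The frame-sampling step above (reducing a video to roughly 2 fps based on its original frame rate) can be approximated by keeping every k-th frame. The sketch below shows only the index arithmetic; the function name is hypothetical, and the actual preprocessing script may round or select frames differently.

```python
def sample_frame_indices(n_frames, original_fps, target_fps=2.0):
    """Indices of frames to keep so that about target_fps frames remain per second."""
    # Keep every `step`-th frame; never use a step below 1.
    step = max(1, round(original_fps / target_fps))
    return list(range(0, n_frames, step))


# A hypothetical 2-second clip recorded at 24 fps keeps 4 frames at 2 fps.
indices = sample_frame_indices(n_frames=48, original_fps=24.0)
```

The returned indices could then be used to decide which frames read with OpenCV are passed on to face detection.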

Saving Processed Samples:

Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).

The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).

Data Format

Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

utterance (str): The cleaned textual utterance.

emotion (str/int): The corresponding emotion label.

video_path (str): Original path to the video file from which the sample was extracted.

audio (Tensor): Raw audio waveform tensor of shape [channels, time].

audio_sample_rate (int): The sampling rate of the audio waveform.

audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].

face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

Directory Structure

The preprocessed files are organized into splits:

preprocessed_data/
├── train/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
├── dev/
│   ├── dia0_utt0.pt
│   ├── dia1_utt1.pt
│   └── ...
└── test/
    ├── dia0_utt0.pt
    ├── dia1_utt1.pt
    └── ...

Loading and Using the Dataset

A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

Dataset Class

from torch.utils.data import Dataset
import os
import torch


class PreprocessedMELDDataset(Dataset):
    def __init__(self, data_dir):
        """
        Args:
            data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        # Collect the full path of every preprocessed sample in the split.
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample_path = self.files[idx]
        # Note: recent PyTorch versions default to weights_only=True, which may
        # reject the NumPy 'face' array; pass weights_only=False only for files you trust.
        sample = torch.load(sample_path)
        return sample

Custom Collate Function

def preprocessed_collate_fn(batch):
    """
    Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
    Modify this function to pad or stack tensor data if needed.
    """
    collated = {}
    collated['utterance'] = [sample['utterance'] for sample in batch]
    collated['emotion'] = [sample['emotion'] for sample in batch]
    collated['video_path'] = [sample['video_path'] for sample in batch]
    collated['audio'] = [sample['audio'] for sample in batch]
    collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
    collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
    collated['face'] = [sample['face'] for sample in batch]
    return collated

Creating DataLoaders

from torch.utils.data import DataLoader

# Define paths for each split
train_data_dir = "preprocessed_data/train"
dev_data_dir = "preproces...
Data from: MLM: A Benchmark Dataset for Multitask Learning with Multiple...
data.niaid.nih.gov
Updated Jun 12, 2020
Cite
Armitage, Jason; Kacupaj, Endri; Tahmasebzadeh, Golsa; Swati (2020). MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3885752
Explore at:
Dataset updated
Jun 12, 2020
Dataset provided by
Jožef Stefan Institute, Slovenia
TIB – Leibniz Information Center for Science and Technology, Germany
University of Bonn, Germany
Authors
Armitage, Jason; Kacupaj, Endri; Tahmasebzadeh, Golsa; Swati
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract:

We introduce the MLM (Multiple Languages and Modalities) dataset - a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. The second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalizing on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.

Introduction: Multiple Languages and Modalities comprises data points on 236k human settlements for evaluating and optimizing multitask learning systems. MLM presents a dataset with a high level of diversity in terms of modality and language. For each entity, we have extracted text summaries, images, coordinates, and their respective triple classes. Text summaries are available in three languages (English, French, and German) with each entity having between one and three language entries.

Human settlements from all continents are provided in the overall dataset (MLM) with 72% located in Europe. Two further versions of the dataset - MLM-irle and MLM-irle-gr - were generated for use in the benchmark evaluation for multitask systems described in the paper (see above). MLM-irle-gr (i.e., geo-representative) was generated to serve organizations that focus on the European Union by providing a geographically balanced coverage of human settlements in this region. MLM-irle-gr contains data on 24k human settlements across the EU weighted in relation to the population count for each of the 28 countries.

MLM contains the following fields:

field-label	description
id	a unique identifier
label	textual label
coordinates	longitude, latitude geo-location value
summaries	list of textual summaries related to the entity
images	list of images related to the entity
classes	list of associated triple classes
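
As a rough illustration of the record layout listed above, one entity could be modelled as follows. The field names come from the list; the Python types, the (longitude, latitude) tuple order, the class name `MLMEntity` and the sample values are assumptions made for illustration, not the dataset's documented schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MLMEntity:
    """One MLM entity; field names follow the list above, types are assumed."""
    id: str                              # unique identifier
    label: str                           # textual label
    coordinates: Tuple[float, float]     # assumed (longitude, latitude) order
    summaries: List[str] = field(default_factory=list)  # one to three languages
    images: List[str] = field(default_factory=list)     # assumed: paths or URLs
    classes: List[str] = field(default_factory=list)    # associated triple classes


# Hypothetical sample values, not taken from the dataset.
example = MLMEntity(
    id="Q0000",
    label="Example Town",
    coordinates=(7.10, 50.73),
    summaries=["An example summary in English."],
)
print(example.label, len(example.summaries))
```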

MLM - Details by Dataset Version:

Num. of	MLM	MLM-irle	MLM-irle-gr
Entities	236496	218681	22501
Images	412422	314533	31621
Summaries	497899	462328	47508
Triple classes	1685	1655	452

Availability:

All three versions of MLM listed in the table directly above are available for direct download and use. To support findability and sustainability, the MLM dataset is published as an on-line resource at https://doi.org/10.5281/zenodo.3885753. A separate page with detailed explanations and illustrations is available at http://cleopatra.ijs.si/goal-mlm/ to promote ease-of-use. The project GitHub repository contains the complete source code for the system and the generation script is available at https://github.com/GOALCLEOPATRA/MLM. Documentation adheres to the standards of FAIR Data principles with all relevant metadata specified to the research community and users. It is freely accessible under the Creative Commons Attribution 4.0 International license, which makes it reusable for almost any purpose.

Updating and Reusability: MLM is supported by a team of researchers from the University of Bonn, the Leibniz Information Center for Science and Technology, and Jožef Stefan Institute. The resource is already in use for individual projects and as a contribution to the project deliverables of the Marie Skłodowska-Curie CLEOPATRA Innovative Training Network. In addition to the steps above that make the resource available to the wider community, the usage of MLM will be promoted to the network of researchers in this project. Use among researchers and practitioners in digital humanities will be promoted by demonstrations and presentations at domain-related events. Activities are planned for the Digital Methods Summer School run by the University of Amsterdam. The range of modalities and languages present in the dataset also extends its application to research on multimodal representation learning, multilingual machine learning, information retrieval, location estimation, and the Semantic Web. MLM will be supported and maintained for three years in the first instance. A second release of the dataset is already scheduled and the generation process outlined above is designed to enable rapid scaling.

HaDR: Dataset for hands instance segmentation

kaggle.com

zip

Updated Mar 7, 2023

Cite

Ales Vysocky (2023). HaDR: Dataset for hands instance segmentation [Dataset]. https://www.kaggle.com/datasets/alevysock/hadr-dataset-for-hands-instance-segmentation

Explore at:

Available download formats: zip (10662295286 bytes)

Dataset updated

Mar 7, 2023

Authors

Ales Vysocky

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.

S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826

The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and Depth frames, with binary masks for each hand instance (i1, i2, single class data). The dataset is entirely synthetic, generated using the Domain Randomization technique in CoppeliaSim 3D. The dataset can be used to train Deep Learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training-validation split comprises 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single channel 8-bit grayscale image in the range [0.2, 1.0] m.

The following aspects of the scene were randomly varied during generation of the dataset:

• Number, colors, textures, scales and types of distractor objects selected from a set of 3D models of general tools and geometric primitives. A special type of distractor – an articulated dummy without hands (for instance-free samples).
• Hand gestures (9 options).
• Hand models’ positions and orientations.
• Texture and surface properties (diffuse, specular and emissive properties) and number (from none to 2) of the object of interest, as well as its background.
• Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.

The sample resolution is set to 320×256, encoded in lossless PNG format, and contains only right hand meshes (we suggest using Flip augmentations during training), with a maximum of two instances per sample.
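
The depth encoding described above can be inverted roughly as follows. This is a sketch that assumes a linear mapping in which pixel value 0 corresponds to 0.2 m and 255 to 1.0 m; the description states the range but not the exact mapping, so it should be checked against the data before use. The function name `decode_depth` and the constants are hypothetical.

```python
import numpy as np

DEPTH_MIN_M = 0.2  # lower bound of the encoded range (from the description)
DEPTH_MAX_M = 1.0  # upper bound of the encoded range (from the description)


def decode_depth(depth_u8: np.ndarray) -> np.ndarray:
    """Map 8-bit depth pixels to metres, assuming a linear 0..255 -> 0.2..1.0 m scale."""
    scale = (DEPTH_MAX_M - DEPTH_MIN_M) / 255.0
    return DEPTH_MIN_M + depth_u8.astype(np.float32) * scale


# Synthetic frame standing in for a decoded depth PNG; the (256, 320) shape
# assumes the stated 320x256 resolution is width x height.
frame = np.full((256, 320), 255, dtype=np.uint8)
metres = decode_depth(frame)
```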

Test dataset (real camera images): Test dataset containing 706 images was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. The dataset comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, number of hands, and different types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset is assumed to have only one user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled, and we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.
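
A minimal sketch of how the COCO annotations mentioned above could be inspected, for example to confirm that no image carries more than two hand instances. It relies only on the standard COCO top-level keys (`images`, `annotations`, `categories`); the tiny sample below is hand-written for illustration, and the category name "hand" is an assumption rather than something taken from the dataset files.

```python
import json
from collections import Counter


def instances_per_image(coco: dict) -> Counter:
    """Count annotated instances per image id in a COCO-style dictionary."""
    return Counter(ann["image_id"] for ann in coco.get("annotations", []))


# Hand-written stand-in for a file such as instances_hands_full.json;
# with the real file, parse it with json.load() on an open file handle.
sample = json.loads("""
{
  "images": [{"id": 1, "width": 640, "height": 480},
             {"id": 2, "width": 640, "height": 480}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 1},
                  {"id": 11, "image_id": 1, "category_id": 1}],
  "categories": [{"id": 1, "name": "hand"}]
}
""")
counts = instances_per_image(sample)
# Images without annotations do not appear in the counter, so default to 0.
per_image = {img["id"]: counts.get(img["id"], 0) for img in sample["images"]}
print(per_image)
```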

Channel-wise normalization and standardization parameters for datasets

Dataset	Mean (R, G, B, D)	STD (R, G, B, D)
Train	98.173, 95.456, 93.858, 55.872	67.539, 67.194, 67.796, 47.284
Validation	99.321, 97.284, 96.318, 58.189	67.814, 67.518, 67.576, 47.186
Test	123.675, 116.28, 103.53, 35.3792	58.395, 57.12, 57.375, 45.978
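
The parameters in the table above would typically be applied channel-wise as (x - mean) / std. The sketch below uses the Train row and assumes that inputs are (H, W, 4) arrays in R, G, B, D order on the same 0 to 255 scale as the table; the function name `standardize_rgbd` is hypothetical.

```python
import numpy as np

# Train-split statistics copied from the table above, ordered (R, G, B, D).
TRAIN_MEAN = np.array([98.173, 95.456, 93.858, 55.872], dtype=np.float32)
TRAIN_STD = np.array([67.539, 67.194, 67.796, 47.284], dtype=np.float32)


def standardize_rgbd(rgbd: np.ndarray) -> np.ndarray:
    """Standardize an (H, W, 4) RGB-D array channel-wise: (x - mean) / std."""
    return (rgbd.astype(np.float32) - TRAIN_MEAN) / TRAIN_STD


# Synthetic input: every pixel equals the channel means, so the output is ~0.
image = np.broadcast_to(TRAIN_MEAN, (256, 320, 4))
normalized = standardize_rgbd(image)
```

For the validation or test split, the corresponding row of the table would replace the two constants.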

Data Sheet 2_Large language models generating synthetic clinical datasets: a...
frontiersin.figshare.com
figshare.com
xlsx
Updated Feb 5, 2025
+ more versions
Cite
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/frai.2025.1533508.s002
Dataset updated
Feb 5, 2025
Dataset provided by
Frontiers
Authors
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
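
To make the CI-overlap criterion concrete, the check could look roughly like the sketch below. The description does not say how the intervals were computed, so the normal approximation (z = 1.96) is an assumption, the helper names are hypothetical, and the ages are made-up values rather than study data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_ci95(values):
    """Return an approximate 95% CI for the mean (normal approximation, z = 1.96)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)


def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up ages standing in for a real and a synthetic sample; not study data.
real_ages = [52, 61, 47, 58, 66, 49, 55, 63]
synthetic_ages = [54, 59, 50, 57, 64, 51, 56, 60]
print(intervals_overlap(mean_ci95(real_ages), mean_ci95(synthetic_ages)))
```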
Data from: Extended datasets from MM-IMDB and Ads-Parallelity dataset with...
data-staging.niaid.nih.gov
zenodo.org
Updated Feb 24, 2023
Cite
Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi (2023). Extended datasets from MM-IMDB and Ads-Parallelity dataset with the features from Google Cloud Vision API [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7050923
Explore at:
Dataset updated
Feb 24, 2023
Dataset provided by
Hosei University
CyberAgent, Inc.
Authors
Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are extended versions of the MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18] datasets, augmented with features from the Google Cloud Vision API. The datasets are stored in JSON Lines (jsonl) format.
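
Since JSON Lines stores one JSON object per line, the files can be read with the Python standard library alone. The sketch below uses an in-memory stand-in for a file; the keys `id` and `title` are hypothetical and not taken from the datasets.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for a .jsonl file; the keys here are hypothetical.
demo = io.StringIO('{"id": 1, "title": "A"}\n\n{"id": 2, "title": "B"}\n')
records = list(read_jsonl(demo))
print(len(records))
```

With a real file, the same function accepts the handle returned by `open(path, encoding="utf-8")`.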

Abstract (from our paper):

There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.

Dataset (MM-IMDB and Ads-Parallelity):

We extended two multimodal datasets, namely, MM-IMDB [Arevalo+ ICLRW'17], Ads-Parallelity [Zhang+ BMVC'18] for the empirical experiments. The MM-IMDB dataset contains 25,925 movies with multiple labels (genres). We used the original split provided in the dataset and reported the F1 scores (micro, macro, and samples) of the test set. The Ads-Parallelity dataset contains 670 images and slogans from persuasive advertisements to understand the implicit relationship (parallel and non-parallel) between these two modalities. A binary classification task is used to predict whether the text and image in the same ad convey the same message.

We transformed the following multimodal information (i.e., visual, textual, and categorical data) into textual tokens and fed these into our proposed model. We used the Google Cloud Vision API for the visual features to obtain the following four pieces of information as tokens: (1) text from the OCR, (2) category labels from the label detection, (3) object tags from the object detection, and (4) the number of faces from the facial detection. We input the labels and object detection results as a sequence in order of confidence, as obtained from the API. We describe the visual, textual, and categorical features of each dataset below.

MM-IMDB: We used the title and plot of movies as the textual features, and the aforementioned API results based on poster images as visual features.

Ads-Parallelity: We used the same API-based visual features as in MM-IMDB. Furthermore, we used textual and categorical features consisting of textual inputs of transcriptions and messages, and categorical inputs of natural and text concrete images.
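
The confidence-ordered token sequence described above can be sketched as follows. The dictionary keys `description` and `score` and the sample detections are assumptions for illustration; they are not a verified copy of the API response or of the authors' preprocessing code.

```python
def labels_to_tokens(detections):
    """Return label strings sorted by descending confidence score."""
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    return [d["description"] for d in ranked]


# Hypothetical detections shaped loosely like label-detection output.
detections = [
    {"description": "poster", "score": 0.81},
    {"description": "movie", "score": 0.93},
    {"description": "font", "score": 0.64},
]
tokens = labels_to_tokens(detections)
print(" ".join(tokens))
```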
The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...
zenodo.org
bin, csv, zip
Updated Jan 5, 2024
+ more versions
Cite
Mauro Nievas Offidani; Mauro Nievas Offidani; Claudio Delrieux; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
Explore at:
Available download formats: zip, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.10079370
Dataset updated
Jan 5, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Mauro Nievas Offidani; Mauro Nievas Offidani; Claudio Delrieux; Claudio Delrieux
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.

Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
data.niaid.nih.gov
Updated Oct 20, 2022
+ more versions
Cite
Yfantidou, Sofia; Karagianni, Christina; Efstathiou, Stefanos; Vakali, Athena; Palotti, Joao; Giakatos, Dimitrios Panteleimon; Marchioro, Thomas; Kazlouski, Andrei; Ferrari, Elena; Girdzijauskas, Šarūnas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6826682
Explore at:
Dataset updated
Oct 20, 2022
Dataset provided by
University of Insubria
KTH Royal Institute of Technology
Foundation for Research and Technology Hellas
Aristotle University of Thessaloniki
Earkick
Authors
Yfantidou, Sofia; Karagianni, Christina; Efstathiou, Stefanos; Vakali, Athena; Palotti, Joao; Giakatos, Dimitrios Panteleimon; Marchioro, Thomas; Kazlouski, Andrei; Ferrari, Elena; Girdzijauskas, Šarūnas
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
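
A minimal sketch of the pandas route mentioned above. It reads from an in-memory string so that it runs without the dataset; the column names and values are hypothetical and do not reflect the real CSV schema.

```python
import io

import pandas as pd

# In-memory stand-in for one of the daily-granularity CSV files.
# The column names and values below are hypothetical, not the real schema.
csv_text = "id,date,steps\nu01,2021-05-24,8450\nu01,2021-05-25,10210\n"

daily = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(daily.dtypes)
```

With the downloaded files, the path to a CSV file would be passed to `pandas.read_csv()` in place of the `io.StringIO` object.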

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id: id (or user_id): type: data: }

Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document, e.g., the steps count for a specific timestamp for the steps type, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object are different between different types of data. As mentioned previously, all times are stored in local time, and user IDs are common across different collections. For more information on the available data types, see the related publication.
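
To illustrate the document layout described above without requiring a running MongoDB instance, the sketch below groups plain Python dictionaries by their type field. The IDs, type names and data payloads are invented for illustration, and `group_by_type` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical documents following the four-field layout described above;
# the IDs, type names and data payloads are invented for illustration.
documents = [
    {"_id": "a1", "id": "user01", "type": "steps", "data": {"value": 120}},
    {"_id": "a2", "id": "user01", "type": "heart_rate", "data": {"bpm": 64}},
    {"_id": "a3", "user_id": "user02", "type": "steps", "data": {"value": 87}},
]


def group_by_type(docs):
    """Group the type-dependent data payloads by their type field."""
    grouped = defaultdict(list)
    for doc in docs:
        # sema and surveys documents may carry user_id instead of id.
        user = doc.get("id", doc.get("user_id"))
        grouped[doc["type"]].append((user, doc["data"]))
    return dict(grouped)


by_type = group_by_type(documents)
print(sorted(by_type))
```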

Surveys Encoding

BREQ2

Why do you engage in exercise?

Code	Text
engage[SQ001]	I exercise because other people say I should
engage[SQ002]	I feel guilty when I don’t exercise
engage[SQ003]	I value the benefits of exercise
engage[SQ004]	I exercise because it’s fun
engage[SQ005]	I don’t see why I should have to exercise
engage[SQ006]	I take part in exercise because my friends/family/partner say I should
engage[SQ007]	I feel ashamed when I miss an exercise session
engage[SQ008]	It’s important to me to exercise regularly
engage[SQ009]	I can’t see why I should bother exercising
engage[SQ010]	I enjoy my exercise sessions
engage[SQ011]	I exercise because others will not be pleased with me if I don’t
engage[SQ012]	I don’t see the point in exercising
engage[SQ013]	I feel like a failure when I haven’t exercised in a while
engage[SQ014]	I think it is important to make the effort to exercise regularly
engage[SQ015]	I find exercise a pleasurable activity
engage[SQ016]	I feel under pressure from my friends/family to exercise
engage[SQ017]	I get restless if I don’t exercise regularly
engage[SQ018]	I get pleasure and satisfaction from participating in exercise
engage[SQ019]	I think exercising is a waste of time

PANAS

Indicate the extent you have felt this way over the past week

P1[SQ001]	Interested
P1[SQ002]	Distressed
P1[SQ003]	Excited
P1[SQ004]	Upset
P1[SQ005]	Strong
P1[SQ006]	Guilty
P1[SQ007]	Scared
P1[SQ008]	Hostile
P1[SQ009]	Enthusiastic
P1[SQ010]	Proud
P1[SQ011]	Irritable
P1[SQ012]	Alert
P1[SQ013]	Ashamed
P1[SQ014]	Inspired
P1[SQ015]	Nervous
P1[SQ016]	Determined
P1[SQ017]	Attentive
P1[SQ018]	Jittery
P1[SQ019]	Active
P1[SQ020]	Afraid

Personality

How Accurately Can You Describe Yourself?

Code	Text
ipip[SQ001]	Am the life of the party.
ipip[SQ002]	Feel little concern for others.
ipip[SQ003]	Am always prepared.
ipip[SQ004]	Get stressed out easily.
ipip[SQ005]	Have a rich vocabulary.
ipip[SQ006]	Don't talk a lot.
ipip[SQ007]	Am interested in people.
ipip[SQ008]	Leave my belongings around.
ipip[SQ009]	Am relaxed most of the time.
ipip[SQ010]	Have difficulty understanding abstract ideas.
ipip[SQ011]	Feel comfortable around people.
ipip[SQ012]	Insult people.
ipip[SQ013]	Pay attention to details.
ipip[SQ014]	Worry about things.
ipip[SQ015]	Have a vivid imagination.
ipip[SQ016]	Keep in the background.
ipip[SQ017]	Sympathize with others' feelings.
ipip[SQ018]	Make a mess of things.
ipip[SQ019]	Seldom feel blue.
ipip[SQ020]	Am not interested in abstract ideas.
ipip[SQ021]	Start conversations.
ipip[SQ022]	Am not interested in other people's problems.
ipip[SQ023]	Get chores done right away.
ipip[SQ024]	Am easily disturbed.
ipip[SQ025]	Have excellent ideas.
ipip[SQ026]	Have little to say.
ipip[SQ027]	Have a soft heart.
ipip[SQ028]	Often forget to put things back in their proper place.
ipip[SQ029]	Get upset easily.
ipip[SQ030]	Do not have a good imagination.
ipip[SQ031]	Talk to a lot of different people at parties.
ipip[SQ032]	Am not really interested in others.
ipip[SQ033]	Like order.
ipip[SQ034]	Change my mood a lot.
ipip[SQ035]	Am quick to understand things.
ipip[SQ036]	Don't like to draw attention to myself.
ipip[SQ037]	Take time out for others.
ipip[SQ038]	Shirk my duties.
ipip[SQ039]	Have frequent mood swings.
ipip[SQ040]	Use difficult words.
ipip[SQ041]	Don't mind being the centre of attention.
ipip[SQ042]	Feel others' emotions.
ipip[SQ043]	Follow a schedule.
ipip[SQ044]	Get irritated easily.
ipip[SQ045]	Spend time reflecting on things.
ipip[SQ046]	Am quiet around strangers.
ipip[SQ047]	Make people feel at ease.
ipip[SQ048]	Am exacting in my work.
ipip[SQ049]	Often feel blue.
ipip[SQ050]	Am full of ideas.

STAI

Indicate how you feel right now

Code	Text
STAI[SQ001]	I feel calm
STAI[SQ002]	I feel secure
STAI[SQ003]	I am tense
STAI[SQ004]	I feel strained
STAI[SQ005]	I feel at ease
STAI[SQ006]	I feel upset
STAI[SQ007]	I am presently worrying over possible misfortunes
STAI[SQ008]	I feel satisfied
STAI[SQ009]	I feel frightened
STAI[SQ010]	I feel comfortable
STAI[SQ011]	I feel self-confident
STAI[SQ012]	I feel nervous
STAI[SQ013]	I am jittery
STAI[SQ014]	I feel indecisive
STAI[SQ015]	I am relaxed
STAI[SQ016]	I feel content
STAI[SQ017]	I am worried
STAI[SQ018]	I feel confused
STAI[SQ019]	I feel steady
STAI[SQ020]	I feel pleasant

TTM

Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?

Code Text processes[SQ002] I read articles to learn more about physical
Replication Data for: The Choice of Aspect in the Russian Modal Construction...
search.dataone.org
dataverse.no
Updated Jan 5, 2024
Cite
Bernasconi, Beatrice (2024). Replication Data for: The Choice of Aspect in the Russian Modal Construction with prixodit'sja/prijtis' [Dataset]. http://doi.org/10.18710/KR5RRK
Explore at:
Unique identifier
https://doi.org/10.18710/KR5RRK
Dataset updated
Jan 5, 2024
Dataset provided by
DataverseNO
Authors
Bernasconi, Beatrice
Time period covered
Jan 1, 1950 - Jan 1, 2020
Description
This dataset includes all the data files that were used for the studies in my Master Thesis: "The Choice of Aspect in the Russian Modal Construction with prixodit'sja/prijtis'". The data files are numbered so that they are shown in the same order as they are presented in the thesis. They include the database and the code used for the statistical analysis. Their contents are described in the ReadMe files. The core of the work is a quantitative and empirical study on the choice of aspect by Russian native speakers in the modal construction prixodit’sja/prijtis’ + inf. The hypothesis is that in the modal construction prixodit’sja/prijtis’ + inf the aspect of the infinitive is not fully determined by grammatical context but, to some extent, open to construal. A preliminary analysis was carried out on data gathered from the Russian National Corpus (www.ruscorpora.ru). Four hundred and forty-seven examples with the verb prijtis' were annotated manually for several factors and a statistical test (CART) was run. Results demonstrated that no grammatical factor plays a big role in the use of one aspect rather than the other. Data for this study can be consulted in the files from 01 to 03 and include a ReadMe file, the database in .csv format and the code used for the statistical test. An experiment with native speakers was then carried out. A hundred and ten native speakers of Russian were surveyed and asked to evaluate the acceptability of the infinitive in examples with prixodit’sja/prijtis’ delat’/sdelat’ šag/vid/vybor. The survey presented seventeen examples from the Russian National Corpus that were submitted two times: the first time with the same aspect as in the original version, the second time with the other aspect. Participants had to evaluate each case by choosing among “Impossible”, “Acceptable” and “Excellent” ratings. They were also allowed to give their opinion about the difference between aspects in each example. 
A Logistic Regression with Mixed Effects was run on the answers. Data for this study can be consulted in the files from 04 to 010 and include a ReadMe file, the text and the answers of the questionnaire, the database in .csv, .txt and pdf formats and the code used for the statistical test. Results showed that prijtis’ often admits both aspects in the infinitive, while prixodit’sja is more restrictive and prefers imperfective. Overall, “Acceptable” and “Excellent” responses were higher than “Impossible” responses for both aspects, even when the aspect evaluated did not match the original. Personal opinions showed that the choice of aspect often depends on the meaning the speaker wants to convey. Only in very few cases was the grammatical context considered to be a constraint on the choice.
Human Bone Fractures (Image Dataset)
kaggle.com
zip
Updated Aug 9, 2025
+ more versions
Cite
Omar Essa (2025). Human Bone Fractures (Image Dataset) [Dataset]. https://www.kaggle.com/datasets/jockeroika/human-bone-fractures-image-dataset
Explore at:
Available download formats: zip (39969682 bytes)
Dataset updated
Aug 9, 2025
Authors
Omar Essa
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Human Bone Fractures Multi-modal Image Dataset (HBFMID) is a comprehensive medical imaging dataset designed for research and development in bone fracture detection, classification, and localization. Published on December 2, 2024 by Shahnaj Parvin, the dataset integrates both X-ray and MRI modalities, covering a wide range of human skeletal regions.

Dataset Composition

Total Raw Images: 641

X-ray Images: 510

MRI Images: 131

Anatomical Regions Covered: Elbow, finger, forearm, humerus, shoulder, femur, shinbone, knee, hipbone, wrist, spinal cord, and other healthy bone samples.

Data Splits

Training Set: 449 images → augmented to 1,347 images (×3 augmentation factor)

Validation Set: 128 images

Test Set: 64 images

Total Final Dataset Size: 1,539 images
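
The split sizes above can be cross-checked with a few lines of arithmetic. The numbers are taken from the listing; the variable names are only illustrative.

```python
train_raw, validation, test = 449, 128, 64   # split sizes from the listing
augmentation_factor = 3                      # "x3" applied to the training set only

raw_total = train_raw + validation + test
train_augmented = train_raw * augmentation_factor
final_total = train_augmented + validation + test

print(raw_total, train_augmented, final_total)  # 641 1347 1539
```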

Pre-processing Steps

All images underwent the following pre-processing:

Auto-orientation (correcting rotation/flip metadata)

Resizing to 640 × 640 pixels

Contrast adjustments to enhance bone visibility

Data Augmentation Techniques

To improve model generalization, several augmentation methods were applied:

Flip: Horizontal & Vertical

Rotation: Between −5° and +5°

Shear: ±2° (Horizontal & Vertical)

Zooming: 2%

Saturation Adjustment: ±5%

Brightness Adjustment: ±10%

Scaling, Shifting, Shearing, Cropping, Random Rotation
octmnist_shaowen
kaggle.com
zip
Updated Dec 21, 2022
Cite
Shaowen Huang (2022). octmnist_shaowen [Dataset]. https://www.kaggle.com/datasets/shaowenhuang/octmnist-shaowen
Explore at:
Available download formats: zip (54954840 bytes)
Dataset updated
Dec 21, 2022
Authors
Shaowen Huang
Description
MedMNIST2D	Data Modality	Tasks (# Classes/Labels)	# Samples	# Training / Validation / Test
OCTMNIST	Retinal OCT	Multi-Class (4)	109,309	97,477 / 10,832 / 1,000
rocov2-modality-x-ray
huggingface.co
Updated Sep 1, 2025
+ more versions
Cite
Wafaa Abdallah Yassin Fraih (2025). rocov2-modality-x-ray [Dataset]. https://huggingface.co/datasets/WafaaFraih/rocov2-modality-x-ray
Explore at:
Dataset updated
Sep 1, 2025
Authors
Wafaa Abdallah Yassin Fraih
Description
ROCOv2 X-ray Dataset

This dataset contains X-ray imaging data from the ROCOv2 radiology dataset.

Dataset Structure

caption: Medical description text
modality: Imaging modality (X-ray)
modality_id: Numerical modality ID
caption_length: Number of words in caption
length_category: Short/medium/long categorization
original_index: Index from original dataset

Splits

train: 154 samples
val: 19 samples
test: 20 samples

Usage

from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/WafaaFraih/rocov2-modality-x-ray.
MM5: Multimodal Image Dataset
figshare.com
zip
Updated Aug 3, 2025
Cite
Martin Brenner; Napoleon Reyes; Teo Susnjak; Andre Barczak (2025). MM5: Multimodal Image Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28722164.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.28722164.v3
Dataset updated
Aug 3, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Martin Brenner; Napoleon Reyes; Teo Susnjak; Andre Barczak
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The MM5 dataset is a comprehensive multimodal dataset capturing RGB, Depth, Thermal (LWIR), Ultraviolet (UV), and Near-Infrared (NIR) images. It is designed for advanced multimodal research, providing diverse modalities, annotated data, and carefully calibrated and aligned images. For additional scripts, documentation, and usage examples, please visit our GitHub repository: https://github.com/martinbrennernz/MM5-Dataset
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
data.europa.eu
zenodo.org
unknown
Updated Jul 12, 2022
Cite
Zenodo (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6832242?locale=fr
Explore at:
Available download formats: unknown (642961582)
Dataset updated
Jul 12, 2022
Dataset authored and provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database. To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here. For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
ArchCAD
huggingface.co
Updated Oct 19, 2025
Cite
luo (2025). ArchCAD [Dataset]. https://huggingface.co/datasets/jackluoluo/ArchCAD
Explore at:
Dataset updated
Oct 19, 2025
Authors
luo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
🏗️ ArchCAD

A Multimodal CAD Dataset for Vectorized Drawing Understanding

40k Samples · 5 Strictly Aligned Modalities · Foundational Data for AI Understanding of Engineering Drawings

📑 Table of Contents

What is ArchCAD? · Key Features · Dataset Structure · Data Modalities · Annotations · Baseline Model: DPSS · Potential Applications · Citation

📘 What is ArchCAD?

AI systems have long struggled to interpret and utilize CAD… See the full description on the dataset page: https://huggingface.co/datasets/jackluoluo/ArchCAD.

Cite

Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568

Datasets for Evaluation of Multimodal Image Registration

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.5557568

Dataset updated

Oct 11, 2021

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Aerial data
The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch. Then one 300x300 px image patch is extracted from the centre of each region. This 3-fold grouping, followed by the splitting, results in 72 test samples in each evaluation fold.
- Modality A: Near-Infrared (NIR)
- Modality B: three colour channels (in B-G-R order)
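
The region-then-centre-patch procedure described for the aerial images can be sketched as follows. This is a minimal illustration under assumptions: the function name `centre_patch_boxes` is hypothetical, the number of regions per axis is taken as the image size floor-divided by the patch size, and sub-pixel offsets are truncated. The authors' own scripts may differ in these details.

```python
def centre_patch_boxes(height, width, patch=300):
    """Return (top, left, bottom, right) boxes of centre patches.

    The image is split into the largest grid of equal-sized regions that
    can each still hold one patch x patch square; one square is then taken
    from the centre of every region.
    """
    n_rows, n_cols = height // patch, width // patch
    boxes = []
    if n_rows == 0 or n_cols == 0:
        return boxes  # image is smaller than a single patch
    region_h, region_w = height / n_rows, width / n_cols
    for r in range(n_rows):
        for c in range(n_cols):
            # Offset of the patch inside its region, so that it is centred.
            top = int(r * region_h + (region_h - patch) / 2)
            left = int(c * region_w + (region_w - patch) / 2)
            boxes.append((top, left, top + patch, left + patch))
    return boxes


boxes = centre_patch_boxes(1000, 650)
```

For a 600x600 px input the same function yields a 2x2 grid of adjacent 300x300 px boxes.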
Cytological data
The Cytological data contains images from 3 different cell lines; all images from one cell line are treated as one fold in 3-fold cross-validation. Each image in the dataset is subdivided from 600x600 px into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.
- Modality A: Fluorescence Images
- Modality B: Quantitative Phase Images (QPI)
Histological dataset
For the Histological data, to prevent the circular border of the TMA cores from making registration too easy, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.
- Modality A: Second Harmonic Generation (SHG)
- Modality B: Bright-Field (BF)

The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of (rigid) transformations of increasing displacement. The level of transformation is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level so that all levels have the same number of samples.

Radiological data
The Radiological dataset is divided into 3 sub-groups by patient IDs: {109, 106, 003, 006}, {108, 105, 007, 001}, {107, 102, 005, 009}. Since the Radiological dataset is non-isotropic (and also of varying resolution), it is resampled using B-spline interpolation to 1 mm³ cubic voxels, taking explicit care to not resample twice; displaced volumes are transformed and resampled in one step.
- Modality A: T1-weighted MRI
- Modality B: T2-weighted MRI

(Run make_rire_patches.py to generate the sub-volumes.)

Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. As for the aforementioned 2D datasets, random (uniformly distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and translation tz ∈ [-6.5, 6.5] voxels in the z direction. 40 rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes, of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.
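
A minimal sketch of drawing one such 3D rigid transformation from the ranges quoted above. The function name `sample_rigid_3d` and the `scale` parameter are hypothetical, and the exact scheme the authors use to order the 40 transformations by displacement is not reproduced. The dictionary keys reuse the column names from the metadata files.

```python
import random

# Half-widths of the uniform ranges quoted in the text
# (degrees for angles, voxels for translations).
RANGES = {
    "AngleDegreeX": 4.0,
    "AngleDegreeY": 4.0,
    "AngleDegreeZ": 20.0,
    "Tx": 19.6,
    "Ty": 19.6,
    "Tz": 6.5,
}


def sample_rigid_3d(rng, scale=1.0):
    """Draw one rigid transformation uniformly from the ranges.

    `scale` in (0, 1] is a hypothetical knob that shrinks every range to
    produce smaller displacements; it is not the dataset's level scheme.
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    return {
        name: rng.uniform(-limit, limit) * scale
        for name, limit in RANGES.items()
    }


transform = sample_rigid_3d(random.Random(42))
```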

In total, the evaluation set contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.
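
The totals above follow from the per-fold sample counts, the number of folds, and the number of transformation levels given earlier. The short check below reproduces the arithmetic. The variable names are illustrative, and the count of 12 radiological volumes is inferred from the three groups of four patient IDs.

```python
FOLDS = 3            # evaluation folds for the aerial and cytological data
LEVELS = 4           # transformation levels for the 2D datasets

aerial_pairs = 72 * FOLDS * LEVELS         # 72 test samples per fold
cytological_pairs = 420 * FOLDS * LEVELS   # 420 test samples per fold
histological_pairs = 134 * LEVELS          # 134 TMA image pairs, no folds
radiological_pairs = 12 * 40               # 12 volumes, 40 transformations each

totals = {
    "aerial": aerial_pairs,
    "cytological": cytological_pairs,
    "histological": histological_pairs,
    "radiological": radiological_pairs,
}
```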

Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

Metadata

In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follows:

Filename: identifier(ID) of the image pair
X1_Ref: x-coordinate of the upper-left corner of reference patch I_Ref
Y1_Ref: y-coordinate of the upper-left corner of reference patch I_Ref
X2_Ref: x-coordinate of the lower-left corner of reference patch I_Ref
Y2_Ref: y-coordinate of the lower-left corner of reference patch I_Ref
X3_Ref: x-coordinate of the lower-right corner of reference patch I_Ref
Y3_Ref: y-coordinate of the lower-right corner of reference patch I_Ref
X4_Ref: x-coordinate of the upper-right corner of reference patch I_Ref
Y4_Ref: y-coordinate of the upper-right corner of reference patch I_Ref
X1_Trans: x-coordinate of the upper-left corner of transformed patch I_Init
Y1_Trans: y-coordinate of the upper-left corner of transformed patch I_Init
X2_Trans: x-coordinate of the lower-left corner of transformed patch I_Init
Y2_Trans: y-coordinate of the lower-left corner of transformed patch I_Init
X3_Trans: x-coordinate of the lower-right corner of transformed patch I_Init
Y3_Trans: y-coordinate of the lower-right corner of transformed patch I_Init
X4_Trans: x-coordinate of the upper-right corner of transformed patch I_Init
Y4_Trans: y-coordinate of the upper-right corner of transformed patch I_Init
Displacement: mean Euclidean distance between reference corner points and transformed corner points
RelativeDisplacement: the ratio of displacement to the width/height of image patch
Tx: randomly generated translation in the x-direction to synthesise the transformed patch I_Init
Ty: randomly generated translation in the y-direction to synthesise the transformed patch I_Init
AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch I_Init
AngleRad: randomly generated rotation in radians to synthesise the transformed patch I_Init
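
To make the Displacement and RelativeDisplacement columns concrete, the sketch below computes both for a square 2D patch from Tx, Ty and AngleDegree. It assumes the rotation is applied about the patch centre and uses a hypothetical function name; the authors' scripts may define the transformation differently.

```python
import math


def displacement(tx, ty, angle_deg, size=300):
    """Return (Displacement, RelativeDisplacement) for a square patch.

    Displacement is the mean Euclidean distance between the four reference
    corners and the same corners after rotation about the patch centre
    (an assumption) followed by translation.
    """
    half = size / 2.0
    # Corners relative to the centre, in the order used by the CSV columns:
    # upper-left, lower-left, lower-right, upper-right (y axis pointing down).
    corners = [(-half, -half), (-half, half), (half, half), (half, -half)]
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    total = 0.0
    for x, y in corners:
        x_t = cos_a * x - sin_a * y + tx
        y_t = sin_a * x + cos_a * y + ty
        total += math.hypot(x_t - x, y_t - y)
    mean = total / len(corners)
    return mean, mean / size


disp, rel = displacement(3.0, 4.0, 0.0)
```

With no rotation, every corner moves by the translation vector, so the displacement reduces to the length of (Tx, Ty).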

In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

Z1_Ref: z-coordinate of the upper-left corner of reference patch I_Ref
Z2_Ref: z-coordinate of the lower-left corner of reference patch I_Ref
Z3_Ref: z-coordinate of the lower-right corner of reference patch I_Ref
Z4_Ref: z-coordinate of the upper-right corner of reference patch I_Ref
Z1_Trans: z-coordinate of the upper-left corner of transformed patch I_Init
Z2_Trans: z-coordinate of the lower-left corner of transformed patch I_Init
Z3_Trans: z-coordinate of the lower-right corner of transformed patch I_Init
Z4_Trans: z-coordinate of the upper-right corner of transformed patch I_Init
(...and similarly, coordinates of the 5th-8th corners)
Tz: randomly generated translation in z-direction to synthesise the transformed patch I_Init
AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch I_Init
AngleRadX: randomly generated rotation around X-axis in radians to synthesise the transformed patch I_Init
AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch I_Init
AngleRadY: randomly generated rotation around Y-axis in radians to synthesise the transformed patch I_Init
AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch I_Init
AngleRadZ: randomly generated rotation around Z-axis in radians to synthesise the transformed patch I_Init

Naming convention

Aerial Data

 zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
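
A small sketch of parsing aerial file names that follow the pattern above. The function name and the returned keys are hypothetical, and treating the final letter as either R (reference) or T (transformed) is an assumption based on the examples in this description.

```python
import re

# zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
AERIAL_NAME = re.compile(
    r"^zh(?P<id>\d+)_(?P<row>\d+)_(?P<col>\d+)_(?P<kind>[RT])\.png$"
)


def parse_aerial_name(name):
    """Split an aerial patch file name into its components."""
    match = AERIAL_NAME.match(name)
    if match is None:
        raise ValueError(f"not an aerial patch file name: {name!r}")
    return {
        "image_id": "zh" + match["id"],
        "row": int(match["row"]),
        "col": int(match["col"]),
        "transformed": match["kind"] == "T",
    }


info = parse_aerial_name("zh5_03_02_R.png")
```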

Cytological data

 {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png

Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed
