Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Aerial data
Modality A: Near-Infrared (NIR)
Modality B: three colour channels (in B-G-R order)
Cytological data
Modality A: Fluorescence Images
Modality B: Quantitative Phase Images (QPI)
Histological data
Modality A: Second Harmonic Generation (SHG)
Modality B: Bright-Field (BF)
The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of rigid transformations with increasing displacement. Each transformation level is determined by the size of the rotation angle θ and of the translations tx and ty, as detailed in this table. Each image sample is transformed exactly once at each transformation level, so all levels contain the same number of samples.
Radiological data
Modality A: T1-weighted MRI
Modality B: T2-weighted MRI
(Run make_rire_patches.py to generate the sub-volumes.)
Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. As for the 2D datasets above, random (uniformly distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, a rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and a translation tz ∈ [-6.5, 6.5] voxels in the z direction. Each volume is subjected to 40 rigid transformations of increasing displacement size. Transformed sub-volumes, also of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.
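The parameter ranges above can be illustrated with a short Python sketch that draws one uniformly distributed rigid transformation and applies it to a voxel coordinate by rotating about the sub-volume centre and then translating. The rotation composition order (x, then y, then z), the rotation centre and all function names are assumptions for illustration; make_rire_patches.py remains the reference for how the sub-volumes are actually produced, including how the displacement sizes are made to increase.

```python
import math
import random
from typing import Dict, List, Sequence

# Ranges stated in the text: angles in degrees, translations in voxels.
RANGES = {
    "theta_x": (-4.0, 4.0),
    "theta_y": (-4.0, 4.0),
    "theta_z": (-20.0, 20.0),
    "tx": (-19.6, 19.6),
    "ty": (-19.6, 19.6),
    "tz": (-6.5, 6.5),
}


def sample_rigid_params(rng: random.Random) -> Dict[str, float]:
    """Draw one set of rigid-transformation parameters uniformly from RANGES."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(theta_x: float, theta_y: float, theta_z: float) -> List[List[float]]:
    """Return R = Rz @ Ry @ Rx for angles in degrees (composition order is an assumption)."""
    ax, ay, az = (math.radians(a) for a in (theta_x, theta_y, theta_z))
    rx = [[1, 0, 0], [0, math.cos(ax), -math.sin(ax)], [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)], [0, 1, 0], [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0], [math.sin(az), math.cos(az), 0], [0, 0, 1]]
    return _matmul(rz, _matmul(ry, rx))


def transform_point(p: Sequence[float], params: Dict[str, float],
                    centre: Sequence[float]) -> List[float]:
    """Rotate p about centre, then translate it by (tx, ty, tz)."""
    r = rotation_matrix(params["theta_x"], params["theta_y"], params["theta_z"])
    d = [p[i] - centre[i] for i in range(3)]
    t = (params["tx"], params["ty"], params["tz"])
    return [centre[i] + sum(r[i][k] * d[k] for k in range(3)) + t[i] for i in range(3)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 40 draws per volume, as in the text; ordering by displacement size is not modelled.
    draws = [sample_rigid_params(rng) for _ in range(40)]
    centre = (105.0, 105.0, 35.0)  # illustrative centre of a 210x210x70 sub-volume
    print(transform_point((0.0, 0.0, 0.0), draws[0], centre))
```

Seeding a `random.Random` instance keeps the draws reproducible, which matters when the same evaluation set has to be regenerated.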
In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.
Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.
Metadata
In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the following information about an image pair:
Filename: identifier (ID) of the image pair
X1_Ref: x-coordinate of the upper-left corner of reference patch \(I^{\text{Ref}}\)
Y1_Ref: y-coordinate of the upper-left corner of reference patch \(I^{\text{Ref}}\)
X2_Ref: x-coordinate of the lower-left corner of reference patch \(I^{\text{Ref}}\)
Y2_Ref: y-coordinate of the lower-left corner of reference patch \(I^{\text{Ref}}\)
X3_Ref: x-coordinate of the lower-right corner of reference patch \(I^{\text{Ref}}\)
Y3_Ref: y-coordinate of the lower-right corner of reference patch \(I^{\text{Ref}}\)
X4_Ref: x-coordinate of the upper-right corner of reference patch \(I^{\text{Ref}}\)
Y4_Ref: y-coordinate of the upper-right corner of reference patch \(I^{\text{Ref}}\)
X1_Trans: x-coordinate of the upper-left corner of transformed patch \(I^{\text{Init}}\)
Y1_Trans: y-coordinate of the upper-left corner of transformed patch \(I^{\text{Init}}\)
X2_Trans: x-coordinate of the lower-left corner of transformed patch \(I^{\text{Init}}\)
Y2_Trans: y-coordinate of the lower-left corner of transformed patch \(I^{\text{Init}}\)
X3_Trans: x-coordinate of the lower-right corner of transformed patch \(I^{\text{Init}}\)
Y3_Trans: y-coordinate of the lower-right corner of transformed patch \(I^{\text{Init}}\)
X4_Trans: x-coordinate of the upper-right corner of transformed patch \(I^{\text{Init}}\)
Y4_Trans: y-coordinate of the upper-right corner of transformed patch \(I^{\text{Init}}\)
Displacement: mean Euclidean distance between the reference corner points and the transformed corner points
RelativeDisplacement: ratio of the displacement to the width/height of the image patch
Tx: randomly generated translation in the x-direction used to synthesise the transformed patch \(I^{\text{Init}}\)
Ty: randomly generated translation in the y-direction used to synthesise the transformed patch \(I^{\text{Init}}\)
AngleDegree: randomly generated rotation in degrees used to synthesise the transformed patch \(I^{\text{Init}}\)
AngleRad: randomly generated rotation in radians used to synthesise the transformed patch \(I^{\text{Init}}\)
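The Displacement and RelativeDisplacement columns follow directly from the corner coordinates, so they can be recomputed from an info_test.csv row. The Python sketch below does that, and also shows how Tx, Ty and AngleRad could map reference corners to transformed ones. Rotating about the patch centroid and the sign convention of the angle are assumptions, as are all function names; the displacement functions rely only on the column definitions above.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def corners_from_row(row: Dict[str, str], suffix: str) -> List[Point]:
    """Read corners 1..4 of a patch from an info_test.csv row.

    suffix is "Ref" for the reference patch or "Trans" for the transformed patch.
    """
    return [(float(row[f"X{i}_{suffix}"]), float(row[f"Y{i}_{suffix}"])) for i in range(1, 5)]


def transform_corners(corners: List[Point], tx: float, ty: float,
                      angle_rad: float) -> List[Point]:
    """Rotate corners about their centroid by angle_rad, then shift by (tx, ty).

    The rotation centre and angle sign convention are assumptions.
    """
    cx = sum(x for x, _ in corners) / len(corners)
    cy = sum(y for _, y in corners) / len(corners)
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (cx + c * (x - cx) - s * (y - cy) + tx, cy + s * (x - cx) + c * (y - cy) + ty)
        for x, y in corners
    ]


def displacement(ref: List[Point], trans: List[Point]) -> float:
    """Mean Euclidean distance between corresponding corner points."""
    return sum(math.dist(p, q) for p, q in zip(ref, trans)) / len(ref)


def relative_displacement(disp: float, patch_size: float) -> float:
    """Displacement as a fraction of the (square) patch width/height."""
    return disp / patch_size
```

Applied to a CSV row, `displacement(corners_from_row(row, "Ref"), corners_from_row(row, "Trans"))` should reproduce the stored Displacement value up to rounding, since both use the same corner points.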
In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:
Naming convention
Aerial data: zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
Cytological data: {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed
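Both naming patterns end in _{iRow}_{iCol}_{ReferenceOrTransformed}.png, so one regular expression can split a patch filename into its parts while keeping any underscores inside the image identifier (as in PNT1A_do_1_f15). This Python sketch assumes the reference/transformed flag is always the single letter R or T, as in the two examples, and that row and column indices are purely numeric; the names are hypothetical.

```python
import re
from typing import NamedTuple

# Greedy image_id backtracks until exactly "_row_col_flag.png" remains.
PATCH_NAME = re.compile(r"^(?P<image_id>.+)_(?P<row>\d+)_(?P<col>\d+)_(?P<kind>[RT])\.png$")


class PatchName(NamedTuple):
    image_id: str
    row: int
    col: int
    is_reference: bool


def parse_patch_name(filename: str) -> PatchName:
    """Split a patch filename into image ID, row, column and R/T flag."""
    m = PATCH_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a patch filename: {filename!r}")
    return PatchName(m["image_id"], int(m["row"]), int(m["col"]), m["kind"] == "R")
```

For example, `parse_patch_name("zh5_03_02_R.png")` yields image ID zh5, row 3, column 2 and the reference flag set.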
The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022.
These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Education Statistics (NCES) for 2020-2021.
School learning modality types are defined as follows:
Data Information
“BI” in the state column refers to school districts funded by the Bureau of Indian Education.
Technical Notes
Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted. During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week.
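The weekly aggregation and filtering rules described above can be expressed as three small functions. This is a hedged Python sketch, not the modeling pipeline behind the dataset: the "In Person" label, returning None on a tie, and the five-day school week are assumptions; "Remote", "Hybrid" and the 0.6 threshold come from the text.

```python
from collections import Counter
from typing import Optional, Sequence


def weekly_modality(daily_modalities: Sequence[str]) -> Optional[str]:
    """Return the modality offered on the majority of reported days in a week.

    Returns None for an empty week or a tie (tie handling is an assumption).
    """
    ranked = Counter(daily_modalities).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def accept_inferred(modality: str, probability: float,
                    threshold: float = 0.6) -> Optional[str]:
    """Omit inferred modalities whose probability is below the threshold."""
    return modality if probability >= threshold else None


def apply_closure(modality: Optional[str], closure_days: int,
                  school_days: int = 5) -> Optional[str]:
    """Fall 2022 rule: full-week closure -> "Remote"; closure on 1-4 days -> "Hybrid"."""
    if closure_days >= school_days:
        return "Remote"
    if closure_days >= 1:
        return "Hybrid"
    return modality
```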
Data from August
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
See the Splitgraph documentation for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills in their missing modalities. Details can be found in the attached report.
Annotation
The annotation files are provided as Parquet files. They can be read in Python using the pandas and pyarrow libraries. The split into train, validation and test sets follows the split of the original datasets.
Installation
pip install pandas pyarrow
Example
import pandas as pd
df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
Output:
dataset              AudioSet
filename             train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory    [a man speaks and dishes clank.]
tags                 [Speech]
Description
The annotation file consists of the following fields:
filename: Name of the corresponding file (video or audio file)
dataset: Source dataset associated with the data point
captions_visual: A list of captions related to the visual content of the video. Can be NaN if there is no visual content
captions_auditory: A list of captions related to the auditory content of the video
tags: A list of tags classifying the sound of a file. Can be NaN if no tags are provided
Data files
The raw data files for most datasets are not released due to licensing issues and must be downloaded from the original sources. However, since some files are missing from the sources, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
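Because captions_visual and tags can be NaN, a small helper that normalises these fields to plain Python lists saves type checks downstream. This is a hedged sketch: the helper name is hypothetical, and it assumes list-valued cells may arrive as Python lists or NumPy arrays when read with pandas and pyarrow.

```python
import math
from typing import Any, List


def as_list(value: Any) -> List[str]:
    """Return a caption or tag field as a list of strings; None/NaN become []."""
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


# Possible use with the DataFrame from the example above (not run here):
# df["captions_visual"] = df["captions_visual"].map(as_list)
# has_visual = df["captions_visual"].map(bool)
```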
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The HA4M dataset is a collection of multi-modal data relative to actions performed by different subjects in an assembly scenario for manufacturing. It has been collected to provide a good test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly. The HA4M dataset provides a considerable variety of multi-modal data compared to existing datasets. Six types of simultaneously acquired data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data. These data allow the scientific community to make consistent comparisons among processing or machine learning approaches by using one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use/reuse the data for different investigations in different application domains such as motion analysis, human-robot cooperation, action recognition, and so on.
Dataset details
The dataset includes 12 assembly actions performed by 41 subjects for building an Epicyclic Gear Train (EGT). The assembly task involves three phases: first, the assembly of Block 1 and Block 2 separately, and then the final setting up of both Blocks to build the EGT. The EGT is made up of a total of 12 components divided into two sets: the first eight components for building Block 1 and the remaining four components for Block 2. Finally, two screws are fixed with an Allen Key to assemble the two blocks and thus obtain the EGT.
Acquisition setup
The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera acquires videos during the execution of the assembly task. It is placed in front of the operator and the table where the components are spread over. The camera is placed on a tripod at a height h of 1.54 m and a distance of 1.78 m, and is tilted down by an angle of 17 degrees.
Technical information
The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males) aged from 23 to 60. All the subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions at their own convenience (e.g. with both hands), regardless of their dominant hand. HA4M is a growing project, so new acquisitions planned for the near future will expand the current dataset.
Actions
Twelve actions are considered in HA4M. Actions 1 to 4 are needed to build Block 1, actions 5 to 8 to build Block 2, and actions 9 to 12 to complete the EGT. The actions are listed below:
Pick up/Place Carrier
Pick up/Place Gear Bearings (x3)
Pick up/Place Planet Gears (x3)
Pick up/Place Carrier Shaft
Pick up/Place Sun Shaft
Pick up/Place Sun Gear
Pick up/Place Sun Gear Bearing
Pick up/Place Ring Bear
Pick up Block 2 and place it on Block 1
Pick up/Place Cover
Pick up/Place Screws (x2)
Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT
Annotation
Data annotation concerns the labeling of the different actions in the video sequences. The actions were annotated manually by observing the RGB videos frame by frame. The start frame of each action is identified as the subject starts to move the arm towards the component to be grasped. The end frame, instead, is recorded when the subject releases the component, so the next frame becomes the start frame of the subsequent action. The total number of actions annotated in this study is 4123, including the “don't care” action (ID=0) and the action repetitions in the case of actions 2, 3 and 11.
Available code
The dataset has been acquired using the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, based on the Azure Kinect Sensor SDK v1.4.1 and Azure Kinect Body Tracking SDK v1.1.2. The software records device data to a Matroska (.mkv) file, containing video tracks, IMU samples, and device calibration. In this work, IMU samples are not considered. The same Multiple Azure Kinect GUI software processes the Matroska file and returns the different types of data provided with our dataset: RGB images, RGB-depth-Aligned (RGB-A) images, Depth images, IR images, Point Cloud and Skeleton data.
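Because each action ends on the frame before the next one starts, an annotation can be stored as a list of start frames and the end frames derived from it. The Python sketch below illustrates this with the action list and phases described above. The numbering of actions in list order, the (action_id, start_frame) input format and all names are assumptions, since the annotation file layout is not described here. Working from start frames keeps consecutive repetitions (actions 2, 3 and 11) as separate segments.

```python
from typing import Dict, List, Optional, Tuple

# IDs follow the order of the action list above; 0 is the "don't care" action.
ACTIONS: Dict[int, str] = {
    0: "don't care",
    1: "Pick up/Place Carrier",
    2: "Pick up/Place Gear Bearings (x3)",
    3: "Pick up/Place Planet Gears (x3)",
    4: "Pick up/Place Carrier Shaft",
    5: "Pick up/Place Sun Shaft",
    6: "Pick up/Place Sun Gear",
    7: "Pick up/Place Sun Gear Bearing",
    8: "Pick up/Place Ring Bear",
    9: "Pick up Block 2 and place it on Block 1",
    10: "Pick up/Place Cover",
    11: "Pick up/Place Screws (x2)",
    12: "Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT",
}


def phase(action_id: int) -> Optional[str]:
    """Map an action ID to the assembly phase it belongs to."""
    if 1 <= action_id <= 4:
        return "Block 1"
    if 5 <= action_id <= 8:
        return "Block 2"
    if 9 <= action_id <= 12:
        return "EGT"
    return None


def close_segments(starts: List[Tuple[int, int]],
                   last_frame: int) -> List[Tuple[int, int, int]]:
    """Turn (action_id, start_frame) pairs into (action_id, start, end) segments.

    Each action ends on the frame before the next action starts; the final
    action ends at last_frame.
    """
    ordered = sorted(starts, key=lambda s: s[1])
    segments = []
    for i, (action_id, start) in enumerate(ordered):
        end = ordered[i + 1][1] - 1 if i + 1 < len(ordered) else last_frame
        segments.append((action_id, start, end))
    return segments
```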
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of medical imaging files for use in the "Medical Image Processing with Python" lesson, developed by the Netherlands eScience Center.
The dataset includes:
These files represent various medical imaging modalities and formats commonly used in clinical research and practice. They are intended for educational purposes, allowing students to practice image processing techniques, machine learning applications, and statistical analysis of medical images using Python libraries such as scikit-image, pydicom, and SimpleITK.
https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/MOJBDK
Dataset description: This is a study of examples of Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. The data was collected in 2020-2021 from the old version of the Russian National Corpus (ruscorpora.ru).
In the spreadsheet 01DataMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following construction types: možno + budet + INF.PFV, možno + budet + INF.IPFV, možno + INF.PFV and možno + INF.IPFV. The results for each search were downloaded, pseudorandomized, and the first 200 examples were manually annotated, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main.
In the spreadsheet 01DataZavtraMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following structure: zavtra + možno + budet + INF.PFV, zavtra + možno + budet + INF.IPFV, zavtra + možno + INF.PFV and zavtra + možno + INF.IPFV. All of the examples (103 sentences) were imported to a spreadsheet and annotated manually, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main.
Article abstract: This paper examines Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. My contribution can be summarized as follows. First, corpus-based evidence reveals that možno + INF constructions are vastly more frequent than constructions with copula.
Second, the meaning of constructions without the future copula is more flexible: while the possibility is typically located in the present, the situation denoted by the infinitive may be located in the present or the future. Third, I show that the možno + INF construction is more ambiguous and can denote present, gnomic or future situations. Fourth, I identify a number of contextual factors that unambiguously locate the situation in the future. I demonstrate that such factors are more frequently used with the future copula, and thus motivate the choice between the two constructions. Finally, I illustrate the interpretations in a straightforward manner by means of schemas of the type used in cognitive linguistics.
The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.
The preprocessing script performs several key steps:
Text Cleaning:
fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).
Audio Processing:
torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).
Video Processing:
Saving Processed Samples:
.pt file in a directory structure split by data type (train, dev, and test).
dia0_utt1.mp4 becomes dia0_utt1.pt).
Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:
utterance (str): The cleaned textual utterance.
emotion (str/int): The corresponding emotion label.
video_path (str): Original path to the video file from which the sample was extracted.
audio (Tensor): Raw audio waveform tensor of shape [channels, time].
audio_sample_rate (int): The sampling rate of the audio waveform.
audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.
The preprocessed files are organized into splits:
preprocessed_data/
├── train/
│ ├── dia0_utt0.pt
│ ├── dia1_utt1.pt
│ └── ...
├── dev/
│ ├── dia0_utt0.pt
│ ├── dia1_utt1.pt
│ └── ...
└── test/
  ├── dia0_utt0.pt
  ├── dia1_utt1.pt
  └── ...
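The two text-cleaning helpers named in the preprocessing steps are not reproduced on this page. A minimal standard-library sketch of what they could look like follows; the decoding order (UTF-8, then cp1252, then Latin-1 as a fallback that accepts any byte), the replacement table and the cp1252-to-UTF-8 round trip are assumptions, not the authors' script.

```python
from typing import Dict

# Hypothetical replacement table; "Â’" is the example given in the description.
DOUBLE_ENCODED: Dict[str, str] = {"Â’": "’"}


def fix_encoding_with_bytes(raw: bytes) -> str:
    """Decode bytes, trying UTF-8 first, then cp1252, then Latin-1.

    Latin-1 maps every byte to a character, so it serves as the final fallback.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")


def replace_double_encoding(text: str) -> str:
    """Repair UTF-8 text that was mistakenly decoded as cp1252 (mojibake)."""
    for bad, good in DOUBLE_ENCODED.items():
        text = text.replace(bad, good)
    try:
        # e.g. "â€™" -> bytes e2 80 99 -> "’"
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text that cannot be round-tripped keeps only the table fixes.
        return text
```

The round trip only succeeds when the whole string is valid mojibake, so a line that mixes repaired and correct non-ASCII characters is left unchanged apart from the table replacements.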
A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:
from torch.utils.data import Dataset
import os
import torch


class PreprocessedMELDDataset(Dataset):
    def __init__(self, data_dir):
        """
        Args:
            data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        # Sorted so that the sample order is deterministic across runs
        self.files = sorted(
            os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample_path = self.files[idx]
        # Each sample is a pickled dict that includes a NumPy array ('face').
        # PyTorch >= 2.6 defaults to weights_only=True, which may refuse to load it;
        # the argument exists from PyTorch 1.13. Only load files you trust this way.
        sample = torch.load(sample_path, weights_only=False)
        return sample


def preprocessed_collate_fn(batch):
    """
    Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
    Modify this function to pad or stack tensor data if needed.
    """
    collated = {}
    collated['utterance'] = [sample['utterance'] for sample in batch]
    collated['emotion'] = [sample['emotion'] for sample in batch]
    collated['video_path'] = [sample['video_path'] for sample in batch]
    # Variable-length tensors are kept as lists rather than stacked
    collated['audio'] = [sample['audio'] for sample in batch]
    # Assumes every sample in the batch shares the same sampling rate
    collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
    collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
    collated['face'] = [sample['face'] for sample in batch]
    return collated


from torch.utils.data import DataLoader
# Define paths for each split
train_data_dir = "preprocessed_data/train"
dev_data_dir = "preproces...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
We introduce the MLM (Multiple Languages and Modalities) dataset - a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. The second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalizing on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.
Introduction: Multiple Languages and Modalities comprises data points on 236k human settlements for evaluating and optimizing multitask learning systems. MLM presents a dataset with a high level of diversity in terms of modality and language. For each entity, we have extracted text summaries, images, coordinates, and their respective triple classes. Text summaries are available in three languages (English, French, and German) with each entity having between one and three language entries.
Human settlements from all continents are provided in the overall dataset (MLM), with 72% located in Europe. Two further versions of the dataset, MLM-irle and MLM-irle-gr, were generated for use in the benchmark evaluation for multitask systems described in the paper (see above). MLM-irle-gr (i.e., geo-representative) was generated to serve organizations that focus on the European Union by providing geographically balanced coverage of human settlements in this region. MLM-irle-gr contains data on 24k human settlements across the EU, weighted in relation to the population count of each of the 28 countries.
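The description does not say how the population weighting was computed. As one illustration, the largest-remainder (Hamilton) method below splits a fixed sample budget across countries in proportion to population; the method, the function name and its inputs are assumptions, not the MLM generation script.

```python
import math
from typing import Dict


def allocate_by_population(populations: Dict[str, int], total: int) -> Dict[str, int]:
    """Split `total` samples across countries in proportion to population.

    Largest-remainder method: floor every quota, then give the leftover
    samples to the countries with the largest fractional parts.
    """
    population_sum = sum(populations.values())
    quotas = {c: total * p / population_sum for c, p in populations.items()}
    allocation = {c: math.floor(q) for c, q in quotas.items()}
    leftover = total - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - allocation[c], reverse=True)
    for country in by_remainder[:leftover]:
        allocation[country] += 1
    return allocation
```

Unlike plain rounding, this always hands out exactly `total` samples, which matters when the subset size is fixed in advance.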
MLM contains the following fields:
MLM - Details by Dataset Version:
| Version | MLM | MLM-irle | MLM-irle-gr |
|---|---|---|---|
| Entities | 236496 | 218681 | 22501 |
| Images | 412422 | 314533 | 31621 |
| Summaries | 497899 | 462328 | 47508 |
Availability:
All three versions of MLM listed in the table directly above are available for direct download and use. To support findability and sustainability, the MLM dataset is published as an online resource at https://doi.org/10.5281/zenodo.3885753. A separate page with detailed explanations and illustrations is available at http://cleopatra.ijs.si/goal-mlm/ to promote ease of use. The project GitHub repository, which contains the complete source code for the system and the generation script, is available at https://github.com/GOALCLEOPATRA/MLM. Documentation adheres to the FAIR Data principles, with all relevant metadata specified for the research community and users. The dataset is freely accessible under the Creative Commons Attribution 4.0 International license, which makes it reusable for almost any purpose.
Updating and Reusability: MLM is supported by a team of researchers from the University of Bonn, the Leibniz Information Center for Science and Technology, and the Jožef Stefan Institute. The resource is already in use for individual projects and as a contribution to the project deliverables of the Marie Skłodowska-Curie CLEOPATRA Innovative Training Network. In addition to the steps above that make the resource available to the wider community, the usage of MLM will be promoted to the network of researchers in this project. Use among researchers and practitioners in digital humanities will be promoted by demonstrations and presentations at domain-related events. Activities are planned for the Digital Methods Summer School run by the University of Amsterdam. The range of modalities and languages present in the dataset also extends its application to research on multimodal representation learning, multilingual machine learning, information retrieval, location estimation, and the Semantic Web. MLM will be supported and maintained for three years in the first instance. A second release of the dataset is already scheduled, and the generation process outlined above is designed to enable rapid scaling.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.
S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826
The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and Depth frames, with binary masks for each hand instance (i1, i2, single class data). The dataset is entirely synthetic, generated using the Domain Randomization technique in CoppeliaSim 3D. The dataset can be used to train Deep Learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training-validation split comprises 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single-channel 8-bit grayscale image in the range [0.2, 1.0] m. The following aspects of the scene were randomly varied during generation of the dataset:
• Number, colors, textures, scales and types of distractor objects selected from a set of 3D models of general tools and geometric primitives. A special type of distractor – an articulated dummy without hands (for instance-free samples).
• Hand gestures (9 options).
• Hand models’ positions and orientations.
• Texture and surface properties (diffuse, specular and emissive properties) and number (from none to 2) of the object of interest, as well as its background.
• Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.
The sample resolution is set to 320×256, encoded in lossless PNG format, and contains only right hand meshes (we suggest using Flip augmentations during training), with a maximum of two instances per sample.
Test dataset (real camera images): Test dataset containing 706 images was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. The dataset comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, number of hands, and different types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset is assumed to have only one user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled, and we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.
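The description gives the depth range of the 8-bit depth images but not the exact mapping from grey level to distance. Assuming a linear mapping with 0.2 m at pixel value 0 and 1.0 m at 255 (an assumption; the encoding could be inverted or reserve a value for missing depth), conversion in Python could look like this:

```python
NEAR_M = 0.2  # depth assumed to map to pixel value 0
FAR_M = 1.0   # depth assumed to map to pixel value 255


def pixel_to_metres(value: int) -> float:
    """Convert an 8-bit depth pixel to metres under the linear-mapping assumption."""
    if not 0 <= value <= 255:
        raise ValueError("8-bit depth values must lie in [0, 255]")
    return NEAR_M + (value / 255) * (FAR_M - NEAR_M)


def metres_to_pixel(depth_m: float) -> int:
    """Clip a depth to [NEAR_M, FAR_M] and quantise it to an 8-bit value."""
    clipped = min(max(depth_m, NEAR_M), FAR_M)
    return round((clipped - NEAR_M) / (FAR_M - NEAR_M) * 255)
```

Under this assumption one grey level corresponds to about 0.8 m / 255, roughly 3.1 mm of depth.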
Channel-wise normalization and standardization parameters for datasets
| Dataset | Mean (R, G, B, D) | STD (R, G, B, D) |
|---|---|---|
| Train | 98.173, 95.456, 93.858, 55.872 | 67.539, 67.194, 67.796, 47.284 |
| Validation | 99.321, 97.284, 96.318, 58.189 | 67.814, 67.518, 67.576, 47.186 |
| Test | 123.675, 116.28, 103.53, 35.3792 | 58.395, 57.12, 57.375, 45.978 |
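The parameters in the table are applied per channel as (value - mean) / std. This Python sketch assumes pixel values on the 0 to 255 scale and the channel order R, G, B, D from the table header; which split's statistics to use at inference time is the user's choice, not something the table prescribes.

```python
from typing import Dict, Sequence, Tuple

# Channel order (R, G, B, D); values copied from the table above.
MEAN: Dict[str, Tuple[float, ...]] = {
    "train": (98.173, 95.456, 93.858, 55.872),
    "validation": (99.321, 97.284, 96.318, 58.189),
    "test": (123.675, 116.28, 103.53, 35.3792),
}
STD: Dict[str, Tuple[float, ...]] = {
    "train": (67.539, 67.194, 67.796, 47.284),
    "validation": (67.814, 67.518, 67.576, 47.186),
    "test": (58.395, 57.12, 57.375, 45.978),
}


def standardize(pixel: Sequence[float], split: str = "train") -> Tuple[float, ...]:
    """Return (value - mean) / std for each of the R, G, B, D channels."""
    if len(pixel) != 4:
        raise ValueError("expected 4 values (R, G, B, D)")
    mean, std = MEAN[split], STD[split]
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))
```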
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective
This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods
In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results
In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion
Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
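Two of the fidelity checks named in the Methods, two-sample proportion tests and 95% CI overlap, can be sketched with the Python standard library. This is a generic illustration, not the study's analysis code: the pooled z-test, the normal-approximation interval (a t-based interval is slightly wider for small samples) and all names are assumptions. Two-sample t-tests would typically come from a statistics package such as SciPy's `scipy.stats.ttest_ind`.

```python
import math
import statistics
from typing import Sequence, Tuple


def _normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sample z-test for proportions; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both samples are all-0 or all-1, so the proportions are identical.
        return 0.0, 1.0
    z = (p1 - p2) / se
    return z, 2.0 * (1.0 - _normal_cdf(abs(z)))


def mean_ci95(values: Sequence[float]) -> Tuple[float, float]:
    """Normal-approximation 95% CI for the mean (needs at least two values)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m - half, m + half


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

A parameter would then count as "similar" when the test's p-value is at or above the chosen significance level, and CI overlap is checked by passing the real and synthetic intervals to `intervals_overlap`.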
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are extended versions of the MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18] datasets, augmented with features from the Google Cloud Vision API. The datasets are stored in jsonl (JSON Lines) format.
Abstract (from our paper):
There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
Dataset (MM-IMDB and Ads-Parallelity):
We extended two multimodal datasets, namely MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18], for the empirical experiments. The MM-IMDB dataset contains 25,925 movies with multiple labels (genres). We used the original split provided with the dataset and reported the F1 scores (micro, macro, and samples) on the test set. The Ads-Parallelity dataset contains 670 images and slogans from persuasive advertisements, for understanding the implicit relationship (parallel or non-parallel) between these two modalities. The task is binary classification: predicting whether the text and image in the same ad convey the same message.
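For the multi-label MM-IMDB genre task, the three F1 variants differ only in how true and false positives are pooled. The pure-Python sketch below follows the usual definitions (the same averaging schemes that scikit-learn's `f1_score` exposes via `average="micro"`, `"macro"` and `"samples"`); the genres and predictions are made up, and this is not the authors' evaluation code.

```python
def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true, y_pred, labels):
    """Micro, macro and samples F1 for predictions given as one set of labels per sample."""
    pairs = list(zip(y_true, y_pred))
    # Micro: pool TP/FP/FN over every (sample, label) decision.
    micro = f1(
        sum(len(t & p) for t, p in pairs),
        sum(len(p - t) for t, p in pairs),
        sum(len(t - p) for t, p in pairs),
    )
    # Macro: F1 per label, then the unweighted mean over labels.
    per_label = [
        f1(
            sum(lab in t and lab in p for t, p in pairs),
            sum(lab in p and lab not in t for t, p in pairs),
            sum(lab in t and lab not in p for t, p in pairs),
        )
        for lab in labels
    ]
    macro = sum(per_label) / len(per_label)
    # Samples: F1 per sample (movie), then the mean over samples.
    samples = sum(f1(len(t & p), len(p - t), len(t - p)) for t, p in pairs) / len(pairs)
    return {"micro": micro, "macro": macro, "samples": samples}


y_true = [{"Drama"}, {"Comedy", "Drama"}, {"Action"}]
y_pred = [{"Drama"}, {"Comedy"}, {"Drama"}]
print(multilabel_f1(y_true, y_pred, ["Action", "Comedy", "Drama"]))
```

Here micro F1 is 4/7, macro F1 is 0.5 and samples F1 is 5/9: the three schemes can rank the same predictions quite differently.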
We transformed the following multimodal information (i.e., visual, textual, and categorical data) into textual tokens and fed these into our proposed model. We used the Google Cloud Vision API for the visual features to obtain the following four pieces of information as tokens: (1) text from the OCR, (2) category labels from the label detection, (3) object tags from the object detection, and (4) the number of faces from the facial detection. We input the labels and object detection results as a sequence in order of confidence, as obtained from the API. We describe the visual, textual, and categorical features of each dataset below.
MM-IMDB: We used the title and plot of movies as the textual features, and the aforementioned API results based on poster images as visual features.
Ads-Parallelity: We used the same API-based visual features as in MM-IMDB. Furthermore, we used textual and categorical features consisting of textual inputs of transcriptions and messages, and categorical inputs of natural and text concrete images.
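The conversion of Vision API results into token sequences described above can be sketched as follows. The JSON field names (`vision`, `ocr_text`, `labels`, `objects`, `num_faces`, `score`, `description`, `name`) are hypothetical placeholders, not the released jsonl schema; the sketch only illustrates reading JSON Lines and ordering labels and objects by confidence (sorted defensively, in case the stored order differs from the API's).

```python
import json


def by_confidence(items):
    """Sort detections by their (hypothetical) 'score' field, most confident first."""
    return sorted(items, key=lambda item: item["score"], reverse=True)


def vision_token_sequences(vision):
    """Build one token sequence per visual feature from a hypothetical Vision API record."""
    return {
        "ocr": vision.get("ocr_text", "").split(),                                # (1) OCR text
        "labels": [d["description"] for d in by_confidence(vision.get("labels", []))],  # (2)
        "objects": [d["name"] for d in by_confidence(vision.get("objects", []))],       # (3)
        "faces": [str(vision.get("num_faces", 0))],                               # (4) face count
    }


def read_jsonl(lines):
    """Yield one JSON object per non-empty line of a JSON Lines stream."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


sample = (
    '{"title": "Example Movie", "vision": {"ocr_text": "COMING SOON", '
    '"labels": [{"description": "Poster", "score": 0.71}, {"description": "Font", "score": 0.93}], '
    '"objects": [{"name": "Person", "score": 0.88}], "num_faces": 2}}\n'
)
for record in read_jsonl(sample.splitlines()):
    print(vision_token_sequences(record["vision"]))
```

For a real file, `read_jsonl` can be given an open file handle instead of `sample.splitlines()`.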
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains multi-modal data from over 75,000 open-access, de-identified case reports, including metadata, clinical cases, image captions, and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery, and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions, and image labels. Details of the data structure can be found in the file data_dictionary.csv.
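Mapping images to their article metadata usually reduces to a join on a shared key. A minimal pandas sketch, assuming (hypothetically) that image records and article metadata share an `article_id` column; the real column names are documented in data_dictionary.csv, and in practice the metadata would be loaded with `pd.read_parquet("metadata.parquet")`, which needs a Parquet engine such as pyarrow.

```python
import pandas as pd

# In-memory stand-ins for the released tables; all column names are hypothetical.
articles = pd.DataFrame({
    "article_id": ["A1", "A2"],
    "journal": ["Journal X", "Journal Y"],
})
images = pd.DataFrame({
    "image_id": ["A1_img1", "A1_img2", "A2_img1"],
    "article_id": ["A1", "A1", "A2"],
    "caption": ["Chest X-ray", "CT scan", "Histology slide"],
})

# Several images can belong to one article: a many-to-one left join keeps every image.
image_table = images.merge(articles, on="article_id", how="left", validate="many_to_one")
print(image_table[["image_id", "journal", "caption"]])
```

`validate="many_to_one"` makes pandas raise an error if an article ID were duplicated in the metadata, which would otherwise silently multiply image rows.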
Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.
Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.
For detailed information about the contents of this dataset, please refer to this data article published in Data in Brief.
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild, large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data, collected unobtrusively over more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files with any programming language; for example, in Python, you can load them into a Pandas DataFrame with pandas.read_csv().
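A minimal sketch of loading one of the CSV files with pandas. The file name and columns (`id`, `date`, `steps`) are hypothetical placeholders; an in-memory `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for one LifeSnaps CSV file; replace io.StringIO(...) with the real path.
# The column names are hypothetical placeholders.
csv_text = """id,date,steps
u1,2021-05-24,8123
u1,2021-05-25,10240
u2,2021-05-24,4310
"""
daily = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Example aggregation: mean daily steps per user.
print(daily.groupby("id")["steps"].mean())
```

`parse_dates` turns the date column into proper timestamps, which makes later resampling or date filtering straightforward.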
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain information related to these collections. Each document in any collection follows the format shown below:
{
  _id:
  id (or user_id):
  type:
  data:
}
Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field is a user-specific ID that uniquely identifies each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, or calories. The data field contains the actual information of the document as an embedded object, e.g., the step count at a specific timestamp for the steps type. The contents of the data object are type-dependent, meaning that its fields differ between data types. As mentioned previously, all times are stored in local time, and user IDs are common across different collections. For more information on the available data types, see the related publication.
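Because the user key is called id in some collections and user_id in others, it helps to normalize it before grouping. The sketch below works on plain dicts shaped like the documents described above; the contents of `data` and the `example_sema_type` type name are hypothetical.

```python
from collections import defaultdict


def user_of(doc):
    """Return the user ID, stored as 'id' or (in sema/surveys) as 'user_id'."""
    return doc["id"] if "id" in doc else doc["user_id"]


def group_by_user_and_type(docs):
    """Group the embedded 'data' objects by (user ID, data type); '_id' is ignored."""
    groups = defaultdict(list)
    for doc in docs:
        groups[(user_of(doc), doc["type"])].append(doc["data"])
    return dict(groups)


# Hypothetical documents; only the four top-level fields follow the documented format.
docs = [
    {"_id": "a1", "id": "u1", "type": "steps", "data": {"dateTime": "2021-05-24T10:00:00", "value": 120}},
    {"_id": "a2", "id": "u1", "type": "steps", "data": {"dateTime": "2021-05-24T10:01:00", "value": 95}},
    {"_id": "a3", "user_id": "u2", "type": "example_sema_type", "data": {"answer": "example"}},
]
for key, items in group_by_user_and_type(docs).items():
    print(key, len(items))
```

With pymongo, documents of this shape could be streamed from the restored database, e.g. `MongoClient("localhost", 27017)["rais_anonymized"]["fitbit"].find({"type": "steps"})`, and passed to `group_by_user_and_type`.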
Surveys Encoding
BREQ2
Why do you engage in exercise?
Code: Text
engage[SQ001]: I exercise because other people say I should
engage[SQ002]: I feel guilty when I don’t exercise
engage[SQ003]: I value the benefits of exercise
engage[SQ004]: I exercise because it’s fun
engage[SQ005]: I don’t see why I should have to exercise
engage[SQ006]: I take part in exercise because my friends/family/partner say I should
engage[SQ007]: I feel ashamed when I miss an exercise session
engage[SQ008]: It’s important to me to exercise regularly
engage[SQ009]: I can’t see why I should bother exercising
engage[SQ010]: I enjoy my exercise sessions
engage[SQ011]: I exercise because others will not be pleased with me if I don’t
engage[SQ012]: I don’t see the point in exercising
engage[SQ013]: I feel like a failure when I haven’t exercised in a while
engage[SQ014]: I think it is important to make the effort to exercise regularly
engage[SQ015]: I find exercise a pleasurable activity
engage[SQ016]: I feel under pressure from my friends/family to exercise
engage[SQ017]: I get restless if I don’t exercise regularly
engage[SQ018]: I get pleasure and satisfaction from participating in exercise
engage[SQ019]: I think exercising is a waste of time
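BREQ2 answers are commonly summarised as subscale means. A minimal sketch, assuming the standard BREQ-2 grouping of the 19 items into amotivation, external, introjected, identified, and intrinsic regulation (consistent with the item wording above), and assuming answers have already been converted to numbers keyed by the codes shown. How LifeSnaps stores the raw answers is not stated here, so `breq2_subscales` is a hypothetical helper.

```python
# Standard BREQ-2 subscale membership, by item number (engage[SQ001] is item 1).
BREQ2_SUBSCALES = {
    "amotivation": [5, 9, 12, 19],
    "external": [1, 6, 11, 16],
    "introjected": [2, 7, 13],
    "identified": [3, 8, 14, 17],
    "intrinsic": [4, 10, 15, 18],
}


def breq2_subscales(responses):
    """Mean item score per subscale; responses maps 'engage[SQ001]'-style codes to numbers."""
    return {
        name: sum(responses[f"engage[SQ{i:03d}]"] for i in items) / len(items)
        for name, items in BREQ2_SUBSCALES.items()
    }


# Hypothetical numeric answers: every item 2, except the intrinsic items (4).
answers = {f"engage[SQ{i:03d}]": 2 for i in range(1, 20)}
for i in BREQ2_SUBSCALES["intrinsic"]:
    answers[f"engage[SQ{i:03d}]"] = 4
print(breq2_subscales(answers))
```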
PANAS
Indicate the extent you have felt this way over the past week
P1[SQ001]: Interested
P1[SQ002]: Distressed
P1[SQ003]: Excited
P1[SQ004]: Upset
P1[SQ005]: Strong
P1[SQ006]: Guilty
P1[SQ007]: Scared
P1[SQ008]: Hostile
P1[SQ009]: Enthusiastic
P1[SQ010]: Proud
P1[SQ011]: Irritable
P1[SQ012]: Alert
P1[SQ013]: Ashamed
P1[SQ014]: Inspired
P1[SQ015]: Nervous
P1[SQ016]: Determined
P1[SQ017]: Attentive
P1[SQ018]: Jittery
P1[SQ019]: Active
P1[SQ020]: Afraid
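The 20 PANAS adjectives split into 10 positive-affect and 10 negative-affect items, each scale usually scored as a sum. A minimal sketch, assuming that standard split (which follows from the adjectives above) and numeric answers on a 1-5 scale keyed by the P1[SQ…] codes; `panas_scores` is a hypothetical helper, not part of the dataset tooling.

```python
# Item numbers (P1[SQ001] is item 1) of the standard PANAS scales.
PANAS_POSITIVE = [1, 3, 5, 9, 10, 12, 14, 16, 17, 19]  # Interested, Excited, Strong, ...
PANAS_NEGATIVE = [2, 4, 6, 7, 8, 11, 13, 15, 18, 20]  # Distressed, Upset, Guilty, ...


def panas_scores(responses):
    """Positive- and negative-affect sums (each 10-50 for answers coded 1-5)."""
    def total(items):
        return sum(responses[f"P1[SQ{i:03d}]"] for i in items)

    return {"positive_affect": total(PANAS_POSITIVE), "negative_affect": total(PANAS_NEGATIVE)}


# Hypothetical answers: 4 on every positive item, 1 on every negative item.
panas_answers = {f"P1[SQ{i:03d}]": (4 if i in PANAS_POSITIVE else 1) for i in range(1, 21)}
print(panas_scores(panas_answers))
```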
Personality
How Accurately Can You Describe Yourself?
Code: Text
ipip[SQ001]: Am the life of the party.
ipip[SQ002]: Feel little concern for others.
ipip[SQ003]: Am always prepared.
ipip[SQ004]: Get stressed out easily.
ipip[SQ005]: Have a rich vocabulary.
ipip[SQ006]: Don't talk a lot.
ipip[SQ007]: Am interested in people.
ipip[SQ008]: Leave my belongings around.
ipip[SQ009]: Am relaxed most of the time.
ipip[SQ010]: Have difficulty understanding abstract ideas.
ipip[SQ011]: Feel comfortable around people.
ipip[SQ012]: Insult people.
ipip[SQ013]: Pay attention to details.
ipip[SQ014]: Worry about things.
ipip[SQ015]: Have a vivid imagination.
ipip[SQ016]: Keep in the background.
ipip[SQ017]: Sympathize with others' feelings.
ipip[SQ018]: Make a mess of things.
ipip[SQ019]: Seldom feel blue.
ipip[SQ020]: Am not interested in abstract ideas.
ipip[SQ021]: Start conversations.
ipip[SQ022]: Am not interested in other people's problems.
ipip[SQ023]: Get chores done right away.
ipip[SQ024]: Am easily disturbed.
ipip[SQ025]: Have excellent ideas.
ipip[SQ026]: Have little to say.
ipip[SQ027]: Have a soft heart.
ipip[SQ028]: Often forget to put things back in their proper place.
ipip[SQ029]: Get upset easily.
ipip[SQ030]: Do not have a good imagination.
ipip[SQ031]: Talk to a lot of different people at parties.
ipip[SQ032]: Am not really interested in others.
ipip[SQ033]: Like order.
ipip[SQ034]: Change my mood a lot.
ipip[SQ035]: Am quick to understand things.
ipip[SQ036]: Don't like to draw attention to myself.
ipip[SQ037]: Take time out for others.
ipip[SQ038]: Shirk my duties.
ipip[SQ039]: Have frequent mood swings.
ipip[SQ040]: Use difficult words.
ipip[SQ041]: Don't mind being the centre of attention.
ipip[SQ042]: Feel others' emotions.
ipip[SQ043]: Follow a schedule.
ipip[SQ044]: Get irritated easily.
ipip[SQ045]: Spend time reflecting on things.
ipip[SQ046]: Am quiet around strangers.
ipip[SQ047]: Make people feel at ease.
ipip[SQ048]: Am exacting in my work.
ipip[SQ049]: Often feel blue.
ipip[SQ050]: Am full of ideas.
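These 50 items match the IPIP Big-Five factor markers, in which consecutive items cycle through the five factors and some items are reverse-keyed. A minimal scoring sketch, assuming that standard keying (consistent with the item wording above) and numeric answers on a 1-5 scale, so a reverse-keyed answer is scored as 6 minus the answer. `ipip_scores` is a hypothetical helper; LifeSnaps may store answers as text that would first need mapping to numbers.

```python
FACTORS = ["Extraversion", "Agreeableness", "Conscientiousness",
           "Emotional Stability", "Intellect/Imagination"]

# Reverse-keyed item numbers (ipip[SQ001] is item 1) under the standard keying.
REVERSE_KEYED = {
    6, 16, 26, 36, 46,              # Extraversion
    2, 12, 22, 32,                  # Agreeableness
    8, 18, 28, 38,                  # Conscientiousness
    4, 14, 24, 29, 34, 39, 44, 49,  # Emotional Stability
    10, 20, 30,                     # Intellect/Imagination
}


def ipip_scores(responses):
    """Factor sums (10-50) for answers coded 1-5; reverse-keyed items count as 6 - answer."""
    scores = dict.fromkeys(FACTORS, 0)
    for item in range(1, 51):
        answer = responses[f"ipip[SQ{item:03d}]"]
        factor = FACTORS[(item - 1) % 5]  # items cycle through the five factors
        scores[factor] += (6 - answer) if item in REVERSE_KEYED else answer
    return scores


# Hypothetical respondent who answers 5 ("very accurate") to every item.
all_fives = {f"ipip[SQ{i:03d}]": 5 for i in range(1, 51)}
print(ipip_scores(all_fives))
```

Answering 5 to everything does not give uniform scores, because each factor has a different number of reverse-keyed items; this is a quick way to spot keying mistakes.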
STAI
Indicate how you feel right now
Code: Text
STAI[SQ001]: I feel calm
STAI[SQ002]: I feel secure
STAI[SQ003]: I am tense
STAI[SQ004]: I feel strained
STAI[SQ005]: I feel at ease
STAI[SQ006]: I feel upset
STAI[SQ007]: I am presently worrying over possible misfortunes
STAI[SQ008]: I feel satisfied
STAI[SQ009]: I feel frightened
STAI[SQ010]: I feel comfortable
STAI[SQ011]: I feel self-confident
STAI[SQ012]: I feel nervous
STAI[SQ013]: I am jittery
STAI[SQ014]: I feel indecisive
STAI[SQ015]: I am relaxed
STAI[SQ016]: I feel content
STAI[SQ017]: I am worried
STAI[SQ018]: I feel confused
STAI[SQ019]: I feel steady
STAI[SQ020]: I feel pleasant
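These are the 20 state-anxiety items of the STAI, where the anxiety-absent statements (calm, secure, at ease, and so on) are reverse-scored. A minimal sketch, assuming the standard reverse-scored set, which matches the wording above, and answers coded 1-4, so a reversed item scores 5 minus the answer and the total ranges from 20 to 80. `stai_state_score` is a hypothetical helper.

```python
# Anxiety-absent (reverse-scored) item numbers; STAI[SQ001] is item 1.
STAI_REVERSED = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}


def stai_state_score(responses):
    """Total state-anxiety score (20-80) for answers coded 1-4."""
    total = 0
    for item in range(1, 21):
        answer = responses[f"STAI[SQ{item:03d}]"]
        total += (5 - answer) if item in STAI_REVERSED else answer
    return total


# Hypothetical calm respondent: 4 on every calm item, 1 on every anxious item.
calm = {f"STAI[SQ{i:03d}]": (4 if i in STAI_REVERSED else 1) for i in range(1, 21)}
print(stai_state_score(calm))
```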
TTM
Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?
Code: Text
processes[SQ002]: I read articles to learn more about physical
This dataset includes all the data files used for the studies in my Master's thesis, "The Choice of Aspect in the Russian Modal Construction with prixodit'sja/prijtis'". The data files are numbered in the order in which they are presented in the thesis. They include the database and the code used for the statistical analysis; their contents are described in the ReadMe files. The core of the work is a quantitative, empirical study of the choice of aspect by Russian native speakers in the modal construction prixodit’sja/prijtis’ + inf. The hypothesis is that in this construction the aspect of the infinitive is not fully determined by grammatical context but is, to some extent, open to construal. A preliminary analysis was carried out on data gathered from the Russian National Corpus (www.ruscorpora.ru). Four hundred and forty-seven examples with the verb prijtis' were annotated manually for several factors, and a CART (classification and regression tree) analysis was run. The results showed that no grammatical factor plays a major role in the choice of one aspect over the other. Data for this study can be found in files 01 to 03, which include a ReadMe file, the database in .csv format, and the code used for the statistical test. An experiment with native speakers was then carried out. One hundred and ten native speakers of Russian were surveyed and asked to evaluate the acceptability of the infinitive in examples with prixodit’sja/prijtis’ delat’/sdelat’ šag/vid/vybor. The survey presented seventeen examples from the Russian National Corpus, each shown twice: first with the same aspect as in the original, then with the other aspect. Participants evaluated each case by choosing among “Impossible”, “Acceptable” and “Excellent” ratings, and could also give their opinion about the difference between the aspects in each example.
A logistic regression with mixed effects was run on the answers. Data for this study can be found in files 04 to 010, which include a ReadMe file, the text of and answers to the questionnaire, the database in .csv, .txt and .pdf formats, and the code used for the statistical test. The results showed that prijtis’ often admits both aspects in the infinitive, while prixodit’sja is more restrictive and prefers the imperfective. Overall, “Acceptable” and “Excellent” responses outnumbered “Impossible” responses for both aspects, even when the evaluated aspect did not match the original. Participants' comments showed that the choice of aspect often depends on the meaning the speaker wants to convey. Only in very few cases was the grammatical context considered a constraint on the choice.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The Human Bone Fractures Multi-modal Image Dataset (HBFMID) is a comprehensive medical imaging dataset designed for research and development in bone fracture detection, classification, and localization. Published on December 2, 2024 by Shahnaj Parvin, the dataset integrates both X-ray and MRI modalities, covering a wide range of human skeletal regions.
Dataset Composition
Total Raw Images: 641
X-ray Images: 510
MRI Images: 131
Anatomical Regions Covered: Elbow, finger, forearm, humerus, shoulder, femur, shinbone, knee, hipbone, wrist, spinal cord, and other healthy bone samples.
Data Splits
Training Set: 449 images → augmented to 1,347 images (×3 augmentation factor)
Validation Set: 128 images
Test Set: 64 images
Total Final Dataset Size: 1,539 images
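As a consistency check derived only from the counts above, the modality counts and the raw splits both add up to the raw total, and the final size follows from tripling only the training set:

```latex
\begin{aligned}
510 + 131 &= 641 \quad \text{(X-ray + MRI)} \\
449 + 128 + 64 &= 641 \quad \text{(raw training + validation + test)} \\
3 \times 449 + 128 + 64 &= 1347 + 128 + 64 = 1539 \quad \text{(final dataset)}
\end{aligned}
```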
Pre-processing Steps
All images underwent the following pre-processing:
Auto-orientation (correcting rotation/flip metadata)
Resizing to 640 × 640 pixels
Contrast adjustments to enhance bone visibility
Data Augmentation Techniques
To improve model generalization, several augmentation methods were applied:
Flip: Horizontal & Vertical
Rotation: Between −5° and +5°
Shear: ±2° (Horizontal & Vertical)
Zooming: 2%
Saturation Adjustment: ±5%
Brightness Adjustment: ±10%
Scaling, Shifting, Shearing, Cropping, Random Rotation
MedMNIST2D: OCTMNIST
Data Modality: Retinal OCT
Tasks (# Classes/Labels): Multi-Class (4)
# Samples: 109,309
# Training / Validation / Test: 97,477 / 10,832 / 1,000
ROCOv2 X-ray Dataset
This dataset contains X-ray imaging data from the ROCOv2 radiology dataset.
Dataset Structure
caption: Medical description text
modality: Imaging modality (X-ray)
modality_id: Numerical modality ID
caption_length: Number of words in caption
length_category: Short/medium/long categorization
original_index: Index from original dataset
Splits
train: 154 samples
val: 19 samples
test: 20 samples
Usage
from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/WafaaFraih/rocov2-modality-x-ray.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MM5 dataset is a comprehensive multimodal dataset capturing RGB, Depth, Thermal (LWIR), Ultraviolet (UV), and Near-Infrared (NIR) images. It is designed for advanced multimodal research, providing diverse modalities, annotated data, and carefully calibrated and aligned images. For additional scripts, documentation, and usage examples, please visit our GitHub repository: https://github.com/martinbrennernz/MM5-Dataset
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🏗️ ArchCAD
A Multimodal CAD Dataset for Vectorized Drawing Understanding
40k Samples · 5 Strictly Aligned Modalities · Foundational Data for AI Understanding of Engineering Drawings
📑 Table of Contents
What is ArchCAD?
Key Features
Dataset Structure
Data Modalities
Annotations
Baseline Model: DPSS
Potential Applications
Citation
📘 What is ArchCAD?
AI systems have long struggled to interpret and utilize CAD… See the full description on the dataset page: https://huggingface.co/datasets/jackluoluo/ArchCAD.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Aerial data
Modality A: Near-Infrared (NIR)
Modality B: three colour channels (in B-G-R order)
Cytological data
Modality A: Fluorescence Images
Modality B: Quantitative Phase Images (QPI)
Histological data
Modality A: Second Harmonic Generation (SHG)
Modality B: Bright-Field (BF)
The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone four levels of rigid transformations with increasing displacement. The transformation level is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level, so all levels have the same number of samples.
Radiological data
Modality A: T1-weighted MRI
Modality B: T2-weighted MRI
(Run make_rire_patches.py to generate the sub-volumes.)
Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. As for the aforementioned 2D datasets, random (uniformly distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, a rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and a translation tz ∈ [-6.5, 6.5] voxels in the z direction. Forty rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes of size 210x210x70 voxels are cropped from the centres of the transformed and resampled volumes.
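A minimal sketch of drawing one such 3D rigid transformation from the stated ranges. It assumes independent uniform draws per parameter, a rotation about a fixed centre (taken here as the centre of a 210x210x70 sub-volume), and an x-then-y-then-z rotation order; make_rire_patches.py may use different conventions, so `sample_rigid_params`, `rotation_matrix` and `transform_point` are illustrative only.

```python
import math
import random

# Ranges from the text: rotations in degrees, translations in voxels.
RANGES = {
    "theta_x": (-4.0, 4.0),
    "theta_y": (-4.0, 4.0),
    "theta_z": (-20.0, 20.0),
    "tx": (-19.6, 19.6),
    "ty": (-19.6, 19.6),
    "tz": (-6.5, 6.5),
}


def sample_rigid_params(rng):
    """Draw each parameter independently and uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_matrix(theta_x, theta_y, theta_z):
    """R = Rz @ Ry @ Rx for angles in degrees (the composition order is an assumption)."""
    cx, sx = math.cos(math.radians(theta_x)), math.sin(math.radians(theta_x))
    cy, sy = math.cos(math.radians(theta_y)), math.sin(math.radians(theta_y))
    cz, sz = math.cos(math.radians(theta_z)), math.sin(math.radians(theta_z))
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def transform_point(point, params, centre):
    """Rotate a voxel coordinate about centre, then translate it by (tx, ty, tz)."""
    r = rotation_matrix(params["theta_x"], params["theta_y"], params["theta_z"])
    d = [p - c for p, c in zip(point, centre)]
    t = (params["tx"], params["ty"], params["tz"])
    return [centre[i] + sum(r[i][k] * d[k] for k in range(3)) + t[i] for i in range(3)]


rng = random.Random(0)
params = sample_rigid_params(rng)
centre = (105.0, 105.0, 35.0)  # assumed rotation centre of a 210x210x70 sub-volume
print(params)
print(transform_point((0.0, 0.0, 0.0), params, centre))
```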
In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.
Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.
Metadata
In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the following information about an image pair:
Filename: identifier (ID) of the image pair
X1_Ref: x-coordinate of the upper-left corner of reference patch \(I^{\text{Ref}}\)
Y1_Ref: y-coordinate of the upper-left corner of reference patch \(I^{\text{Ref}}\)
X2_Ref: x-coordinate of the lower-left corner of reference patch \(I^{\text{Ref}}\)
Y2_Ref: y-coordinate of the lower-left corner of reference patch \(I^{\text{Ref}}\)
X3_Ref: x-coordinate of the lower-right corner of reference patch \(I^{\text{Ref}}\)
Y3_Ref: y-coordinate of the lower-right corner of reference patch \(I^{\text{Ref}}\)
X4_Ref: x-coordinate of the upper-right corner of reference patch \(I^{\text{Ref}}\)
Y4_Ref: y-coordinate of the upper-right corner of reference patch \(I^{\text{Ref}}\)
X1_Trans: x-coordinate of the upper-left corner of transformed patch \(I^{\text{Init}}\)
Y1_Trans: y-coordinate of the upper-left corner of transformed patch \(I^{\text{Init}}\)
X2_Trans: x-coordinate of the lower-left corner of transformed patch \(I^{\text{Init}}\)
Y2_Trans: y-coordinate of the lower-left corner of transformed patch \(I^{\text{Init}}\)
X3_Trans: x-coordinate of the lower-right corner of transformed patch \(I^{\text{Init}}\)
Y3_Trans: y-coordinate of the lower-right corner of transformed patch \(I^{\text{Init}}\)
X4_Trans: x-coordinate of the upper-right corner of transformed patch \(I^{\text{Init}}\)
Y4_Trans: y-coordinate of the upper-right corner of transformed patch \(I^{\text{Init}}\)
Displacement: mean Euclidean distance between the reference corner points and the transformed corner points
RelativeDisplacement: the ratio of the displacement to the width/height of the image patch
Tx: randomly generated translation in the x-direction used to synthesise the transformed patch \(I^{\text{Init}}\)
Ty: randomly generated translation in the y-direction used to synthesise the transformed patch \(I^{\text{Init}}\)
AngleDegree: randomly generated rotation in degrees used to synthesise the transformed patch \(I^{\text{Init}}\)
AngleRad: randomly generated rotation in radians used to synthesise the transformed patch \(I^{\text{Init}}\)
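The Displacement and RelativeDisplacement columns can be recomputed from the corner columns listed above. A minimal sketch using those column names; the example row is hypothetical, RelativeDisplacement assumes square patches (dividing by the upper-left to upper-right width), and `rigid_corners` illustrates synthesising transformed corners under an assumed rotation centre (the patch centroid) and sign convention that the dataset does not specify here.

```python
import csv
import io
import math

# Corner order of the info_test.csv columns: upper-left, lower-left, lower-right, upper-right.
CORNER_INDICES = (1, 2, 3, 4)


def corners(row, suffix):
    """Read the four (x, y) corners with the given column suffix ('Ref' or 'Trans')."""
    return [(float(row[f"X{i}_{suffix}"]), float(row[f"Y{i}_{suffix}"])) for i in CORNER_INDICES]


def displacement(row):
    """Mean Euclidean distance between corresponding reference and transformed corners."""
    pairs = zip(corners(row, "Ref"), corners(row, "Trans"))
    return sum(math.dist(p, q) for p, q in pairs) / len(CORNER_INDICES)


def relative_displacement(row):
    """Displacement divided by the reference patch width (square patches assumed)."""
    ref = corners(row, "Ref")
    return displacement(row) / math.dist(ref[0], ref[3])  # upper-left to upper-right


def rigid_corners(ref, angle_deg, tx, ty):
    """Rotate corners about their centroid by angle_deg, then shift by (tx, ty).

    The rotation centre and sign convention are assumptions for illustration.
    """
    cx = sum(x for x, _ in ref) / len(ref)
    cy = sum(y for _, y in ref) / len(ref)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [
        (cx + c * (x - cx) - s * (y - cy) + tx, cy + s * (x - cx) + c * (y - cy) + ty)
        for x, y in ref
    ]


# One hypothetical row: a 100x100 patch shifted by (3, 4) pixels with no rotation.
csv_text = (
    "Filename,X1_Ref,Y1_Ref,X2_Ref,Y2_Ref,X3_Ref,Y3_Ref,X4_Ref,Y4_Ref,"
    "X1_Trans,Y1_Trans,X2_Trans,Y2_Trans,X3_Trans,Y3_Trans,X4_Trans,Y4_Trans\n"
    "example_pair,0,0,0,100,100,100,100,0,3,4,3,104,103,104,103,4\n"
)
row = next(csv.DictReader(io.StringIO(csv_text)))
print(displacement(row), relative_displacement(row))  # 5.0 0.05
```

For the real files, `csv.DictReader` can be given an open info_test.csv handle and the functions applied row by row.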
In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:
Naming convention
zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.</li>
<li><strong>Cytological data</strong>
<ul>
<li>
<pre> {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png</pre>
</li>
<li>Example: <code>PNT1A_do_1_f15_02_01_T.png</code> indicates the <em>Transformed