89 datasets found
  1. Datasets for Evaluation of Multimodal Image Registration

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 11, 2021
    Cite
    Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568
    Available download formats: zip
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jiahao Lu; Johan Öfverstedt; Joakim Lindblad; Nataša Sladoje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    • Aerial data
    • The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch; one 300x300 px patch is then extracted from the centre of each region. This 3-fold grouping followed by splitting means that each evaluation fold contains 72 test samples.
      • Modality A: Near-Infrared (NIR)

      • Modality B: three colour channels (in B-G-R order)

    • Cytological data
    • The Cytological data contains images from 3 different cell lines; all images from one cell line are treated as one fold in 3-fold cross-validation. Each 600x600 px image in the dataset is subdivided into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.
      • Modality A: Fluorescence Images

      • Modality B: Quantitative Phase Images (QPI)

    • Histological dataset
    • For the Histological data, to avoid registration becoming trivially easy due to the circular border of the TMA cores, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.
      • Modality A: Second Harmonic Generation (SHG)

      • Modality B: Bright-Field (BF)

    The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of (rigid) transformations of increasing displacement. The transformation level is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level, so that all levels have the same number of samples.

    • Radiological data
    • The Radiological dataset is divided into 3 sub-groups by patient IDs: {109, 106, 003, 006}, {108, 105, 007, 001}, {107, 102, 005, 009}. Since the Radiological dataset is non-isotropic (and also of varying resolution), it is resampled using B-spline interpolation to 1 mm3 cubic voxels, taking explicit care to not resample twice; displaced volumes are transformed and resampled in one step.
      • Modality A: T1-weighted MRI

      • Modality B: T2-weighted MRI

    (Run make_rire_patches.py to generate the sub-volumes.)

    Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. As for the aforementioned 2D datasets, random (uniformly-distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, a rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and a translation tz ∈ [-6.5, 6.5] voxels in the z direction. 40 rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes, of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.

    In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.

    Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

    Metadata

    In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follows (a small parsing example is given after the column lists):

    • Filename: identifier(ID) of the image pair

    • X1_Ref: x-coordinate of the upper-left corner of reference patch IRef

    • Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef

    • X2_Ref: x-coordinate of the lower-left corner of reference patch IRef

    • Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef

    • X3_Ref: x-coordinate of the lower-right corner of reference patch IRef

    • Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef

    • X4_Ref: x-coordinate of the upper-right corner of reference patch IRef

    • Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef

    • X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit

    • Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit

    • X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit

    • Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit

    • X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit

    • Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit

    • X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit

    • Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit

    • Displacement: mean Euclidean distance between reference corner points and transformed corner points

    • RelativeDisplacement: the ratio of displacement to the width/height of image patch

    • Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit

    • Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit

    • AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit

    • AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit

    In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

    • Z1_Ref: z-coordinate of the upper-left corner of reference patch IRef
    • Z2_Ref: z-coordinate of the lower-left corner of reference patch IRef
    • Z3_Ref: z-coordinate of the lower-right corner of reference patch IRef
    • Z4_Ref: z-coordinate of the upper-right corner of reference patch IRef
    • Z1_Trans: z-coordinate of the upper-left corner of transformed patch IInit
    • Z2_Trans: z-coordinate of the lower-left corner of transformed patch IInit
    • Z3_Trans: z-coordinate of the lower-right corner of transformed patch IInit
    • Z4_Trans: z-coordinate of the upper-right corner of transformed patch IInit
    • (...and similarly, coordinates of the 5th-8th corners)
    • Tz: randomly generated translation in z-direction to synthesise the transformed patch IInit
    • AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch IInit
    • AngleRadX: randomly generated rotation around X-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch IInit
    • AngleRadY: randomly generated rotation around Y-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch IInit
    • AngleRadZ: randomly generated rotation around Z-axis in radian to synthesise the transformed patch IInit
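
    As a quick check of how these columns fit together, the snippet below reloads one of the 2D info_test.csv files and re-derives the Displacement column from the corner coordinates (a minimal sketch; the fold and transformation level in the path are just one valid choice):

    import numpy as np
    import pandas as pd

    df = pd.read_csv("Zurich_patches/fold1/patch_tlevel1/info_test.csv")

    # Corner coordinates as (n_samples, 4 corners, 2) arrays, in the X1,Y1,...,X4,Y4 order listed above
    ref = df[[f"{c}{i}_Ref" for i in range(1, 5) for c in ("X", "Y")]].to_numpy().reshape(-1, 4, 2)
    init = df[[f"{c}{i}_Trans" for i in range(1, 5) for c in ("X", "Y")]].to_numpy().reshape(-1, 4, 2)

    # Displacement = mean Euclidean distance between corresponding corner points
    displacement = np.linalg.norm(ref - init, axis=2).mean(axis=1)
    print(np.allclose(displacement, df["Displacement"], atol=1e-3))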

    Naming convention

    • Aerial Data
      • zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
    • Cytological data
      • {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed patch of the 2nd row and 1st column cut from the image with ID PNT1A_do_1_f15.
      
  2. School Learning Modalities, 2021-2022

    • splitgraph.com
    • healthdata.gov
    • +5more
    Updated Jun 28, 2024
    + more versions
    Cite
    datahub-hhs-gov (2024). School Learning Modalities, 2021-2022 [Dataset]. https://www.splitgraph.com/datahub-hhs-gov/school-learning-modalities-20212022-aitj-yx37/
    Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
    Dataset updated
    Jun 28, 2024
    Authors
    datahub-hhs-gov
    Description

    The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022.

    These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.

    School learning modality types are defined as follows:

    Data Information

    “BI” in the state column refers to school districts funded by the Bureau of Indian Education.

    Technical Notes

    Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted. During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week.

    Data from August

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
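
    A minimal sketch of issuing such a query from Python is shown below; the DDN endpoint URL, the JSON payload shape, and the table name inside the repository are assumptions rather than values confirmed by this listing:

    import requests

    # Assumed Splitgraph DDN HTTP SQL endpoint and payload shape; the table name is a placeholder
    query = 'SELECT * FROM "datahub-hhs-gov/school-learning-modalities-20212022-aitj-yx37"."school_learning_modalities" LIMIT 5'
    resp = requests.post(
        "https://data.splitgraph.com/sql/query/ddn",
        json={"sql": query},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())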

    See the Splitgraph documentation for more information.

  3. Multimodal Vision-Audio-Language Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin (2024). Multimodal Vision-Audio-Language Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10060784
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Goethe University Frankfurt
    Authors
    Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

    Annotation

    The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation and test set follows the split of the original datasets.

    Installation

    pip install pandas pyarrow

    Example

    import pandas as pd

    df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
    print(df.iloc[0])

    dataset            AudioSet
    filename           train/---2_BBVHAA.mp3
    captions_visual    [a man in a black hat and glasses.]
    captions_auditory  [a man speaks and dishes clank.]
    tags               [Speech]

    Description

    The annotation file consists of the following fields:

    • filename: Name of the corresponding file (video or audio file)
    • dataset: Source dataset associated with the data point
    • captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
    • captions_auditory: A list of captions related to the auditory content of the video
    • tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided

    Data files

    The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de

  4. HA4M - Human Action Multi-Modal Monitoring in Manufacturing

    • scidb.cn
    • resodate.org
    Updated Jul 6, 2022
    Cite
    Roberto Marani; Laura Romeo; Grazia Cicirelli; Tiziana D'Orazio (2022). HA4M - Human Action Multi-Modal Monitoring in Manufacturing [Dataset]. http://doi.org/10.57760/sciencedb.01872
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Roberto Marani; Laura Romeo; Grazia Cicirelli; Tiziana D'Orazio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    The HA4M dataset is a collection of multi-modal data relative to actions performed by different subjects in an assembly scenario for manufacturing. It has been collected to provide a good test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly. The HA4M dataset provides a considerable variety of multi-modal data compared to existing datasets. Six types of simultaneous data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data. These data allow the scientific community to make consistent comparisons among processing or machine learning approaches by using one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use/reuse the data for different investigations in different application domains such as motion analysis, human-robot cooperation, action recognition, and so on.

    Dataset details

    The dataset includes 12 assembly actions performed by 41 subjects for building an Epicyclic Gear Train (EGT). The assembly task involves three phases: first, the assembly of Block 1 and Block 2 separately, and then the final setting up of both Blocks to build the EGT. The EGT is made up of a total of 12 components divided into two sets: the first eight components for building Block 1 and the remaining four components for Block 2. Finally, two screws are fixed with an Allen key to assemble the two blocks and thus obtain the EGT.

    Acquisition setup

    The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera acquires videos during the execution of the assembly task. It is placed in front of the operator and the table where the components are spread over. The camera is mounted on a tripod at a height of 1.54 m and a distance of 1.78 m, and is down-tilted by an angle of 17 degrees.

    Technical information

    The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males), aged 23 to 60. All the subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions at their own convenience (e.g. with both hands), independently of their dominant hand. The HA4M project is a growing project, so new acquisitions, planned for the near future, will expand the current dataset.

    Actions

    Twelve actions are considered in HA4M. Actions 1 to 4 are needed to build Block 1, actions 5 to 8 build Block 2, and actions 9 to 12 complete the EGT. The actions are listed below:

    1. Pick up/Place Carrier
    2. Pick up/Place Gear Bearings (x3)
    3. Pick up/Place Planet Gears (x3)
    4. Pick up/Place Carrier Shaft
    5. Pick up/Place Sun Shaft
    6. Pick up/Place Sun Gear
    7. Pick up/Place Sun Gear Bearing
    8. Pick up/Place Ring Gear
    9. Pick up Block 2 and place it on Block 1
    10. Pick up/Place Cover
    11. Pick up/Place Screws (x2)
    12. Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT

    Annotation

    Data annotation concerns the labeling of the different actions in the video sequences. The annotation of the actions has been done manually by observing the RGB videos frame by frame. The start frame of each action is identified as the subject starts to move the arm towards the component to be grasped. The end frame is recorded when the subject releases the component, so the next frame becomes the start frame of the subsequent action. The total number of annotated actions is 4123, including the “don't care” action (ID=0) and the action repetitions in the case of actions 2, 3 and 11.

    Available code

    The dataset has been acquired using the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, based on the Azure Kinect Sensor SDK v1.4.1 and Azure Kinect Body Tracking SDK v1.1.2. The software records device data to a Matroska (.mkv) file containing video tracks, IMU samples, and device calibration. In this work, IMU samples are not considered. The same Multiple Azure Kinect GUI software processes the Matroska file and returns the different types of data provided with our dataset: RGB images, RGB-Depth-Aligned (RGB-A) images, Depth images, IR images, Point Cloud and Skeleton data.

  5. Multi-modality medical image dataset for medical image processing in Python...

    • zenodo.org
    zip
    Updated Aug 12, 2024
    Cite
    Candace Moore; Giulia Crocioni (2024). Multi-modality medical image dataset for medical image processing in Python lesson [Dataset]. http://doi.org/10.5281/zenodo.13305760
    Available download formats: zip
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Candace Moore; Giulia Crocioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of medical imaging files for use in the "Medical Image Processing with Python" lesson, developed by the Netherlands eScience Center.

    The dataset includes:

    1. SimpleITK compatible files: MRI T1 and CT scans (training_001_mr_T1.mha, training_001_ct.mha), digital X-ray (digital_xray.dcm in DICOM format), neuroimaging data (A1_grayT1.nrrd, A1_grayT2.nrrd). Data have been downloaded from here.
    2. MRI data: a T2-weighted image (OBJECT_phantom_T2W_TSE_Cor_14_1.nii in NIfTI-1 format). Data have been downloaded from here.
    3. Example images for the machine learning lesson: chest X-rays (rotatechest.png, other_op.png), cardiomegaly example (cardiomegaly_cc0.png).
    4. Additional anonymized data: TBA

    These files represent various medical imaging modalities and formats commonly used in clinical research and practice. They are intended for educational purposes, allowing students to practice image processing techniques, machine learning applications, and statistical analysis of medical images using Python libraries such as scikit-image, pydicom, and SimpleITK.
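
    For instance, the SimpleITK-compatible and DICOM files listed above can be opened as follows (a minimal sketch using file names from the list; nothing beyond the listed files is assumed):

    import SimpleITK as sitk
    import pydicom

    # Read an MRI volume and view it as a NumPy array
    mr = sitk.ReadImage("training_001_mr_T1.mha")
    mr_array = sitk.GetArrayFromImage(mr)
    print(mr.GetSize(), mr.GetSpacing(), mr_array.shape)

    # Read the digital X-ray stored in DICOM format
    xray = pydicom.dcmread("digital_xray.dcm")
    print(xray.pixel_array.shape)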

  6. Replication Data for: When modality and tense meet. The future marker budet...

    • dataverse.azure.uit.no
    • dataverse.no
    • +1more
    Updated Nov 22, 2023
    Cite
    Elmira Zhamaletdinova; Elmira Zhamaletdinova (2023). Replication Data for: When modality and tense meet. The future marker budet ‘will’ in impersonal constructions with the modal adverb možno ‘be possible’ [Dataset]. http://doi.org/10.18710/MOJBDK
    Available download formats: text/comma-separated-values (657010), txt (10575), text/comma-separated-values (54088)
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    DataverseNO
    Authors
    Elmira Zhamaletdinova; Elmira Zhamaletdinova
    License

    https://dataverse.no/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18710/MOJBDK

    Time period covered
    1826 - 2015
    Area covered
    Russian Federation
    Description

    Dataset description: This is a study of examples of Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. The data was collected in 2020-2021 from the old version of the Russian National Corpus (ruscorpora.ru). In the spreadsheet 01DataMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following construction types: možno + budet + INF.PFV, možno + budet + INF.IPFV, možno + INF.PFV and možno + INF.IPFV. The results for each search were downloaded, pseudorandomized, and the first 200 examples were manually annotated, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main. In the spreadsheet 01DataZavtraMoznoBudet, the data merges the results of four searches conducted to extract examples of sentences with the following structure: zavtra + možno + budet + INF.PFV, zavtra + možno + budet + INF.IPFV, zavtra + možno + INF.PFV and zavtra + možno + INF.IPFV. All of the examples (103 sentences) were imported to a spreadsheet and annotated manually, based on the syntactic analyses given in the corpus. The syntactic and morphological categories used in the corpus are explained here: https://ruscorpora.ru/corpus/main.

    Article abstract: This paper examines Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will be,’ i.e., možno + budet + INF and možno + INF. My contribution can be summarized as follows. First, corpus-based evidence reveals that možno + INF constructions are vastly more frequent than constructions with copula. Second, the meaning of constructions without the future copula is more flexible: while the possibility is typically located in the present, the situation denoted by the infinitive may be located in the present or the future. Third, I show that the možno + INF construction is more ambiguous and can denote present, gnomic or future situations. Fourth, I identify a number of contextual factors that unambiguously locate the situation in the future. I demonstrate that such factors are more frequently used with the future copula, and thus motivate the choice between the two constructions. Finally, I illustrate the interpretations in a straightforward manner by means of schemas of the type used in cognitive linguistics.

  7. MELD Preprocessed

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
    Available download formats: zip (3527202381 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Argish Abhangi
    Description

    The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

    Data Sources

    • Audio: Waveforms extracted from the original video files.
    • Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.
    • Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.
    • Emotion Labels: Each sample is associated with an emotion label.

    Preprocessing Pipeline

    The preprocessing script performs several key steps (a condensed sketch of the audio and video steps follows this list):

    1. Text Cleaning:

      • fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
      • replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).
    2. Audio Processing:

      • Extracts raw audio waveform from each sample.
      • Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).
      • Converts the spectrogram to a logarithmic scale for numerical stability.
    3. Video Processing:

      • Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.
      • For each video, samples frames evenly based on the original video's FPS.
      • Applies Haar Cascade face detection on the frames to extract the first detected face.
      • Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.
    4. Saving Processed Samples:

      • Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
      • The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
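
    A condensed sketch of the audio and video steps above is given below; the sample rate, file names, and detector settings are assumptions, since the original preprocessing script is not included here:

    import cv2
    import numpy as np
    import torch
    import torchaudio

    # Step 2: Mel-spectrogram with 64 mel bins, converted to log scale
    waveform, sample_rate = torchaudio.load("dia0_utt0.wav")  # placeholder audio file
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)(waveform)
    log_mel = torch.log(mel + 1e-6)

    # Step 3: Haar Cascade face detection on a sampled frame, resized to 224x224 RGB
    detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    frame = cv2.imread("frame_0.jpg")  # placeholder frame sampled at ~2 fps
    faces = detector.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    if len(faces) > 0:
        x, y, w, h = faces[0]
        face = cv2.cvtColor(cv2.resize(frame[y:y + h, x:x + w], (224, 224)), cv2.COLOR_BGR2RGB)
    else:
        face = np.zeros((224, 224, 3), dtype=np.uint8)  # default black image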

    Data Format

    Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

    • utterance (str): The cleaned textual utterance.
    • emotion (str/int): The corresponding emotion label.
    • video_path (str): Original path to the video file from which the sample was extracted.
    • audio (Tensor): Raw audio waveform tensor of shape [channels, time].
    • audio_sample_rate (int): The sampling rate of the audio waveform.
    • audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
    • face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

    Directory Structure

    The preprocessed files are organized into splits:

    preprocessed_data/
    ├── train/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    ├── dev/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    └── test/
        ├── dia0_utt0.pt
        ├── dia1_utt1.pt
        └── ...

    Loading and Using the Dataset

    A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

    Dataset Class

    from torch.utils.data import Dataset
    import os
    import torch
    
    class PreprocessedMELDDataset(Dataset):
      def __init__(self, data_dir):
        """
        Args:
          data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]

      def __len__(self):
        return len(self.files)

      def __getitem__(self, idx):
        # Each .pt file holds the dictionary of preprocessed features described above
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample
    

    Custom Collate Function

    def preprocessed_collate_fn(batch):
      """
      Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
      Modify this function to pad or stack tensor data if needed.
      """
      collated = {}
      collated['utterance'] = [sample['utterance'] for sample in batch]
      collated['emotion'] = [sample['emotion'] for sample in batch]
      collated['video_path'] = [sample['video_path'] for sample in batch]
      collated['audio'] = [sample['audio'] for sample in batch]
      collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
      collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
      collated['face'] = [sample['face'] for sample in batch]
      return collated
    

    Creating DataLoaders

    from torch.utils.data import DataLoader
    
    # Define paths for each split
    train_data_dir = "preprocessed_data/train"
    dev_data_dir = "preproces...
    
  8. Data from: MLM: A Benchmark Dataset for Multitask Learning with Multiple...

    • data.niaid.nih.gov
    Updated Jun 12, 2020
    Cite
    Armitage, Jason; Kacupaj, Endri; Tahmasebzadeh, Golsa; Swati (2020). MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3885752
    Dataset updated
    Jun 12, 2020
    Dataset provided by
    Jožef Stefan Institute, Slovenia
    TIB – Leibniz Information Center for Science and Technology, Germany
    University of Bonn, Germany
    Authors
    Armitage, Jason; Kacupaj, Endri; Tahmasebzadeh, Golsa; Swati
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    We introduce the MLM (Multiple Languages and Modalities) dataset - a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and inclusion of semantic data provide a resource that further tests the ability for multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. The second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case and specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrate the challenges of generalizing on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.

    Introduction: Multiple Languages and Modalities comprises data points on 236k human settlements for evaluating and optimizing multitask learning systems. MLM presents a dataset with a high level of diversity in terms of modality and language. For each entity, we have extracted text summaries, images, coordinates, and their respective triple classes. Text summaries are available in three languages (English, French, and German) with each entity having between one and three language entries.

    Human settlements from all continents are provided in the overall dataset (MLM) with 72% located in Europe. Two further versions of the dataset - MLM-irle and MLM-irle-gr - were generated for use in the benchmark evaluation for multitask systems described in the paper (see above). MLM-irle-gr (ie geo-representative) was generated to serve organizations that focus on the European Union by providing a geographically balanced coverage of human settlements in this region. MLM-irle-gr contains data on 24k human settlements across the EU weighted in relation to the population count for each of the 28 countries.

    MLM contains the following fields:

    1. id: a unique identifier
    2. label: textual label
    3. coordinates: longitude, latitude geo-location value
    4. summaries: list of textual summaries related to the entity
    5. images: list of images related to the entity
    6. classes: list of associated triple classes

    MLM - Details by Dataset Version:

    Num. of          MLM      MLM-irle   MLM-irle-gr
    Entities         236496   218681     22501
    Images           412422   314533     31621
    Summaries        497899   462328     47508
    Triple classes   1685     1655       452

    Availability:

    All three versions of MLM listed in the table directly above are available for direct download and use. To support findability and sustainability, the MLM dataset is published as an on-line resource at https://doi.org/10.5281/zenodo.3885753. A separate page with detailed explanations and illustrations is available at http://cleopatra.ijs.si/goal-mlm/ to promote ease-of-use. The project GitHub repository contains the complete source code for the system and the generation script is available at https://github.com/GOALCLEOPATRA/MLM. Documentation adheres to the standards of FAIR Data principles with all relevant metadata specified to the research community and users. It is freely accessible under the Creative Commons Attribution 4.0 International license, which makes it reusable for almost any purpose.

    Updating and Reusability: MLM is supported by a team of researchers from the University of Bonn, the Leibniz Information Center for Science and Technology, and Jožef Stefan Institute. The resource is already in use for individual projects and as a contribution to the project deliverables of the Marie Skłodowska-Curie CLEOPATRA Innovative Training Network. In addition to the steps above that make the resource available to the wider community, the usage of MLM will be promoted to the network of researchers in this project. Use among researchers and practitioners in digital humanities will be promoted by demonstrations and presentations at domain-related events. Activities are planned for the Digital Methods Summer School run by the University of Amsterdam. The range of modalities and languages present in the dataset also extend its application to research on multimodal representation learning, multilingual machine learning, information retrieval, location estimation, and the Semantic Web. MLM will be supported and maintained for three years in the first instance. A second release of the dataset is already scheduled and the generation process outlined above is designed to enable rapid scaling.

  9. HaDR: Dataset for hands instance segmentation

    • kaggle.com
    zip
    Updated Mar 7, 2023
    Cite
    Ales Vysocky (2023). HaDR: Dataset for hands instance segmentation [Dataset]. https://www.kaggle.com/datasets/alevysock/hadr-dataset-for-hands-instance-segmentation
    Available download formats: zip (10662295286 bytes)
    Dataset updated
    Mar 7, 2023
    Authors
    Ales Vysocky
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    If you use this dataset for your work, please cite the related papers: A. Vysocky, S. Grushko, T. Spurny, R. Pastor and T. Kot, Generating Synthetic Depth Image Dataset for Industrial Applications of Hand Localisation, in IEEE Access, 2022, doi: 10.1109/ACCESS.2022.3206948.

    S. Grushko, A. Vysocký, J. Chlebek, P. Prokop, HaDR: Applying Domain Randomization for Generating Synthetic Multimodal Dataset for Hand Instance Segmentation in Cluttered Industrial Environments. preprint in arXiv, 2023, https://doi.org/10.48550/arXiv.2304.05826

    The HaDR dataset is a multimodal dataset designed for human-robot gesture-based interaction research, consisting of RGB and Depth frames, with binary masks for each hand instance (i1, i2, single class data). The dataset is entirely synthetic, generated using the Domain Randomization technique in CoppeliaSim 3D. The dataset can be used to train Deep Learning models to recognize hands using either a single modality (RGB or depth) or both simultaneously. The training-validation split comprises 95K and 22K samples, respectively, with annotations provided in COCO format. The instances are uniformly distributed across the image boundaries. The vision sensor captures depth and color images of the scene, with the depth pixel values scaled into a single-channel 8-bit grayscale image in the range [0.2, 1.0] m. The following aspects of the scene were randomly varied during generation of the dataset:

    • Number, colors, textures, scales and types of distractor objects selected from a set of 3D models of general tools and geometric primitives. A special type of distractor – an articulated dummy without hands (for instance-free samples).
    • Hand gestures (9 options).
    • Hand models’ positions and orientations.
    • Texture and surface properties (diffuse, specular and emissive properties) and number (from none to 2) of the object of interest, as well as its background.
    • Number and locations of directional light sources (from 1 to 4), in addition to a planar light for ambient illumination.

    The sample resolution is set to 320×256, encoded in lossless PNG format, and contains only right hand meshes (we suggest using Flip augmentations during training), with a maximum of two instances per sample.

    Test dataset (real camera images): Test dataset containing 706 images was captured using a real RGB-D camera (RealSense L515) in a cluttered and unstructured industrial environment. The dataset comprises various scenarios with diverse lighting conditions, backgrounds, obstacles, number of hands, and different types of work gloves (red, green, white, yellow, no gloves) with varying sleeve lengths. The dataset is assumed to have only one user, and the maximum number of hand instances per sample was limited to two. The dataset was manually labelled, and we provide hand instance segmentation COCO annotations in instances_hands_full.json (separately for train and val) and full arm instance annotations in instances_arms_full.json. The sample resolution was set to 640×480, and depth images were encoded in the same way as those of the synthetic dataset.

    Channel-wise normalization and standardization parameters for datasets

    Dataset      Mean (R, G, B, D)                   STD (R, G, B, D)
    Train        98.173, 95.456, 93.858, 55.872      67.539, 67.194, 67.796, 47.284
    Validation   99.321, 97.284, 96.318, 58.189      67.814, 67.518, 67.576, 47.186
    Test         123.675, 116.28, 103.53, 35.3792    58.395, 57.12, 57.375, 45.978
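
    A minimal sketch of applying the Train statistics above to a 4-channel RGB-D sample (the H x W x 4 array layout is an assumption):

    import numpy as np

    # Channel order (R, G, B, D); values from the Train row of the table above
    MEAN = np.array([98.173, 95.456, 93.858, 55.872], dtype=np.float32)
    STD = np.array([67.539, 67.194, 67.796, 47.284], dtype=np.float32)

    def normalize_rgbd(sample: np.ndarray) -> np.ndarray:
        """Standardize an H x W x 4 RGB-D array channel-wise."""
        return (sample.astype(np.float32) - MEAN) / STD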


  10. Data Sheet 2_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.

  11. Data from: Extended datasets from MM-IMDB and Ads-Parallelity dataset with...

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Feb 24, 2023
    Cite
    Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi (2023). Extended datasets from MM-IMDB and Ads-Parallelity dataset with the features from Google Cloud Vision API [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7050923
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Hosei University
    CyberAgent, Inc.
    Authors
    Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are extended versions of the MM-IMDB [Arevalo+ ICLRW'17] and Ads-Parallelity [Zhang+ BMVC'18] datasets, augmented with features from the Google Cloud Vision API. The datasets are stored in jsonl (JSON Lines) format.

    Abstract (from our paper):

    There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.

    Dataset (MM-IMDB and Ads-Parallelity):

    We extended two multimodal datasets, namely, MM-IMDB [Arevalo+ ICLRW'17], Ads-Parallelity [Zhang+ BMVC'18] for the empirical experiments. The MM-IMDB dataset contains 25,925 movies with multiple labels (genres). We used the original split provided in the dataset and reported the F1 scores (micro, macro, and samples) of the test set. The Ads-Parallelity dataset contains 670 images and slogans from persuasive advertisements to understand the implicit relationship (parallel and non-parallel) between these two modalities. A binary classification task is used to predict whether the text and image in the same ad convey the same message.

    We transformed the following multimodal information (i.e., visual, textual, and categorical data) into textual tokens and fed these into our proposed model. We used the Google Cloud Vision API for the visual features to obtain the following four pieces of information as tokens: (1) text from the OCR, (2) category labels from the label detection, (3) object tags from the object detection, and (4) the number of faces from the facial detection. We input the labels and object detection results as a sequence in order of confidence, as obtained from the API. We describe the visual, textual, and categorical features of each dataset below.

    MM-IMDB: We used the title and plot of movies as the textual features, and the aforementioned API results based on poster images as visual features.

    Ads-Parallelity: We used the same API-based visual features as in MM-IMDB. Furthermore, we used textual and categorical features consisting of textual inputs of transcriptions and messages, and categorical inputs of natural and text concrete images.

  12. The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases,...

    • zenodo.org
    bin, csv, zip
    Updated Jan 5, 2024
    + more versions
    Cite
    Mauro Nievas Offidani; Claudio Delrieux (2024). The MultiCaRe Dataset: A Multimodal Case Report Dataset with Clinical Cases, Labeled Images and Captions from Open Access PMC Articles [Dataset]. http://doi.org/10.5281/zenodo.10079370
    Available download formats: zip, bin, csv
    Dataset updated
    Jan 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mauro Nievas Offidani; Claudio Delrieux
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains multi-modal data from over 75,000 open access and de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map images to their corresponding article metadata, clinical case, captions and image labels. Details of the data structure can be found in the file data_dictionary.csv.

    Almost 100,000 patients and almost 400,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file.
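
    A minimal sketch of loading the article metadata with pandas (only the file names mentioned above are assumed; pyarrow or fastparquet is needed for the Parquet file):

    import pandas as pd

    metadata = pd.read_parquet("metadata.parquet")   # article metadata, including citation data
    data_dict = pd.read_csv("data_dictionary.csv")   # description of the dataset structure
    print(metadata.shape)
    print(data_dict.head())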

    Refer to the examples showcased in this GitHub repository to understand how to optimize the use of this dataset.

    For a detailed insight about the contents of this dataset, please refer to this data article published in Data In Brief.

  13. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • data.niaid.nih.gov
    Updated Oct 20, 2022
    + more versions
    Cite
    Yfantidou, Sofia; Karagianni, Christina; Efstathiou, Stefanos; Vakali, Athena; Palotti, Joao; Giakatos, Dimitrios Panteleimon; Marchioro, Thomas; Kazlouski, Andrei; Ferrari, Elena; Girdzijauskas, Šarūnas (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6826682
    Dataset updated
    Oct 20, 2022
    Dataset provided by
    University of Insubria
    KTH Royal Institute of Technology
    Foundation for Research and Technology Hellas
    Aristotle University of Thessaloniki
    Earkick
    Authors
    Yfantidou, Sofia; Karagianni, Christina; Efstathiou, Stefanos; Vakali, Athena; Palotti, Joao; Giakatos, Dimitrios Panteleimon; Marchioro, Thomas; Kazlouski, Andrei; Ferrari, Elena; Girdzijauskas, Šarūnas
    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

    The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

    Data Import: Reading CSV

    For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
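
    For instance (the file name below is a placeholder for any of the provided daily or hourly CSV files):

    import pandas as pd

    daily = pd.read_csv("lifesnaps_daily.csv")  # placeholder file name
    print(daily.head())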

    Data Import: Setting up a MongoDB (Recommended)

    To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

    To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

    For the Fitbit data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

    For the SEMA data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c sema

    For surveys data, run the following:

    mongorestore --host localhost:27017 -d rais_anonymized -c surveys

    If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

    Data Availability

    The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

    {
      _id: <MongoDB ObjectId>,
      id (or user_id): <user-specific ID>,
      type: <data type>,
      data: <embedded data object>
    }

    Each document consists of four fields: _id, id (also found as user_id in the sema and surveys collections), type, and data. The _id field is the MongoDB-defined primary key and can be ignored. The id field refers to a user-specific ID used to uniquely identify each user across all collections. The type field refers to the specific data type within the collection, e.g., steps, heart rate, calories, etc. The data field contains the actual information about the document e.g., steps count for a specific timestamp for the steps type, in the form of an embedded object. The contents of the data object are type-dependent, meaning that the fields within the data object are different between different types of data. As mentioned previously, all times are stored in local time, and user IDs are common across different collections. For more information on the available data types, see the related publication.
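
    Once the collections are restored, the documents can be queried directly; a minimal pymongo sketch following the document format above (the 'steps' type is taken from the example in the text):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["rais_anonymized"]

    # Count documents per data type in the fitbit collection
    for row in db["fitbit"].aggregate([{"$group": {"_id": "$type", "n": {"$sum": 1}}}]):
        print(row["_id"], row["n"])

    # Fetch one steps document and inspect its embedded data object
    doc = db["fitbit"].find_one({"type": "steps"})
    if doc is not None:
        print(doc["id"], doc["data"])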

    Surveys Encoding

    BREQ2

    Why do you engage in exercise?

    • engage[SQ001]: I exercise because other people say I should
    • engage[SQ002]: I feel guilty when I don’t exercise
    • engage[SQ003]: I value the benefits of exercise
    • engage[SQ004]: I exercise because it’s fun
    • engage[SQ005]: I don’t see why I should have to exercise
    • engage[SQ006]: I take part in exercise because my friends/family/partner say I should
    • engage[SQ007]: I feel ashamed when I miss an exercise session
    • engage[SQ008]: It’s important to me to exercise regularly
    • engage[SQ009]: I can’t see why I should bother exercising
    • engage[SQ010]: I enjoy my exercise sessions
    • engage[SQ011]: I exercise because others will not be pleased with me if I don’t
    • engage[SQ012]: I don’t see the point in exercising
    • engage[SQ013]: I feel like a failure when I haven’t exercised in a while
    • engage[SQ014]: I think it is important to make the effort to exercise regularly
    • engage[SQ015]: I find exercise a pleasurable activity
    • engage[SQ016]: I feel under pressure from my friends/family to exercise
    • engage[SQ017]: I get restless if I don’t exercise regularly
    • engage[SQ018]: I get pleasure and satisfaction from participating in exercise
    • engage[SQ019]: I think exercising is a waste of time
    

    PANAS

    Indicate the extent you have felt this way over the past week

        Code         Text
        P1[SQ001]    Interested
        P1[SQ002]    Distressed
        P1[SQ003]    Excited
        P1[SQ004]    Upset
        P1[SQ005]    Strong
        P1[SQ006]    Guilty
        P1[SQ007]    Scared
        P1[SQ008]    Hostile
        P1[SQ009]    Enthusiastic
        P1[SQ010]    Proud
        P1[SQ011]    Irritable
        P1[SQ012]    Alert
        P1[SQ013]    Ashamed
        P1[SQ014]    Inspired
        P1[SQ015]    Nervous
        P1[SQ016]    Determined
        P1[SQ017]    Attentive
        P1[SQ018]    Jittery
        P1[SQ019]    Active
        P1[SQ020]    Afraid
    

    Personality

    How Accurately Can You Describe Yourself?

        Code           Text
        ipip[SQ001]    Am the life of the party.
        ipip[SQ002]    Feel little concern for others.
        ipip[SQ003]    Am always prepared.
        ipip[SQ004]    Get stressed out easily.
        ipip[SQ005]    Have a rich vocabulary.
        ipip[SQ006]    Don't talk a lot.
        ipip[SQ007]    Am interested in people.
        ipip[SQ008]    Leave my belongings around.
        ipip[SQ009]    Am relaxed most of the time.
        ipip[SQ010]    Have difficulty understanding abstract ideas.
        ipip[SQ011]    Feel comfortable around people.
        ipip[SQ012]    Insult people.
        ipip[SQ013]    Pay attention to details.
        ipip[SQ014]    Worry about things.
        ipip[SQ015]    Have a vivid imagination.
        ipip[SQ016]    Keep in the background.
        ipip[SQ017]    Sympathize with others' feelings.
        ipip[SQ018]    Make a mess of things.
        ipip[SQ019]    Seldom feel blue.
        ipip[SQ020]    Am not interested in abstract ideas.
        ipip[SQ021]    Start conversations.
        ipip[SQ022]    Am not interested in other people's problems.
        ipip[SQ023]    Get chores done right away.
        ipip[SQ024]    Am easily disturbed.
        ipip[SQ025]    Have excellent ideas.
        ipip[SQ026]    Have little to say.
        ipip[SQ027]    Have a soft heart.
        ipip[SQ028]    Often forget to put things back in their proper place.
        ipip[SQ029]    Get upset easily.
        ipip[SQ030]    Do not have a good imagination.
        ipip[SQ031]    Talk to a lot of different people at parties.
        ipip[SQ032]    Am not really interested in others.
        ipip[SQ033]    Like order.
        ipip[SQ034]    Change my mood a lot.
        ipip[SQ035]    Am quick to understand things.
        ipip[SQ036]    Don't like to draw attention to myself.
        ipip[SQ037]    Take time out for others.
        ipip[SQ038]    Shirk my duties.
        ipip[SQ039]    Have frequent mood swings.
        ipip[SQ040]    Use difficult words.
        ipip[SQ041]    Don't mind being the centre of attention.
        ipip[SQ042]    Feel others' emotions.
        ipip[SQ043]    Follow a schedule.
        ipip[SQ044]    Get irritated easily.
        ipip[SQ045]    Spend time reflecting on things.
        ipip[SQ046]    Am quiet around strangers.
        ipip[SQ047]    Make people feel at ease.
        ipip[SQ048]    Am exacting in my work.
        ipip[SQ049]    Often feel blue.
        ipip[SQ050]    Am full of ideas.
    

    STAI

    Indicate how you feel right now

        Code           Text
        STAI[SQ001]    I feel calm
        STAI[SQ002]    I feel secure
        STAI[SQ003]    I am tense
        STAI[SQ004]    I feel strained
        STAI[SQ005]    I feel at ease
        STAI[SQ006]    I feel upset
        STAI[SQ007]    I am presently worrying over possible misfortunes
        STAI[SQ008]    I feel satisfied
        STAI[SQ009]    I feel frightened
        STAI[SQ010]    I feel comfortable
        STAI[SQ011]    I feel self-confident
        STAI[SQ012]    I feel nervous
        STAI[SQ013]    I am jittery
        STAI[SQ014]    I feel indecisive
        STAI[SQ015]    I am relaxed
        STAI[SQ016]    I feel content
        STAI[SQ017]    I am worried
        STAI[SQ018]    I feel confused
        STAI[SQ019]    I feel steady
        STAI[SQ020]    I feel pleasant
    

    TTM

    Do you engage in regular physical activity according to the definition above? How frequently did each event or experience occur in the past month?

        Code                Text
        processes[SQ002]    I read articles to learn more about physical
    
  14. Replication Data for: The Choice of Aspect in the Russian Modal Construction...

    • search.dataone.org
    • dataverse.no
    Updated Jan 5, 2024
    Cite
    Bernasconi, Beatrice (2024). Replication Data for: The Choice of Aspect in the Russian Modal Construction with prixodit'sja/prijtis' [Dataset]. http://doi.org/10.18710/KR5RRK
    Explore at:
    Dataset updated
    Jan 5, 2024
    Dataset provided by
    DataverseNO
    Authors
    Bernasconi, Beatrice
    Time period covered
    Jan 1, 1950 - Jan 1, 2020
    Description

    This dataset includes all the data files that were used for the studies in my Master Thesis: "The Choice of Aspect in the Russian Modal Construction with prixodit'sja/prijtis'". The data files are numbered so that they are shown in the same order as they are presented in the thesis. They include the database and the code used for the statistical analysis. Their contents are described in the ReadMe files. The core of the work is a quantitative and empirical study on the choice of aspect by Russian native speakers in the modal construction prixodit’sja/prijtis’ + inf. The hypothesis is that in the modal construction prixodit’sja/prijtis’ + inf the aspect of the infinitive is not fully determined by grammatical context but, to some extent, open to construal.

    A preliminary analysis was carried out on data gathered from the Russian National Corpus (www.ruscorpora.ru). Four hundred and forty-seven examples with the verb prijtis' were annotated manually for several factors and a statistical test (CART) was run. Results demonstrated that no grammatical factor plays a big role in the use of one aspect rather than the other. Data for this study can be consulted in the files from 01 to 03 and include a ReadMe file, the database in .csv format and the code used for the statistical test.

    An experiment with native speakers was then carried out. A hundred and ten native speakers of Russian were surveyed and asked to evaluate the acceptability of the infinitive in examples with prixodit’sja/prijtis’ delat’/sdelat’ šag/vid/vybor. The survey presented seventeen examples from the Russian National Corpus, each submitted two times: the first time with the same aspect as in the original version, the second time with the other aspect. Participants had to evaluate each case by choosing among “Impossible”, “Acceptable” and “Excellent” ratings. They were also allowed to give their opinion about the difference between aspects in each example. A Logistic Regression with Mixed Effects was run on the answers. Data for this study can be consulted in the files from 04 to 010 and include a ReadMe file, the text and the answers of the questionnaire, the database in .csv, .txt and pdf formats and the code used for the statistical test.

    Results showed that prijtis’ often admits both aspects in the infinitive, while prixodit’sja is more restrictive and prefers imperfective. Overall, “Acceptable” and “Excellent” responses were higher than “Impossible” responses for both aspects, even when the aspect evaluated didn’t match the original. Personal opinions showed that the choice of aspect often depends on the meaning the speaker wants to convey. Only in very few cases was the grammatical context considered to be a constraint on the choice.

  15. Human Bone Fractures (Image Dataset)

    • kaggle.com
    zip
    Updated Aug 9, 2025
    + more versions
    Cite
    Omar Essa (2025). Human Bone Fractures (Image Dataset) [Dataset]. https://www.kaggle.com/datasets/jockeroika/human-bone-fractures-image-dataset
    Explore at:
    zip(39969682 bytes)Available download formats
    Dataset updated
    Aug 9, 2025
    Authors
    Omar Essa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Human Bone Fractures Multi-modal Image Dataset (HBFMID) is a comprehensive medical imaging dataset designed for research and development in bone fracture detection, classification, and localization. Published on December 2, 2024 by Shahnaj Parvin, the dataset integrates both X-ray and MRI modalities, covering a wide range of human skeletal regions.

    Dataset Composition

    Total Raw Images: 641

    X-ray Images: 510

    MRI Images: 131

    Anatomical Regions Covered: Elbow, finger, forearm, humerus, shoulder, femur, shinbone, knee, hipbone, wrist, spinal cord, and other healthy bone samples.

    Data Splits

    Training Set: 449 images → augmented to 1,347 images (×3 augmentation factor)

    Validation Set: 128 images

    Test Set: 64 images

    Total Final Dataset Size: 1,539 images

    Pre-processing Steps

    All images underwent the following pre-processing:

    Auto-orientation (correcting rotation/flip metadata)

    Resizing to 640 × 640 pixels

    Contrast adjustments to enhance bone visibility

    Data Augmentation Techniques

    To improve model generalization, several augmentation methods were applied (a comparable pipeline is sketched after this list):

    Flip: Horizontal & Vertical

    Rotation: Between −5° and +5°

    Shear: ±2° (Horizontal & Vertical)

    Zooming: 2%

    Saturation Adjustment: ±5%

    Brightness Adjustment: ±10%

    Scaling, Shifting, Shearing, Cropping, Random Rotation
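    The dataset's own augmentation code is not reproduced here; as a rough sketch, a comparable pipeline could be written with torchvision transforms, with parameter values mirroring the list above (an assumption-laden illustration, not the original implementation):

    from torchvision import transforms

    # Approximate re-creation of the listed pre-processing and augmentation steps.
    augment = transforms.Compose([
        transforms.Resize((640, 640)),                              # resize to 640 x 640 px
        transforms.RandomHorizontalFlip(p=0.5),                     # horizontal flip
        transforms.RandomVerticalFlip(p=0.5),                       # vertical flip
        transforms.RandomRotation(degrees=5),                       # rotation between -5 and +5 degrees
        transforms.RandomAffine(degrees=0, shear=2),                # shear of roughly +/-2 degrees
        transforms.ColorJitter(brightness=0.10, saturation=0.05),   # brightness/saturation jitter
    ])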

  16. octmnist_shaowen

    • kaggle.com
    zip
    Updated Dec 21, 2022
    Cite
    Shaowen Huang (2022). octmnist_shaowen [Dataset]. https://www.kaggle.com/datasets/shaowenhuang/octmnist-shaowen
    Explore at:
    zip(54954840 bytes)Available download formats
    Dataset updated
    Dec 21, 2022
    Authors
    Shaowen Huang
    Description

    MedMNIST2D: OCTMNIST
    Data Modality: Retinal OCT
    Tasks (# Classes/Labels): Multi-Class (4)
    # Samples: 109,309
    # Training / Validation / Test: 97,477 / 10,832 / 1,000
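    This Kaggle upload appears to mirror the OCTMNIST subset of MedMNIST; if so, the same splits can typically also be obtained through the official medmnist Python package (a hedged sketch, assuming the package is installed):

    from medmnist import OCTMNIST

    # Downloads the official OCTMNIST splits (28x28 retinal OCT images, 4 classes).
    train_set = OCTMNIST(split="train", download=True)
    val_set = OCTMNIST(split="val", download=True)
    test_set = OCTMNIST(split="test", download=True)

    print(len(train_set), len(val_set), len(test_set))  # expected: 97477 10832 1000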

  17. rocov2-modality-x-ray

    • huggingface.co
    Updated Sep 1, 2025
    + more versions
    Cite
    Wafaa Abdallah Yassin Fraih (2025). rocov2-modality-x-ray [Dataset]. https://huggingface.co/datasets/WafaaFraih/rocov2-modality-x-ray
    Explore at:
    Dataset updated
    Sep 1, 2025
    Authors
    Wafaa Abdallah Yassin Fraih
    Description

    ROCOv2 X-ray Dataset

    This dataset contains X-ray imaging data from the ROCOv2 radiology dataset.

    Dataset Structure

    caption: Medical description text
    modality: Imaging modality (X-ray)
    modality_id: Numerical modality ID
    caption_length: Number of words in caption
    length_category: Short/medium/long categorization
    original_index: Index from original dataset

    Splits

    train: 154 samples
    val: 19 samples
    test: 20 samples

    Usage

    from datasets import… See the full description on the dataset page: https://huggingface.co/datasets/WafaaFraih/rocov2-modality-x-ray.
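    The Usage snippet above is truncated on the source page; a minimal sketch of loading this dataset with the Hugging Face datasets library (standard load_dataset usage, not copied from the dataset card):

    from datasets import load_dataset

    ds = load_dataset("WafaaFraih/rocov2-modality-x-ray")
    print(ds)                         # shows the available splits and fields
    print(ds["train"][0]["caption"])  # fields as described in the structure above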

  18. MM5: Multimodal Image Dataset

    • figshare.com
    zip
    Updated Aug 3, 2025
    Cite
    Martin Brenner; Napoleon Reyes; Teo Susnjak; Andre Barczak (2025). MM5: Multimodal Image Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28722164.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 3, 2025
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Martin Brenner; Napoleon Reyes; Teo Susnjak; Andre Barczak
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MM5 dataset is a comprehensive multimodal dataset capturing RGB, Depth, Thermal (LWIR), Ultraviolet (UV), and Near-Infrared (NIR) images. It is designed for advanced multimodal research, providing diverse modalities, annotated data, and carefully calibrated and aligned images. For additional scripts, documentation, and usage examples, please visit our GitHub repository: https://github.com/martinbrennernz/MM5-Dataset

  19. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Jul 12, 2022
    Cite
    Zenodo (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6832242?locale=fr
    Explore at:
    unknown(642961582)Available download formats
    Dataset updated
    Jul 12, 2022
    Dataset authored and provided by
    Zenodohttp://zenodo.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation. This record mirrors the LifeSnaps dataset described earlier in this listing (a multi-modal, longitudinal, geographically-distributed dataset collected by n=71 participants over more than 4 months under the European H2020 RAIS project); see the LifeSnaps entry above for the full description, data import instructions, and survey encodings.

  20. ArchCAD

    • huggingface.co
    Updated Oct 19, 2025
    Cite
    luo (2025). ArchCAD [Dataset]. https://huggingface.co/datasets/jackluoluo/ArchCAD
    Explore at:
    Dataset updated
    Oct 19, 2025
    Authors
    luo
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    🏗️ ArchCAD

    🇺🇸 English | 🇨🇳 Chinese

    A Multimodal CAD Dataset for Vectorized Drawing Understanding

    40k Samples · 5 Strictly Aligned Modalities · Foundational Data for AI Understanding of Engineering Drawings

      📑 Table of Contents

    What is ArchCAD? · Key Features · Dataset Structure · Data Modalities · Annotations · Baseline Model: DPSS · Potential Applications · Citation

      📘 What is ArchCAD?
    

    AI systems have long struggled to interpret and utilize CAD… See the full description on the dataset page: https://huggingface.co/datasets/jackluoluo/ArchCAD.

Cite
Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568

Datasets for Evaluation of Multimodal Image Registration

Explore at:
2 scholarly articles cite this dataset
zipAvailable download formats
Dataset updated
Oct 11, 2021
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

In total, the evaluation set contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters needed to recover it.

Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

Metadata

In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information of an image pair as follows (a short parsing example is sketched after the list):

  • Filename: identifier (ID) of the image pair

  • X1_Ref: x-coordinate of the upper-left corner of reference patch IRef

  • Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef

  • X2_Ref: x-coordinate of the lower-left corner of reference patch IRef

  • Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef

  • X3_Ref: x-coordinate of the lower-right corner of reference patch IRef

  • Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef

  • X4_Ref: x-coordinate of the upper-right corner of reference patch IRef

  • Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef

  • X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit

  • Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit

  • X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit

  • Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit

  • X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit

  • Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit

  • X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit

  • Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit

  • Displacement: mean Euclidean distance between reference corner points and transformed corner points

  • RelativeDisplacement: the ratio of displacement to the width/height of image patch

  • Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit

  • Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit

  • AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit

  • AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit
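For example, a short sketch (assuming pandas and numpy are available) that reads one of these CSV files and recomputes the mean corner displacement from the coordinate columns; the file path is illustrative:

    import numpy as np
    import pandas as pd

    # Illustrative path; any info_test.csv from the 2D datasets has these columns.
    df = pd.read_csv("Eliceiri_patches/patch_tlevel1/info_test.csv")

    ref_cols = [f"{axis}{i}_Ref" for i in range(1, 5) for axis in ("X", "Y")]
    trans_cols = [f"{axis}{i}_Trans" for i in range(1, 5) for axis in ("X", "Y")]

    ref = df[ref_cols].to_numpy().reshape(-1, 4, 2)      # 4 corners x (x, y) per patch
    trans = df[trans_cols].to_numpy().reshape(-1, 4, 2)

    # Mean Euclidean distance between corresponding corners, as in the Displacement column.
    displacement = np.linalg.norm(ref - trans, axis=2).mean(axis=1)
    print(np.allclose(displacement, df["Displacement"], atol=1e-3))  # should be (approximately) True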

In addition to the columns above, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

  • Z1_Ref: z-coordinate of the upper-left corner of reference patch IRef
  • Z2_Ref: z-coordinate of the lower-left corner of reference patch IRef
  • Z3_Ref: z-coordinate of the lower-right corner of reference patch IRef
  • Z4_Ref: z-coordinate of the upper-right corner of reference patch IRef
  • Z1_Trans: z-coordinate of the upper-left corner of transformed patch IInit
  • Z2_Trans: z-coordinate of the lower-left corner of transformed patch IInit
  • Z3_Trans: z-coordinate of the lower-right corner of transformed patch IInit
  • Z4_Trans: z-coordinate of the upper-right corner of transformed patch IInit
  • (...and similarly, coordinates of the 5th-8th corners)
  • Tz: randomly generated translation in z-direction to synthesise the transformed patch IInit
  • AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch IInit
  • AngleRadX: randomly generated rotation around X-axis in radian to synthesise the transformed patch IInit
  • AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch IInit
  • AngleRadY: randomly generated rotation around Y-axis in radian to synthesise the transformed patch IInit
  • AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch IInit
  • AngleRadZ: randomly generated rotation around Z-axis in radian to synthesise the transformed patch IInit

Naming convention

  • Aerial Data
    • zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
    • Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
  • Cytological data
    • {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
    • Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed patch of the 2nd row and 1st column cut from the image with ID PNT1A_do_1_f15.