100+ datasets found
  1. Table of three selected data events: Assembling a multimodal analysis

    • figshare.unimelb.edu.au
    pdf
    Updated May 19, 2022
    Cite
    Annamaria Neag; SARAH HEALY (2022). Table of three selected data events: Assembling a multimodal analysis [Dataset]. http://doi.org/10.26188/6285b8ec5db6b
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 19, 2022
    Dataset provided by
    The University of Melbourne
    Authors
    Annamaria Neag; SARAH HEALY
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table of three selected data events showing initial analysis of the multimodal elements that constituted the data, which were: 1) Metadata, 2) Social actors, 3) Visual images (e.g., photo analysis), 4) Linguistic expressions of sentiment, 5) Non-linguistic reactions, and 6) Broader social-economic-political relations.

  2. Text-audio pairs (4 of 4)

    • kaggle.com
    zip
    Updated Aug 14, 2024
    Cite
    Jorvan (2024). Text-audio pairs (4 of 4) [Dataset]. https://www.kaggle.com/jorvan/text-audio-pairs-4-of-4
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Aug 14, 2024
    Authors
    Jorvan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the fourth of the four datasets we have created for audio-text training tasks. They collect pairs of texts and audio clips, based on the audio-image pairs from our datasets [1, 2, 3], and are intended for research purposes only.

    For the conversion, .csv tables were created in which the audio values were separated into 16,000 columns and the images were transformed into texts using the public model BLIP [4]. The original images are also preserved for future reference.
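
    As a quick sanity check, the sketch below shows how such a table could be read back into arrays. The file name pairs.csv and the caption column name "text" are placeholders, since the exact schema is not listed in this description.

    import numpy as np
    import pandas as pd

    # Hypothetical file and column names; the description does not list the actual ones.
    df = pd.read_csv("pairs.csv")

    # Assumption: the 16,000 audio sample values occupy their own columns and the
    # BLIP caption sits in a single text column.
    audio_cols = [c for c in df.columns if c != "text"]
    waveforms = df[audio_cols].to_numpy(dtype=np.float32)  # shape: (n_pairs, 16000)
    captions = df["text"]
    print(waveforms.shape, len(captions))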

    To allow other researchers a quick evaluation of the potential usefulness of our datasets for their purposes, we have made available a public page where anyone can check 60 random samples that we extracted from all of our data [5].

    [1] Jorge E. León. Image-audio pairs (1 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3.
    [2] Jorge E. León. Image-audio pairs (2 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-2-of-3.
    [3] Jorge E. León. Image-audio pairs (3 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-3-of-3.
    [4] Junnan Li et al. "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation". In: ArXiv 2201.12086 (2022).
    [5] Jorge E. León. AVT Multimodal Dataset. 2024. URL: https://jorvan758.github.io/AVT-Multimodal-Dataset/.

  3. TableBench

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    Multilingual-Multimodal-NLP (2025). TableBench [Dataset]. https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableBench
    Explore at:
    Dataset updated
    Mar 28, 2025
    Dataset authored and provided by
    Multilingual-Multimodal-NLP
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for TableBench

    📚 Paper

    🏆 Leaderboard

    💻 Code

      Dataset Summary
    

    TableBench is a comprehensive and complex benchmark designed to evaluate Table Question Answering (TableQA) capabilities, aligning closely with the "Reasoning Complexity of Questions" dimension in real-world Table QA scenarios. It covers 18 question categories across 4 major categories—including… See the full description on the dataset page: https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableBench.
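
    As a rough usage sketch (not part of the dataset card), the benchmark can typically be pulled from the Hub with the datasets library; the available configurations and splits may differ, so inspect the returned object or the dataset page first.

    from datasets import load_dataset

    # Repository id taken from the citation above; a specific config name may be required.
    ds = load_dataset("Multilingual-Multimodal-NLP/TableBench")
    print(ds)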

  4. Multimodal Recommendation System Datasets

    • kaggle.com
    Updated Aug 21, 2023
    Cite
    Ignacio Avas (2023). Multimodal Recommendation System Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/6338676
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ignacio Avas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quick start

    To read any dataset you can use the following code

    >>> import numpy as np
    >>> embed_image = np.load('embed_image.npy')
    >>> embed_image.shape
    (33962, 768)
    >>> embed_text = np.load('embed_text.npy')
    >>> embed_text.shape
    (33962, 768)
    >>> import pandas as pd
    >>> items = pd.read_csv('items.txt')
    >>> m = len(items)
    >>> print(f'{m} items in dataset')
    33962 items in dataset
    >>> users = pd.read_csv('users.txt')
    >>> n = len(users)
    >>> print(f'{n} users in dataset')
    14790 users in dataset
    >>> train = pd.read_csv('train.txt')
    >>> train
         user  item
    0    13444 23557
    1    13444 33739
    ...    ...  ...
    317109 13506 29993
    317110 13506 13931
    >>> from scipy.sparse import csr_matrix
    >>> train_matrix = csr_matrix((np.ones(len(train)), (train.user, train.item)), shape=(n,m))
    

    Folders

    This dataset collection contains six datasets. Each dataset is provided in seven combinations of image and text encoders, so you should see 42 folders.

    Each folder is named after the dataset and the encoders used for the visual and textual parts, for example: bookcrossing-vit_bert.

    The datasets are:
    • Clothing, Shoes and Jewelry (Amazon)
    • Home and Kitchen (Amazon)
    • Musical Instruments (Amazon)
    • Movies and TV (Amazon)
    • Book-Crossing
    • Movielens 25M

    And the encoders are:
    • CLIP (image and text): *-clip_clip. This is the main one used in the experiments.
    • ViT and BERT: *-vit_bert
    • CLIP (visual only): *-clip_none
    • ViT only: *-vit_none
    • BERT only: *-none_bert
    • CLIP (text only): *-none_clip
    • No textual or visual information: *-none_none

    Files per folder

    For each dataset, considering M items, N users, textual embeddings with D dimensions (e.g., 1024), and visual embeddings with E dimensions (e.g., 768), each folder contains the following files:
    • embed_image.npy: a NumPy array of M×E elements.
    • embed_text.npy: a NumPy array of M×D elements.
    • items.csv: a CSV with the item ID in the original dataset (like the Amazon ASIN, the movie ID, etc.) and the item number, an integer from 0 to M-1.
    • users.csv: a CSV with the user ID in the original dataset (like the Amazon reviewer ID) and the user number, an integer from 0 to N-1.
    • train.txt, validation.txt and test.txt: CSV files with the portions of the reviews used for training, validation, and testing. They contain the items each user liked or reviewed positively; each row is a positive user-item pair.

    We consider a review "positive" if the rating is four or more (or 8 or more for Book-crossing).

    If an item does not have an image or a text description, the corresponding embedding vector is zeroed out.
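
    Building on the quick-start file names above, a minimal sketch for spotting such zeroed-out items:

    import numpy as np

    embed_image = np.load('embed_image.npy')  # shape (M, E)
    embed_text = np.load('embed_text.npy')    # shape (M, D)

    # Rows that are entirely zero correspond to items without an image or without text.
    no_image = ~embed_image.any(axis=1)
    no_text = ~embed_text.any(axis=1)
    print(f'{no_image.sum()} items without an image, {no_text.sum()} items without text')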

    Dataset stats

    Dataset | Users | Items | Ratings | Density
    Clothing & Shoes & Jewelry | 23,318 | 38,493 | 178,944 | 0.020%
    Home & Kitchen | 5,968 | 57,645 | 135,839 | 0.040%
    Movies & TV | 21,974 | 23,958 | 216,110 | 0.041%
    Musical Instruments | 14,429 | 29,040 | 93,923 | 0.022%
    Book-crossing | 14,790 | 33,962 | 519,613 | 0.103%
    Movielens 25M | 162,541 | 59,047 | 25,000,095 | 0.260%

    Modifications from the original source

    Only a tiny fraction of the dataset was taken for the Amazon Datasets by considering reviews in a specific date range.

    For the Bookcrossing dataset, only items with images were considered.

    There are various other minor tweaks in how the images and texts were obtained. The repo https://github.com/igui/MultimodalRecomAnalysis has the notebook and scripts to reproduce the dataset extraction from scratch.

  5. Steam Dataset 2025: Multi-Modal Gaming Analytics

    • kaggle.com
    zip
    Updated Oct 7, 2025
    Cite
    CrainBramp (2025). Steam Dataset 2025: Multi-Modal Gaming Analytics [Dataset]. https://www.kaggle.com/datasets/crainbramp/steam-dataset-2025-multi-modal-gaming-analytics
    Explore at:
    Available download formats: zip (12478964226 bytes)
    Dataset updated
    Oct 7, 2025
    Authors
    CrainBramp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Steam Dataset 2025: Multi-Modal Gaming Analytics Platform

    The first multi-modal Steam dataset with semantic search capabilities. 239,664 applications collected from official Steam Web APIs with PostgreSQL database architecture, vector embeddings for content discovery, and comprehensive review analytics.

    Made by a lifelong gamer for the gamer in all of us. Enjoy!🎮

    GitHub Repository https://github.com/vintagedon/steam-dataset-2025

    Figure: 1024-dimensional game embeddings projected to 2D via UMAP reveal natural genre clustering in semantic space.

    What Makes This Different

    Unlike traditional flat-file Steam datasets, this is built as an analytically-native database optimized for advanced data science workflows:

    ☑️ Semantic Search Ready - 1024-dimensional BGE-M3 embeddings enable content-based game discovery beyond keyword matching

    ☑️ Multi-Modal Architecture - PostgreSQL + JSONB + pgvector in unified database structure

    ☑️ Production Scale - 239K applications vs typical 6K-27K in existing datasets

    ☑️ Complete Review Corpus - 1,048,148 user reviews with sentiment and metadata

    ☑️ 28-Year Coverage - Platform evolution from 1997-2025

    ☑️ Publisher Networks - Developer and publisher relationship data for graph analysis

    ☑️ Complete Methodology & Infrastructure - Full work logs document every technical decision and challenge encountered, while my API collection scripts, database schemas, and processing pipelines enable you to update the dataset, fork it for customized analysis, learn from real-world data engineering workflows, or critique and improve the methodology
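
    To illustrate the pgvector setup described above, here is a hedged sketch of a cosine-similarity query; the connection string, table name (game_embeddings), and column names are hypothetical, so consult the included schema documentation for the actual ones.

    import numpy as np
    import psycopg2

    # Hypothetical connection parameters and schema; adjust to the restored dump.
    conn = psycopg2.connect("dbname=steam_2025")
    cur = conn.cursor()

    # A 1024-dimensional query vector (e.g., a BGE-M3 embedding of a game description),
    # serialized in the text format pgvector accepts.
    query_vec = np.random.rand(1024).astype(np.float32)
    query_str = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"

    cur.execute(
        """
        SELECT app_id, name
        FROM game_embeddings
        ORDER BY embedding <=> %s::vector  -- pgvector cosine-distance operator
        LIMIT 10
        """,
        (query_str,),
    )
    print(cur.fetchall())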

    Figure: Market segmentation and pricing strategy analysis across top 10 genres.

    What's Included

    Core Data (CSV Exports):
    • 239,664 Steam applications with complete metadata
    • 1,048,148 user reviews with scores and statistics
    • 13 normalized relational tables for pandas/SQL workflows
    • Genre classifications, pricing history, platform support
    • Hardware requirements (min/recommended specs)
    • Developer and publisher portfolios

    Advanced Features (PostgreSQL):
    • Full database dump with optimized indexes
    • JSONB storage preserving complete API responses
    • Materialized columns for sub-second query performance
    • Vector embeddings table (pgvector-ready)

    Documentation:
    • Complete data dictionary with field specifications
    • Database schema documentation
    • Collection methodology and validation reports

    Example Analysis: Published Notebooks (v1.0)

    Three comprehensive analysis notebooks demonstrate dataset capabilities. All notebooks render directly on GitHub with full visualizations and output:

    📊 Platform Evolution & Market Landscape

    View on GitHub | PDF Export
    28 years of Steam's growth, genre evolution, and pricing strategies.

    🔍 Semantic Game Discovery

    View on GitHub | PDF Export
    Content-based recommendations using vector embeddings across genre boundaries.

    🎯 The Semantic Fingerprint

    View on GitHub | PDF Export
    Genre prediction from game descriptions - demonstrates text analysis capabilities.

    Notebooks render with full output on GitHub. Kaggle-native versions planned for v1.1 release. CSV data exports included in dataset for immediate analysis.

    Figure: Steam platfor...

  6. Datasets for Evaluation of Multimodal Image Registration

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 11, 2021
    Cite
    Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    • Aerial data
    • The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch, and one 300x300 px patch is then extracted from the centre of each region. This 3-fold grouping followed by splitting results in 72 test samples per evaluation fold.
      • Modality A: Near-Infrared (NIR)

      • Modality B: three colour channels (in B-G-R order)

    • Cytological data
    • The Cytological data contains images from 3 different cell lines; all images from one cell line are treated as one fold in 3-fold cross-validation. Each 600x600 px image in the dataset is subdivided into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.
      • Modality A: Fluorescence Images

      • Modality B: Quantitative Phase Images (QPI)

    • Histological dataset
    • For the Histological data, to avoid registration being made trivially easy by the circular border of the TMA cores, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.
      • Modality A: Second Harmonic Generation (SHG)

      • Modality B: Bright-Field (BF)

    The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of (rigid) transformations with increasing displacement. The level of transformation is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level, so that all levels have the same number of samples.

    • Radiological data
    • The Radiological dataset is divided into 3 sub-groups by patient IDs: {109, 106, 003, 006}, {108, 105, 007, 001}, {107, 102, 005, 009}. Since the Radiological dataset is non-isotropic (and also of varying resolution), it is resampled using B-spline interpolation to 1 mm3 cubic voxels, taking explicit care to not resample twice; displaced volumes are transformed and resampled in one step.
      • Modality A: T1-weighted MRI

      • Modality B: T2-weighted MRI

    (Run make_rire_patches.py to generate the sub-volumes.)

    Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. Similarly to the aforementioned 2D datasets, random (uniformly-distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and translation tz ∈ [-6.5, 6.5] voxels in the z direction. 40 rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes, of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.

    In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.

    Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

    Metadata

    In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information for an image pair as follows:

    • Filename: identifier(ID) of the image pair

    • X1_Ref: x-coordinate of the upper-left corner of reference patch IRef

    • Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef

    • X2_Ref: x-coordinate of the lower-left corner of reference patch IRef

    • Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef

    • X3_Ref: x-coordinate of the lower-right corner of reference patch IRef

    • Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef

    • X4_Ref: x-coordinate of the upper-right corner of reference patch IRef

    • Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef

    • X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit

    • Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit

    • X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit

    • Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit

    • X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit

    • Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit

    • X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit

    • Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit

    • Displacement: mean Euclidean distance between reference corner points and transformed corner points

    • RelativeDisplacement: the ratio of displacement to the width/height of image patch

    • Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit

    • Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit

    • AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit

    • AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit
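
    As a small usage sketch, the Displacement column can be re-derived from the corner coordinates of a 2D sample; the path below instantiates the {Zurich,Balvan}_patches pattern given above for fold 1, transformation level 1.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("Zurich_patches/fold1/patch_tlevel1/info_test.csv")
    row = df.iloc[0]

    # Corner coordinates of the reference patch and of the transformed patch.
    ref = np.array([[row[f"X{i}_Ref"], row[f"Y{i}_Ref"]] for i in range(1, 5)], dtype=float)
    init = np.array([[row[f"X{i}_Trans"], row[f"Y{i}_Trans"]] for i in range(1, 5)], dtype=float)

    # Mean Euclidean distance between corresponding corners; this should match
    # the 'Displacement' column of the same row.
    displacement = np.linalg.norm(ref - init, axis=1).mean()
    print(displacement, row["Displacement"])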

    In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

    • Z1_Ref: z-coordinate of the upper-left corner of reference patch IRef
    • Z2_Ref: z-coordinate of the lower-left corner of reference patch IRef
    • Z3_Ref: z-coordinate of the lower-right corner of reference patch IRef
    • Z4_Ref: z-coordinate of the upper-right corner of reference patch IRef
    • Z1_Trans: z-coordinate of the upper-left corner of transformed patch IInit
    • Z2_Trans: z-coordinate of the lower-left corner of transformed patch IInit
    • Z3_Trans: z-coordinate of the lower-right corner of transformed patch IInit
    • Z4_Trans: z-coordinate of the upper-right corner of transformed patch IInit
    • (...and similarly, coordinates of the 5th-8th corners)
    • Tz: randomly generated translation in z-direction to synthesise the transformed patch IInit
    • AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch IInit
    • AngleRadX: randomly generated rotation around X-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch IInit
    • AngleRadY: randomly generated rotation around Y-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch IInit
    • AngleRadZ: randomly generated rotation around Z-axis in radian to synthesise the transformed patch IInit

    Naming convention

    • Aerial Data
      •  zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
    • Cytological data
      • {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed…
      
  7. Table 1_Machine learning prediction of anxiety symptoms in social anxiety...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jan 7, 2025
    Cite
    Pack, Seung Pil; Hur, Ji-Won; Jung, Dooyoung; Cho, Chul-Hyun; Park, Jin-Hyun; Lee, Hwamin; Lee, Heon-Jeong; Shin, Yu-Bin (2025). Table 1_Machine learning prediction of anxiety symptoms in social anxiety disorder: utilizing multimodal data from virtual reality sessions.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001283930
    Explore at:
    Dataset updated
    Jan 7, 2025
    Authors
    Pack, Seung Pil; Hur, Ji-Won; Jung, Dooyoung; Cho, Chul-Hyun; Park, Jin-Hyun; Lee, Hwamin; Lee, Heon-Jeong; Shin, Yu-Bin
    Description

    Introduction: Machine learning (ML) is an effective tool for predicting mental states and is a key technology in digital psychiatry. This study aimed to develop ML algorithms to predict the upper tertile group of various anxiety symptoms based on multimodal data from virtual reality (VR) therapy sessions for social anxiety disorder (SAD) patients and to evaluate their predictive performance across each data type.

    Methods: This study included 32 SAD-diagnosed individuals and finalized a dataset of 132 samples from 25 participants. It utilized multimodal (physiological and acoustic) data from VR sessions designed to simulate social anxiety scenarios. The study employed the extended Geneva minimalistic acoustic parameter set for acoustic feature extraction and extracted statistical attributes from time series-based physiological responses. We developed ML models that predict the upper tertile group for various anxiety symptoms in SAD using Random Forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) models. The best parameters were explored through grid search or random search, and the models were validated using stratified cross-validation and leave-one-out cross-validation.

    Results: The CatBoost model, using multimodal features, exhibited high performance, particularly for the Social Phobia Scale with an area under the receiver operating characteristic curve (AUROC) of 0.852. It also showed strong performance in predicting cognitive symptoms, with the highest AUROC of 0.866 for the Post-Event Rumination Scale. For generalized anxiety, LightGBM's prediction for the State-Trait Anxiety Inventory-trait led to an AUROC of 0.819. In the same analysis, models using only physiological features had AUROCs of 0.626, 0.744, and 0.671, whereas models using only acoustic features had AUROCs of 0.788, 0.823, and 0.754.

    Conclusions: This study showed that an ML algorithm using integrated multimodal data can predict upper tertile anxiety symptoms in patients with SAD with higher performance than acoustic or physiological data alone obtained during a VR session. The results of this study can be used as evidence for personalized VR sessions and to demonstrate the strength of the clinical use of multimodal data.

  8. MultiBanFakeDetect: Multimodal Bangla Fake News

    • kaggle.com
    zip
    Updated Aug 14, 2025
    Cite
    Mukaffi Moin (2025). MultiBanFakeDetect: Multimodal Bangla Fake News [Dataset]. https://www.kaggle.com/datasets/mukaffimoin/multibanfakedetect-multimodal-bangla-fake-news/code
    Explore at:
    Available download formats: zip (2608129399 bytes)
    Dataset updated
    Aug 14, 2025
    Authors
    Mukaffi Moin
    License

    CDLA Permissive 1.0: https://cdla.io/permissive-1-0/

    Description

    MultiBanFakeDetect Dataset

    The MultiBanFakeDetect dataset consists of 9,600 text–image instances collected from online forums, news websites, and social media. It covers multiple themes — political, social, technology, and entertainment — with a balanced distribution of real and fake instances.

    The dataset is split into:

    • Training: 7,680 instances
    • Testing: 960 instances
    • Validation: 960 instances

    📊 Statistical Overview – Types of Fake News

    Type | Training | Testing | Validation
    Misinformation | 1,288 | 161 | 162
    Rumor | 1,215 | 152 | 151
    Clickbait | 1,337 | 167 | 167
    Non-fake | 3,840 | 480 | 480
    Total | 7,680 | 960 | 960

    🏷️ Distribution by Labels

    Label | Training | Testing | Validation
    1 (Fake) | 3,840 | 480 | 480
    0 (Non-Fake) | 3,840 | 480 | 480
    Total | 7,680 | 960 | 960

    🌍 Statistical Overview – Categories of Fake News

    Category | Training | Testing | Validation
    Entertainment | 640 | 80 | 80
    Sports | 640 | 80 | 80
    Technology | 640 | 80 | 80
    National | 640 | 80 | 80
    Lifestyle | 640 | 80 | 80
    Politics | 640 | 80 | 80
    Education | 640 | 80 | 80
    International | 640 | 80 | 80
    Crime | 640 | 80 | 80
    Finance | 640 | 80 | 80
    Business | 640 | 80 | 80
    Miscellaneous | 640 | 80 | 80
    Total | 7,680 | 960 | 960

    @article{FARIA2025100347,
    title = {MultiBanFakeDetect: Integrating advanced fusion techniques for multimodal detection of Bangla fake news in under-resourced contexts},
    journal = {International Journal of Information Management Data Insights},
    volume = {5},
    number = {2},
    pages = {100347},
    year = {2025},
    issn = {2667-0968},
    doi = {https://doi.org/10.1016/j.jjimei.2025.100347},
    url = {https://www.sciencedirect.com/science/article/pii/S2667096825000291},
    author = {Fatema Tuj Johora Faria and Mukaffi Bin Moin and Zayeed Hasan and Md. Arafat Alam Khandaker and Niful Islam and Khan Md Hasib and M.F. Mridha},
    keywords = {Fake news detection, Multimodal dataset, Textual analysis, Visual analysis, Bangla language, Under-resource, Fusion techniques, Deep learning}}
    
    
  9. MMSci_Table

    • huggingface.co
    Updated Jan 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    bohao yang (2025). MMSci_Table [Dataset]. https://huggingface.co/datasets/yangbh217/MMSci_Table
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2025
    Authors
    bohao yang
    Description

    MMSci_Table

    Dataset for the paper "Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning"

      📑 Paper   Github  
    
    
    
    
    
    
    
      MMSci Dataset Collection
    

    The MMSci dataset collection consists of three complementary datasets designed for scientific multimodal table understanding and reasoning: MMSci-Pre, MMSci-Ins, and MMSci-Eval.

      Dataset Summary
    

    MMSci-Pre: A domain-specific pre-training dataset… See the full description on the dataset page: https://huggingface.co/datasets/yangbh217/MMSci_Table.

  10. Data from: ScientISST MOVE: Annotated Wearable Multimodal Biosignals...

    • data.niaid.nih.gov
    Updated Nov 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Areias Saraiva, João; Abreu, Mariana; Carmo, Ana Sofia; Plácido da Silva, Hugo; Fred, Ana (2023). ScientISST MOVE: Annotated Wearable Multimodal Biosignals recorded during Everyday Life Activities in Naturalistic Environments [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7940438
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Instituto Superior Técnico
    Instituto de Telecomunicações
    Authors
    Areias Saraiva, João; Abreu, Mariana; Carmo, Ana Sofia; Plácido da Silva, Hugo; Fred, Ana
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    A multi-modality, multi-activity, and multi-subject dataset of wearable biosignals.
    • Modalities: ECG, EMG, EDA, PPG, ACC, TEMP
    • Main Activities: Lift object, Greet people, Gesticulate while talking, Jumping, Walking, and Running
    • Cohort: 17 subjects (10 male, 7 female); median age: 24
    • Devices: 2x ScientISST Core + 1x Empatica E4
    • Body Locations: Chest, Abdomen, Left bicep, wrist and index finger
    No filter has been applied to the signals, but the correct transfer functions were applied, so the data is given in the relevant units (mV, uS, g, ºC).

    For more information on background, methods and the acquisition protocol, refer to https://doi.org/10.13026/0ppk-ha30.

    In this repository, two formats are available:

    a) LTBio Biosignal files, which should be opened like x = Biosignal.load(path) (LTBio package: https://pypi.org/project/LongTermBiosignals/). Under the directory biosignal, the following tree structure is found: subject/x.biosignal, where subject is the subject's code and x is any of { acc_chest, acc_wrist, ecg, eda, emg, ppg, temp }. Each file includes the signals recorded from every sensor that acquires the modality after which the file is named, independently of the device. Channels, activities and time intervals can be easily indexed with the indexing operator, and a sneak peek of the signals can be quickly plotted with x.preview.plot(). Any Biosignal can be easily converted to NumPy arrays or DataFrames, if needed.

    b) CSV files, which can be opened like x = pandas.read_csv(path) (Pandas package: https://pypi.org/project/pandas/). These files can be found under the directory csv, named subject.csv, where subject is the subject's code. There is only one file per subject, containing their full session and all biosignal modalities. When read as tables, the time axis is in the first column, each sensor is in one of the middle columns, and the activity labels are in the last column. Each row contains the samples of each sensor, if any, at that timestamp. At a given timestamp, if there is no sample for a sensor, it means the acquisition was interrupted for that sensor, which happens between activities and sometimes for short periods during the running activity. The last column of each row holds one or more activity labels if an activity was taking place at that timestamp; multiple annotations are separated by vertical bars (e.g. 'run | sprint'), and if there are no annotations the column is empty for that timestamp.
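
    A minimal sketch for the CSV format described in b); "subject" is a placeholder for an actual subject code from the csv/ directory.

    import pandas as pd

    # Placeholder path; replace "subject" with a real subject code.
    df = pd.read_csv("csv/subject.csv")

    # The last column holds the activity annotations; simultaneous labels are
    # separated by vertical bars, e.g. 'run | sprint'.
    labels = df.iloc[:, -1].dropna().astype(str).str.split("|").explode().str.strip()
    print(labels.value_counts())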

    In order to provide a tabular format for sensors with different sampling frequencies, the sensors with a sampling frequency lower than 500 Hz were upsampled to 500 Hz. This way, the tables are regularly sampled, i.e., there is a row every 2 ms. If a sensor was not acquiring at a given timestamp, the corresponding cell will be empty. So not only are the segments with samples regularly sampled, but the interruptions are also discretised. This means that if, after an interruption, a sensor starts acquiring at a non-regular timestamp, the first sample will be written on the previous or the following timestamp, by half-up rounding. Naturally, this process cumulatively introduces lags in the table, some of which cancel out. Each individual lag is no longer than half the sampling period (1 ms), hence negligible. The cumulative lags are no longer than 48 ms for all subjects, which is also negligible. Nevertheless, only the LTBio Biosignal format preserves the exact original timestamps (10E-6 precision) of all samples and the original sampling frequencies.

    Both formats include annotations of the activities; however, the LTBio Biosignal files have better time resolution and also include clinical and demographic data.

  11. Text-audio pairs (1 of 4)

    • kaggle.com
    zip
    Updated Jul 15, 2024
    Cite
    Jorvan (2024). Text-audio pairs (1 of 4) [Dataset]. https://www.kaggle.com/datasets/jorvan/text-audio-pairs-1-of-3/data
    Explore at:
    Available download formats: zip (181547102182 bytes)
    Dataset updated
    Jul 15, 2024
    Authors
    Jorvan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the first of the four datasets we have created for audio-text training tasks. They collect pairs of texts and audio clips, based on the audio-image pairs from our datasets [1, 2, 3], and are intended for research purposes only.

    For the conversion, .csv tables were created in which the audio values were separated into 16,000 columns and the images were transformed into texts using the public model BLIP [4]. The original images are also preserved for future reference.

    To allow other researchers a quick evaluation of the potential usefulness of our datasets for their purposes, we have made available a public page where anyone can check 60 random samples that we extracted from all of our data [5].

    [1] Jorge E. León. Image-audio pairs (1 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3.
    [2] Jorge E. León. Image-audio pairs (2 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-2-of-3.
    [3] Jorge E. León. Image-audio pairs (3 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-3-of-3.
    [4] Junnan Li et al. "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation". In: ArXiv 2201.12086 (2022).
    [5] Jorge E. León. AVT Multimodal Dataset. 2024. URL: https://jorvan758.github.io/AVT-Multimodal-Dataset/.

  12. Visual-TableQA

    • huggingface.co
    Updated Sep 30, 2025
    Cite
    AI 4 Everyone (2025). Visual-TableQA [Dataset]. https://huggingface.co/datasets/AI-4-Everyone/Visual-TableQA
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    AI 4 Everyone
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🧠 Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

    Welcome to Visual-TableQA, a project designed to generate high-quality synthetic question-answer datasets associated with images of tables. This resource is ideal for training and evaluating models on visually-grounded table understanding tasks such as document QA, table parsing, and multimodal reasoning.

      🚀 Latest Update
    

    We have refreshed the dataset with newly generated QA pairs created by… See the full description on the dataset page: https://huggingface.co/datasets/AI-4-Everyone/Visual-TableQA.

  13. The table presents the evaluation results of the selected formulas using the...

    • plos.figshare.com
    xls
    Updated Oct 31, 2025
    Cite
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen (2025). The table presents the evaluation results of the selected formulas using the R2 metric, which measures the goodness of fit between the predicted values and the actual values. The R2 values for all the selected formulas are listed, providing a clear view of the fitting performance of each formula. [Dataset]. http://doi.org/10.1371/journal.pone.0335221.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table presents the evaluation results of the selected formulas using the R² metric, which measures the goodness of fit between the predicted values and the actual values. The R² values for all the selected formulas are listed, providing a clear view of the fitting performance of each formula.

  14. Multimodal Sports Injury Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    Mugiwara_46 (2025). Multimodal Sports Injury Dataset [Dataset]. https://www.kaggle.com/datasets/anjalibhegam/multimodal-sports-injury-dataset
    Explore at:
    Available download formats: zip (2821789 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Mugiwara_46
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Multimodal Sports Injury Prediction Dataset

    📊 Dataset Overview

    This comprehensive dataset contains 15,420 samples collected from 156 athletes over a 6-month monitoring period, designed for predictive modeling of sports injury risk using multimodal sensor data and machine learning techniques.

    🎯 Dataset Purpose

    The dataset enables researchers and data scientists to:
    • Predict sports injury risk using multimodal physiological and biomechanical data
    • Develop real-time athlete monitoring systems for injury prevention
    • Build deep learning models (CNN, LSTM, Transformers) for temporal pattern recognition
    • Analyze pre-injury patterns and early warning indicators
    • Study relationships between training load, fatigue, and injury occurrence

    📁 Dataset Structure

    Basic Information

    • Total Samples: 15,420
    • Number of Athletes: 156
    • Features: 22 multimodal features + 7 metadata columns
    • Target Variable: injury_occurred (3 classes: Healthy, Low Risk, High Risk/Injured)
    • File Format: CSV
    • File Size: ~5 MB
    • Missing Data: 2.97% (realistic missing patterns)

    🏷️ Feature Categories

    1. Physiological Metrics (6 features)

    Feature | Unit | Range | Mean ± SD | Description | Sensor Type
    heart_rate | bpm | 40-180 | 72.4 ± 18.3 | Cardiovascular stress indicator | Chest-strap HR monitor
    body_temperature | °C | 35.8-39.2 | 37.1 ± 0.6 | Core body temperature | Infrared thermometer
    hydration_level | % | 45-100 | 78.3 ± 12.4 | Fluid balance status | Bioimpedance sensor
    sleep_quality | score | 2-10 | 6.8 ± 1.9 | Recovery quality indicator | Wearable sleep tracker
    recovery_score | score | 25-98 | 68.5 ± 15.2 | Overall recovery status | Composite metric
    stress_level | a.u. | 0.1-0.95 | 0.42 ± 0.18 | Physiological stress level | HRV-based estimate

    2. Biomechanical Data (8 features)

    Feature | Unit | Range | Mean ± SD | Description | Sensor Type
    muscle_activity | μV | 10-850 | 245.6 ± 127.3 | Muscle activation level | Surface EMG
    joint_angles | degrees | 45-175 | 112.3 ± 28.4 | Joint range of motion | IMU sensors (9-axis)
    gait_speed | m/s | 0.8-3.5 | 1.85 ± 0.52 | Walking/running speed | Motion capture
    cadence | steps/min | 50-200 | 85.7 ± 22.1 | Step frequency | Accelerometer
    step_count | count | 2000-15000 | 7823 ± 2341 | Total steps per session | Pedometer
    jump_height | meters | 0.15-0.85 | 0.48 ± 0.14 | Vertical jump performance | Force plate
    ground_reaction_force | N | 800-2800 | 1654 ± 387 | Impact force during movement | Force plate
    range_of_motion | degrees | 60-180 | 124.5 ± 23.7 | Joint flexibility | Goniometer

    3. Environmental Factors (4 features)

    Feature | Unit | Range | Mean ± SD | Description
    ambient_temperature | °C | 15-38 | 24.8 ± 5.3 | Training environment temperature
    humidity | % | 30-85 | 58.3 ± 14.2 | Air humidity level
    altitude | meters | 0-1200 | 285 ± 234 | Training location elevation
    playing_surface | categorical | 0-4 | - | Surface type (0=Grass, 1=Turf, 2=Indoor, 3=Track, 4=Other)

    4. Workload Indicators (4 features)

    Feature | Unit | Range | Mean ± SD | Description
    training_intensity | RPE | 2-10 | 6.4 ± 1.8 | Perceived exertion level
    training_duration | minutes | 30-180 | 87.5 ± 28.3 | Session duration
    training_load | a.u. | 150-1800 | 568 ± 287 | Intensity × Duration
    fatigue_index | score | 15-85 | 48.3 ± 18.7 | Cumulative fatigue measure

    5. Metadata Columns (7 features)

    Column | Type | Description
    athlete_id | Integer | Unique athlete identifier (1-156)
    session_id | Integer | Session number per athlete
    sport_type | Categorical | Sport discipline (Soccer, Basketball, Track, Other)
    gender | Categorical | Male (68%), Female (32%)
    age | Integer | Athlete age in years (18-35, Mean: 24.3 ± 4.2)
    bmi | Float | Body Mass Index (18.5-28.3, Mean: 23.1 ± 2.4)
    injury_occurred | Integer | Target variable (see below)

    🎯 Target Variable: injury_occurred

    The dataset includes a 3-class target variable for injury risk prediction:

    Class | Label | Count | Percentage | Description
    0 | Healthy | 9,869 | 64.0% | No injury risk indicators
    1 | Low Risk | 3,238 | 21.0% | Elevated fatigue or training load
    2 | High Risk/Injured | 2,313 | 15.0% | Injury occurred or imminent risk

    Imbalance Ratio: 4.27:1 (Majority:Minority)
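
    A quick sketch for verifying the class distribution once the file is loaded; the CSV file name below is a placeholder, since it is not given in this description.

    import pandas as pd

    # Placeholder file name; use the actual CSV shipped with the dataset.
    df = pd.read_csv("multimodal_sports_injury.csv")

    counts = df["injury_occurred"].value_counts().sort_index()
    print(counts)                       # expected roughly 0: 9869, 1: 3238, 2: 2313
    print(counts.max() / counts.min())  # imbalance ratio, about 4.27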

    Injury Definition: ...

  15. tablebench-tqa

    • huggingface.co
    Updated Oct 21, 2025
    + more versions
    Cite
    multimodal table benchmark (2025). tablebench-tqa [Dataset]. https://huggingface.co/datasets/table-benchmark/tablebench-tqa
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset authored and provided by
    multimodal table benchmark
    Description

    table-benchmark/tablebench-tqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. XAI-FUNGI: Dataset from the user study on comprehensibility of XAI...

    • zenodo.org
    csv, pdf, zip
    Updated Oct 15, 2024
    + more versions
    Cite
    Szymon Bobek; Szymon Bobek; Paloma Korycińska; Paloma Korycińska; Monika Krakowska; Monika Krakowska; Maciej Mozolewski; Maciej Mozolewski; Dorota Rak; Dorota Rak; Magdalena Zych; Magdalena Zych; Magdalena Wójcik; Magdalena Wójcik; Grzegorz J. Nalepa; Grzegorz J. Nalepa (2024). XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms [Dataset]. http://doi.org/10.5281/zenodo.11448395
    Explore at:
    Available download formats: csv, zip, pdf
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Szymon Bobek; Szymon Bobek; Paloma Korycińska; Paloma Korycińska; Monika Krakowska; Monika Krakowska; Maciej Mozolewski; Maciej Mozolewski; Dorota Rak; Dorota Rak; Magdalena Zych; Magdalena Zych; Magdalena Wójcik; Magdalena Wójcik; Grzegorz J. Nalepa; Grzegorz J. Nalepa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms

    We present the dataset which was created during a user study on the evaluation of explainability of artificial intelligence (AI) at the Jagiellonian University, as a collaborative work of the computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore effective explanations of AI model patterns for diverse audiences.

    The dataset contains material collected from 39 participants during the interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups that represented domain experts in the field of mycology (DE), students with data science and visualization background (IT) and students from social sciences and humanities (SSH). Each group was given an explanation of a machine learning model trained to predict edible and non-edible mushrooms and asked to interpret the explanations and answer various questions during the interview. The machine learning model and explanations for its decision were prepared by the computer science research team.

    The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results from thematic analysis, and the original explanations with the modifications suggested by the participants. The dataset is complemented with the source code allowing one to reproduce the initial machine learning model and explanations.

    The general structure of the dataset is described in the following table. The files that contain in their names [RR]_[SS]_[NN] contain the individual results obtained from particular participant. The meaning of the prefix is as follows:

    • RR - initials of the researcher conducting the interview,
    • SS - type of the participant (DE for domain expert, SSH for social sciences and humanities students, or IT for computer science students),
    • NN - number of the participant

    File | Description
    SURVEY.csv | The results from a survey filled in by 149 participants, out of which 39 were selected to form the final group of participants.
    CODEBOOK.csv | The codebook used in thematic analysis and MAXQDA coding.
    QUESTIONS.csv | List of questions that the participants were asked during interviews.
    SLIDES.csv | List of slides used in the study with their interpretation and reference to MAXQDA themes and VISUAL_MODIFICATIONS tables.
    MAXQDA_SUMMARY.csv | Summary of the thematic analysis performed, with the codes used in CODEBOOK for each participant.
    PROBLEMS.csv | List of problems that participants were asked to solve during interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from the explanations.
    PROBLEMS_RESPONSES.csv | The responses of each participant to the problems listed in PROBLEMS.csv.
    VISUALIZATION_MODIFICATIONS.csv | Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested.
    ORIGINAL_VISUZALIZATIONS.pdf | The PDF file containing the visualization of explanations presented to the participants during the interviews.
    VISUALIZATION_MODIFICATIONS.zip | The original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant. Each file is a PDF named with the participant ID, i.e. [RR]_[SS]_[NN].pdf.
    TRANSCRIPTS.zip | The anonymized transcripts of the interviews for each participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv.

    The detailed structure of the files presented in the previous Table is given in the Technical info section.
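
    As a small usage sketch (assuming the transcript files inside TRANSCRIPTS.zip follow the [RR]_[SS]_[NN].csv convention described above), the number of interviews per participant group can be counted like this:

    import re
    import zipfile
    from collections import Counter
    from pathlib import Path

    with zipfile.ZipFile("TRANSCRIPTS.zip") as zf:
        stems = [Path(name).stem for name in zf.namelist() if name.endswith(".csv")]

    # SS in [RR]_[SS]_[NN] encodes the group: DE, IT, or SSH.
    groups = Counter(m.group(1) for s in stems if (m := re.search(r"_(DE|SSH|IT)_", s)))
    print(groups)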

    The source code used to train the ML model and to generate the explanations is available on GitLab.

  17. Data from: UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban...

    • zenodo.org
    zip
    Updated Sep 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Debora Russo; Debora Russo (2025). UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios [Dataset]. http://doi.org/10.5281/zenodo.16748119
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Debora Russo; Debora Russo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🏙️ URMAT: URban MATerials Dataset

    URMAT (Urban Materials Dataset) is a large-scale, multimodal synthetic dataset designed for training and benchmarking material-aware semantic segmentation, scene understanding, and electromagnetic wave simulation tasks in complex urban environments.

    The dataset provides pixel-wise annotated images, depth maps, segmentation masks, physical material metadata, and aligned 3D point clouds, all derived from realistic 3D reconstructions of urban scenes including Trastevere, CityLife, Louvre, Canary Wharf, Bryggen, Siemensstadt, and Eixample.

    🧱 Key Features

    • 14 material classes: Brick, Glass, Steel, Tiles, Limestone, Plaster, Concrete, Wood, Cobblestone, Slate, Asphalt, Plastic, Gravel, Unknown.

    • Multimodal data: RGB, depth, material masks, mesh segmentation

    • Physically annotated metadata: includes permittivity, reflectance, attenuation

    • 7 diverse European city districts, georeferenced and stylistically accurate

    • Precomputed point clouds for 3D analysis or downstream simulation

    • Compatible with Unreal Engine, PyTorch, and MATLAB pipelines

    📁 Dataset Structure

    At the root of the dataset:

    • *_mapping/ folders: mapping files, mesh metadata, camera poses

    • *_pointclouds/ folders: colored 3D point clouds with material labels

    • train/, val/, test/: standard splits for training and evaluation

    Inside each split (train/, val/, test/):

    Folder Name | Description
    rgb/ | RGB images rendered from Unreal Engine
    depth_png/ | Depth maps as grayscale .png (normalized for visualization)
    depth_npy/ | Raw depth arrays saved as .npy
    segmentation_material_png/ | Color-encoded material segmentation masks for visualization
    segmentation_material_npy/ | Material masks in .npy format (integer IDs per pixel, for training)
    segmentation_mesh/ | Optional masks identifying the mesh origin of each pixel
    metadata/ | JSON metadata with material type and physical properties per mesh
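
    A minimal loading sketch under the folder layout above; the individual file names inside each folder are not listed here, so the first sample of the train split is taken arbitrarily.

    import numpy as np
    from pathlib import Path

    split = Path("train")

    # Take an arbitrary sample from each folder; in a real pipeline, pair depth and
    # material files by matching file stems.
    depth = np.load(next((split / "depth_npy").glob("*.npy")))
    materials = np.load(next((split / "segmentation_material_npy").glob("*.npy")))

    # 'materials' holds integer material IDs per pixel (14 classes, incl. Unknown).
    print(depth.shape, materials.shape, np.unique(materials))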

    📦 Recommended Use Cases

    • Material-aware semantic segmentation

    • Scene-level reasoning for 3D reconstruction

    • Ray tracing and wireless signal propagation simulation

    • Urban AI and Smart City research

    • Synthetic-to-real generalization studies

    📜 Citation

    If you use URMAT v2 in your research, please cite the dataset.

    Paper: "UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios" (https://www.researchgate.net/publication/395193944_UR-MAT_A_Multimodal_Material-Aware_Synthetic_Dataset_of_Urban_Scenarios) - to appear in ACM Multimedia 2025, Dataset Track

  18. Global Multimodal AI Models Market Research Report: By Application...

    • wiseguyreports.com
    Updated Aug 23, 2025
    + more versions
    Cite
    (2025). Global Multimodal AI Models Market Research Report: By Application (Healthcare, Finance, Retail, Transportation, Manufacturing), By Deployment Model (Cloud-based, On-premises, Hybrid), By End Use Industry (Automotive, Telecommunications, Education, Entertainment), By Model Type (Vision-Language Models, Audio-Visual Models, Text-Image Models) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/multimodal-ai-models-market
    Explore at:
    Dataset updated
    Aug 23, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR | 2024
    HISTORICAL DATA | 2019 - 2023
    REGIONS COVERED | North America, Europe, APAC, South America, MEA
    REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024 | 4.49 (USD Billion)
    MARKET SIZE 2025 | 5.59 (USD Billion)
    MARKET SIZE 2035 | 50.0 (USD Billion)
    SEGMENTS COVERED | Application, Deployment Model, End Use Industry, Model Type, Regional
    COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS | Technological advancements, Increasing data availability, Rising demand for automation, Enhancing user experience, Competitive landscape growth
    MARKET FORECAST UNITS | USD Billion
    KEY COMPANIES PROFILED | Adobe, OpenAI, Baidu, Microsoft, Google, C3.ai, Meta, Tencent, SAP, IBM, Amazon, Hugging Face, Alibaba, Salesforce, Nvidia
    MARKET FORECAST PERIOD | 2025 - 2035
    KEY MARKET OPPORTUNITIES | Natural language processing integration, Enhanced personalization in services, Advanced healthcare applications, Smart automation in industries, Scalable cloud-based solutions
    COMPOUND ANNUAL GROWTH RATE (CAGR) | 24.5% (2025 - 2035)

  19. Multimodal Defensive Communication Database (DefComm-DB)

    • zenodo.org
    Updated May 30, 2023
    + more versions
    Cite
    Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review (2023). Multimodal Defensive Communication Database (DefComm-DB) [Dataset]. http://doi.org/10.5281/zenodo.7706919
    Explore at:
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review
    Description

    Description: DefComm-DB comprises 261 genuine non-acted dialogues between English-speaking individuals in 'real-world' settings that feature one of the defensive behaviours outlined in Birkenbihl's model of communication failures [1]:

    1. Attacking the conversation partner (class Attack): videos that depict individuals actively attacking verbally, blaming the other person, or shifting the other person's attention to themselves.
    2. Withdrawing from the communication (class Flight): videos where people refuse to respond, withdraw from the conversation or change the topic or focus.
    3. Making oneself greater (class Greater): videos that depict individuals boasting, self-justifying in an aggressive manner, denying accusations, exhibiting a sense of dominance or superiority, or expressing indignation.
    4. Making oneself smaller (class Smaller): videos that display individuals engaging in self-deprecation, self-blame, exhibiting a sense of guilt, apologising, and expressing feelings of vulnerability or worthlessness.

    [1] Birkenbihl, V. (2013). Kommunikationstraining: Zwischenmenschliche Beziehungen erfolgreich gestalten. Schritte 1–6. mvg Verlag.

    Key statistics on the dataset are provided in Table 1. DefComm features a variety of video topics, including interviews with celebrities and professional athletes, political debates, legal trials, TV shows, and video footage obtained by paparazzi, among others. The situations, number of participants, gender, age, and ethnicity vary from scene to scene.

    From each video, we retrieve audio, visual, and textual modalities. In this paper, we focus on the audio modality and the speech transcriptions.

    Table 1: Statistics on Def-Comm: number of video clips, mean duration (μ), standard deviation (σ), minimum, maximum, and total duration of collected videos per class.
    Label | # video clips | μ [s] | σ [s] | min [s] | max [s] | Σ duration [s]
    Attack | 112 | 8 | 9 | 2 | 46 | 949
    Flight | 57 | 9 | 8 | 2 | 62 | 494
    Greater | 45 | 9 | 6 | 2 | 25 | 416
    Smaller | 47 | 12 | 8 | 3 | 49 | 556
    Total | 261 | 9 | 8 | 2 | 62 | 2415

  20. Supplementary Table 2 from Long-term Multimodal Recording Reveals Epigenetic...

    • datasetcatalog.nlm.nih.gov
    • aacr.figshare.com
    Updated May 1, 2024
    Cite
    Canale, Eleonora; Zemlyanskiy, Grigory; Ghirardi, Chiara; Vingiani, Andrea; Bonaldi, Tiziana; Phillips, Henry; Magnani, Luca; Bertolotti, Alessia; James, Chela; Győrffy, Balázs; Lynn, Claire; Noberini, Roberta; Barozzi, Iros; Sofyali, Emre; Rehman, Farah; Dewhurst, Hannah F.; Dhiman, Heena; Heide, Timon; Rosano, Dalia; Li, Tong; Ivanoiu, Diana; Sottoriva, Andrea; Saha, Debjani; Pruneri, Giancarlo; Slaven, Neil; Cresswell, George D. (2024). Supplementary Table 2 from Long-term Multimodal Recording Reveals Epigenetic Adaptation Routes in Dormant Breast Cancer Cells [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001424948
    Explore at:
    Dataset updated
    May 1, 2024
    Authors
    Canale, Eleonora; Zemlyanskiy, Grigory; Ghirardi, Chiara; Vingiani, Andrea; Bonaldi, Tiziana; Phillips, Henry; Magnani, Luca; Bertolotti, Alessia; James, Chela; Győrffy, Balázs; Lynn, Claire; Noberini, Roberta; Barozzi, Iros; Sofyali, Emre; Rehman, Farah; Dewhurst, Hannah F.; Dhiman, Heena; Heide, Timon; Rosano, Dalia; Li, Tong; Ivanoiu, Diana; Sottoriva, Andrea; Saha, Debjani; Pruneri, Giancarlo; Slaven, Neil; Cresswell, George D.
    Description

    Coverage data for patient profiling
