100+ datasets found
  1. Table of three selected data events: Assembling a multimodal analysis

    • figshare.unimelb.edu.au
    pdf
    Updated May 19, 2022
    Cite
    Annamaria Neag; SARAH HEALY (2022). Table of three selected data events: Assembling a multimodal analysis [Dataset]. http://doi.org/10.26188/6285b8ec5db6b
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 19, 2022
    Dataset provided by
    The University of Melbourne
    Authors
    Annamaria Neag; SARAH HEALY
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Table of three selected data events showing initial analysis of the multimodal elements that constituted the data, which were: 1) Metadata, 2) Social actors, 3) Visual images (e.g., photo analysis), 4) Linguistic expressions of sentiment, 5) Non-linguistic reactions, and 6) Broader social-economic-political relations.

  2. Text-audio pairs (4 of 4)

    • kaggle.com
    zip
    Updated Aug 14, 2024
    Cite
    Jorvan (2024). Text-audio pairs (4 of 4) [Dataset]. https://www.kaggle.com/jorvan/text-audio-pairs-4-of-4
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Aug 14, 2024
    Authors
    Jorvan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the fourth of the four datasets we have created for audio-text training tasks. They collect pairs of texts and audio clips, based on the audio-image pairs from our datasets [1, 2, 3], and are intended for research purposes only.

    For the conversion, .csv tables were created in which the audio values were separated into 16,000 columns and the images were transformed into texts using the public model BLIP [4]. The original images are also preserved for future reference.
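
    As a quick sanity check, the sketch below shows how such a table could be read back into arrays. The file name pairs.csv and the caption column name "text" are placeholders, since the exact schema is not listed in this description.

    import numpy as np
    import pandas as pd

    # Hypothetical file and column names; the description does not list the actual ones.
    df = pd.read_csv("pairs.csv")

    # Assumption: the 16,000 audio sample values occupy their own columns and the
    # BLIP caption sits in a single text column.
    audio_cols = [c for c in df.columns if c != "text"]
    waveforms = df[audio_cols].to_numpy(dtype=np.float32)  # shape: (n_pairs, 16000)
    captions = df["text"]
    print(waveforms.shape, len(captions))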

    To allow other researchers a quick evaluation of the potential usefulness of our datasets for their purposes, we have made available a public page where anyone can check 60 random samples that we extracted from all of our data [5].

    [1] Jorge E. León. Image-audio pairs (1 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3.
    [2] Jorge E. León. Image-audio pairs (2 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-2-of-3.
    [3] Jorge E. León. Image-audio pairs (3 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-3-of-3.
    [4] Junnan Li et al. "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation". In: ArXiv 2201.12086 (2022).
    [5] Jorge E. León. AVT Multimodal Dataset. 2024. URL: https://jorvan758.github.io/AVT-Multimodal-Dataset/.

  3. TableBench

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    Multilingual-Multimodal-NLP (2025). TableBench [Dataset]. https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableBench
    Explore at:
    Dataset updated
    Mar 28, 2025
    Dataset authored and provided by
    Multilingual-Multimodal-NLP
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for TableBench

    📚 Paper

    🏆 Leaderboard

    💻 Code

      Dataset Summary
    

    TableBench is a comprehensive and complex benchmark designed to evaluate Table Question Answering (TableQA) capabilities, aligning closely with the "Reasoning Complexity of Questions" dimension in real-world Table QA scenarios. It covers 18 question categories across 4 major categories—including… See the full description on the dataset page: https://huggingface.co/datasets/Multilingual-Multimodal-NLP/TableBench.
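
    As a rough usage sketch (not part of the dataset card), the benchmark can typically be pulled from the Hub with the datasets library; the available configurations and splits may differ, so inspect the returned object or the dataset page first.

    from datasets import load_dataset

    # Repository id taken from the citation above; a specific config name may be required.
    ds = load_dataset("Multilingual-Multimodal-NLP/TableBench")
    print(ds)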

  4. Multimodal Recommendation System Datasets

    • kaggle.com
    Updated Aug 21, 2023
    Cite
    Ignacio Avas (2023). Multimodal Recommendation System Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/6338676
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ignacio Avas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quick start

    To read any dataset you can use the following code

    >>> import numpy as np
    >>> embed_image = np.load('embed_image.npy')
    >>> embed_image.shape
    (33962, 768)
    >>> embed_text = np.load('embed_text.npy')
    >>> embed_text.shape
    (33962, 768)
    >>> import pandas as pd
    >>> items = pd.read_csv('items.txt')
    >>> m = len(items)
    >>> print(f'{m} items in dataset')
    33962 items in dataset
    >>> users = pd.read_csv('users.txt')
    >>> n = len(users)
    >>> print(f'{n} users in dataset')
    14790 users in dataset
    >>> train = pd.read_csv('train.txt')
    >>> train
         user  item
    0    13444 23557
    1    13444 33739
    ...    ...  ...
    317109 13506 29993
    317110 13506 13931
    >>> from scipy.sparse import csr_matrix
    >>> train_matrix = csr_matrix((np.ones(len(train)), (train.user, train.item)), shape=(n,m))
    

    Folders

    This dataset collection contains six datasets. Each dataset is provided in seven combinations of image and text encoders, so you should see 42 folders.

    Each folder is named after the dataset and the encoders used for the visual and textual parts, for example: bookcrossing-vit_bert.

    The datasets are:
    • Clothing, Shoes and Jewelry (Amazon)
    • Home and Kitchen (Amazon)
    • Musical Instruments (Amazon)
    • Movies and TV (Amazon)
    • Book-Crossing
    • Movielens 25M

    And the encoders are:
    • CLIP (image and text): *-clip_clip. This is the main one used in the experiments.
    • ViT and BERT: *-vit_bert
    • CLIP (visual only): *-clip_none
    • ViT only: *-vit_none
    • BERT only: *-none_bert
    • CLIP (text only): *-none_clip
    • No textual or visual information: *-none_none

    Files per folder

    For each dataset, considering M items, N users, textual embeddings with D dimensions (e.g., 1024), and visual embeddings with E dimensions (e.g., 768), each folder contains the following files:
    • embed_image.npy: a NumPy array of M×E elements.
    • embed_text.npy: a NumPy array of M×D elements.
    • items.csv: a CSV with the item ID in the original dataset (like the Amazon ASIN, the movie ID, etc.) and the item number, an integer from 0 to M-1.
    • users.csv: a CSV with the user ID in the original dataset (like the Amazon reviewer ID) and the user number, an integer from 0 to N-1.
    • train.txt, validation.txt and test.txt: CSV files with the portions of the reviews used for training, validation, and testing. They contain the items each user liked or reviewed positively; each row is a positive user-item pair.

    We consider a review "positive" if the rating is four or more (or 8 or more for Book-crossing).

    If an item does not have an image or a text description, the corresponding embedding vector is zeroed out.
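
    Building on the quick-start file names above, a minimal sketch for spotting such zeroed-out items:

    import numpy as np

    embed_image = np.load('embed_image.npy')  # shape (M, E)
    embed_text = np.load('embed_text.npy')    # shape (M, D)

    # Rows that are entirely zero correspond to items without an image or without text.
    no_image = ~embed_image.any(axis=1)
    no_text = ~embed_text.any(axis=1)
    print(f'{no_image.sum()} items without an image, {no_text.sum()} items without text')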

    Dataset stats

    Dataset | Users | Items | Ratings | Density
    Clothing & Shoes & Jewelry | 23,318 | 38,493 | 178,944 | 0.020%
    Home & Kitchen | 5,968 | 57,645 | 135,839 | 0.040%
    Movies & TV | 21,974 | 23,958 | 216,110 | 0.041%
    Musical Instruments | 14,429 | 29,040 | 93,923 | 0.022%
    Book-crossing | 14,790 | 33,962 | 519,613 | 0.103%
    Movielens 25M | 162,541 | 59,047 | 25,000,095 | 0.260%

    Modifications from the original source

    Only a tiny fraction of the dataset was taken for the Amazon Datasets by considering reviews in a specific date range.

    For the Bookcrossing dataset, only items with images were considered.

    There are various other minor tweaks in how the images and texts were obtained. The repo https://github.com/igui/MultimodalRecomAnalysis has the notebook and scripts to reproduce the dataset extraction from scratch.

  5. Steam Dataset 2025: Multi-Modal Gaming Analytics

    • kaggle.com
    zip
    Updated Oct 7, 2025
    Cite
    CrainBramp (2025). Steam Dataset 2025: Multi-Modal Gaming Analytics [Dataset]. https://www.kaggle.com/datasets/crainbramp/steam-dataset-2025-multi-modal-gaming-analytics
    Explore at:
    Available download formats: zip (12478964226 bytes)
    Dataset updated
    Oct 7, 2025
    Authors
    CrainBramp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Steam Dataset 2025: Multi-Modal Gaming Analytics Platform

    The first multi-modal Steam dataset with semantic search capabilities. 239,664 applications collected from official Steam Web APIs with PostgreSQL database architecture, vector embeddings for content discovery, and comprehensive review analytics.

    Made by a lifelong gamer for the gamer in all of us. Enjoy!🎮

    GitHub Repository https://github.com/vintagedon/steam-dataset-2025

    Figure: 1024-dimensional game embeddings projected to 2D via UMAP reveal natural genre clustering in semantic space.

    What Makes This Different

    Unlike traditional flat-file Steam datasets, this is built as an analytically-native database optimized for advanced data science workflows:

    ☑️ Semantic Search Ready - 1024-dimensional BGE-M3 embeddings enable content-based game discovery beyond keyword matching

    ☑️ Multi-Modal Architecture - PostgreSQL + JSONB + pgvector in unified database structure

    ☑️ Production Scale - 239K applications vs typical 6K-27K in existing datasets

    ☑️ Complete Review Corpus - 1,048,148 user reviews with sentiment and metadata

    ☑️ 28-Year Coverage - Platform evolution from 1997-2025

    ☑️ Publisher Networks - Developer and publisher relationship data for graph analysis

    ☑️ Complete Methodology & Infrastructure - Full work logs document every technical decision and challenge encountered, while my API collection scripts, database schemas, and processing pipelines enable you to update the dataset, fork it for customized analysis, learn from real-world data engineering workflows, or critique and improve the methodology
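
    To illustrate the pgvector setup described above, here is a hedged sketch of a cosine-similarity query; the connection string, table name (game_embeddings), and column names are hypothetical, so consult the included schema documentation for the actual ones.

    import numpy as np
    import psycopg2

    # Hypothetical connection parameters and schema; adjust to the restored dump.
    conn = psycopg2.connect("dbname=steam_2025")
    cur = conn.cursor()

    # A 1024-dimensional query vector (e.g., a BGE-M3 embedding of a game description),
    # serialized in the text format pgvector accepts.
    query_vec = np.random.rand(1024).astype(np.float32)
    query_str = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"

    cur.execute(
        """
        SELECT app_id, name
        FROM game_embeddings
        ORDER BY embedding <=> %s::vector  -- pgvector cosine-distance operator
        LIMIT 10
        """,
        (query_str,),
    )
    print(cur.fetchall())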

    Figure: Market segmentation and pricing strategy analysis across top 10 genres.

    What's Included

    Core Data (CSV Exports):
    • 239,664 Steam applications with complete metadata
    • 1,048,148 user reviews with scores and statistics
    • 13 normalized relational tables for pandas/SQL workflows
    • Genre classifications, pricing history, platform support
    • Hardware requirements (min/recommended specs)
    • Developer and publisher portfolios

    Advanced Features (PostgreSQL):
    • Full database dump with optimized indexes
    • JSONB storage preserving complete API responses
    • Materialized columns for sub-second query performance
    • Vector embeddings table (pgvector-ready)

    Documentation:
    • Complete data dictionary with field specifications
    • Database schema documentation
    • Collection methodology and validation reports

    Example Analysis: Published Notebooks (v1.0)

    Three comprehensive analysis notebooks demonstrate dataset capabilities. All notebooks render directly on GitHub with full visualizations and output:

    📊 Platform Evolution & Market Landscape

    View on GitHub | PDF Export
    28 years of Steam's growth, genre evolution, and pricing strategies.

    🔍 Semantic Game Discovery

    View on GitHub | PDF Export
    Content-based recommendations using vector embeddings across genre boundaries.

    🎯 The Semantic Fingerprint

    View on GitHub | PDF Export
    Genre prediction from game descriptions - demonstrates text analysis capabilities.

    Notebooks render with full output on GitHub. Kaggle-native versions planned for v1.1 release. CSV data exports included in dataset for immediate analysis.

    Figure: Steam platfor...

  6. Datasets for Evaluation of Multimodal Image Registration

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 11, 2021
    Cite
    Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje (2021). Datasets for Evaluation of Multimodal Image Registration [Dataset]. http://doi.org/10.5281/zenodo.5557568
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 11, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jiahao Lu; Jiahao Lu; Johan Öfverstedt; Johan Öfverstedt; Joakim Lindblad; Joakim Lindblad; Nataša Sladoje; Nataša Sladoje
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    • Aerial data
    • The Aerial dataset is divided into 3 sub-groups by IDs: {7, 9, 20, 3, 15, 18}, {10, 1, 13, 4, 11, 6, 16}, {14, 8, 17, 5, 19, 12, 2}. Since the images vary in size, each image is subdivided into the maximal number of equal-sized non-overlapping regions such that each region can contain exactly one 300x300 px image patch, and one 300x300 px patch is then extracted from the centre of each region. This 3-fold grouping followed by splitting results in 72 test samples per evaluation fold.
      • Modality A: Near-Infrared (NIR)

      • Modality B: three colour channels (in B-G-R order)

    • Cytological data
    • The Cytological data contains images from 3 different cell lines; all images from one cell line are treated as one fold in 3-fold cross-validation. Each 600x600 px image in the dataset is subdivided into 2x2 patches of size 300x300 px, so that there are 420 test samples in each evaluation fold.
      • Modality A: Fluorescence Images

      • Modality B: Quantitative Phase Images (QPI)

    • Histological dataset
    • For the Histological data, to avoid registration being made trivially easy by the circular border of the TMA cores, the evaluation images are created by cutting 834x834 px patches from the centres of the original 134 TMA image pairs.
      • Modality A: Second Harmonic Generation (SHG)

      • Modality B: Bright-Field (BF)

    The evaluation set created from the above three publicly available 2D datasets consists of images that have undergone 4 levels of (rigid) transformations with increasing displacement. The level of transformation is determined by the size of the rotation angle θ and the displacements tx and ty, detailed in this table. Each image sample is transformed exactly once at each transformation level, so that all levels have the same number of samples.

    • Radiological data
    • The Radiological dataset is divided into 3 sub-groups by patient IDs: {109, 106, 003, 006}, {108, 105, 007, 001}, {107, 102, 005, 009}. Since the Radiological dataset is non-isotropic (and also of varying resolution), it is resampled using B-spline interpolation to 1 mm3 cubic voxels, taking explicit care to not resample twice; displaced volumes are transformed and resampled in one step.
      • Modality A: T1-weighted MRI

      • Modality B: T2-weighted MRI

    (Run make_rire_patches.py to generate the sub-volumes.)

    Reference sub-volumes of size 210x210x70 voxels are cropped directly from the centres of the (non-displaced) resampled volumes. Similarly to the aforementioned 2D datasets, random (uniformly-distributed) transformations are composed of rotations θx, θy ∈ [-4, 4] degrees around the x- and y-axes, rotation θz ∈ [-20, 20] degrees around the z-axis, translations tx, ty ∈ [-19.6, 19.6] voxels in the x and y directions, and translation tz ∈ [-6.5, 6.5] voxels in the z direction. 40 rigid transformations of increasing displacement are applied to each volume. Transformed sub-volumes, of size 210x210x70 voxels, are cropped from the centres of the transformed and resampled volumes.

    In total, it contains 864 image pairs created from the aerial dataset, 5040 image pairs created from the cytological dataset, 536 image pairs created from the histological dataset, and metadata with scripts to create the 480 volume pairs from the radiological dataset. Each image pair consists of a reference patch \(I^{\text{Ref}}\) and its corresponding initial transformed patch \(I^{\text{Init}}\) in both modalities, along with the ground-truth transformation parameters to recover it.

    Scripts to calculate the registration performance and to plot the overall results can be found in https://github.com/MIDA-group/MultiRegEval, and instructions to generate more evaluation data with different settings can be found in https://github.com/MIDA-group/MultiRegEval/tree/master/Datasets#instructions-for-customising-evaluation-data.

    Metadata

    In the *.zip files, each row in {Zurich,Balvan}_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv or Eliceiri_patches/patch_tlevel[1-4]/info_test.csv provides the information for an image pair as follows:

    • Filename: identifier(ID) of the image pair

    • X1_Ref: x-coordinate of the upper-left corner of reference patch IRef

    • Y1_Ref: y-coordinate of the upper-left corner of reference patch IRef

    • X2_Ref: x-coordinate of the lower-left corner of reference patch IRef

    • Y2_Ref: y-coordinate of the lower-left corner of reference patch IRef

    • X3_Ref: x-coordinate of the lower-right corner of reference patch IRef

    • Y3_Ref: y-coordinate of the lower-right corner of reference patch IRef

    • X4_Ref: x-coordinate of the upper-right corner of reference patch IRef

    • Y4_Ref: y-coordinate of the upper-right corner of reference patch IRef

    • X1_Trans: x-coordinate of the upper-left corner of transformed patch IInit

    • Y1_Trans: y-coordinate of the upper-left corner of transformed patch IInit

    • X2_Trans: x-coordinate of the lower-left corner of transformed patch IInit

    • Y2_Trans: y-coordinate of the lower-left corner of transformed patch IInit

    • X3_Trans: x-coordinate of the lower-right corner of transformed patch IInit

    • Y3_Trans: y-coordinate of the lower-right corner of transformed patch IInit

    • X4_Trans: x-coordinate of the upper-right corner of transformed patch IInit

    • Y4_Trans: y-coordinate of the upper-right corner of transformed patch IInit

    • Displacement: mean Euclidean distance between reference corner points and transformed corner points

    • RelativeDisplacement: the ratio of displacement to the width/height of image patch

    • Tx: randomly generated translation in the x-direction to synthesise the transformed patch IInit

    • Ty: randomly generated translation in the y-direction to synthesise the transformed patch IInit

    • AngleDegree: randomly generated rotation in degrees to synthesise the transformed patch IInit

    • AngleRad: randomly generated rotation in radian to synthesise the transformed patch IInit
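
    As a small usage sketch, the Displacement column can be re-derived from the corner coordinates of a 2D sample; the path below instantiates the {Zurich,Balvan}_patches pattern given above for fold 1, transformation level 1.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("Zurich_patches/fold1/patch_tlevel1/info_test.csv")
    row = df.iloc[0]

    # Corner coordinates of the reference patch and of the transformed patch.
    ref = np.array([[row[f"X{i}_Ref"], row[f"Y{i}_Ref"]] for i in range(1, 5)], dtype=float)
    init = np.array([[row[f"X{i}_Trans"], row[f"Y{i}_Trans"]] for i in range(1, 5)], dtype=float)

    # Mean Euclidean distance between corresponding corners; this should match
    # the 'Displacement' column of the same row.
    displacement = np.linalg.norm(ref - init, axis=1).mean()
    print(displacement, row["Displacement"])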

    In addition, each row in RIRE_patches/fold[1-3]/patch_tlevel[1-4]/info_test.csv has the following columns:

    • Z1_Ref: z-coordinate of the upper-left corner of reference patch IRef
    • Z2_Ref: z-coordinate of the lower-left corner of reference patch IRef
    • Z3_Ref: z-coordinate of the lower-right corner of reference patch IRef
    • Z4_Ref: z-coordinate of the upper-right corner of reference patch IRef
    • Z1_Trans: z-coordinate of the upper-left corner of transformed patch IInit
    • Z2_Trans: z-coordinate of the lower-left corner of transformed patch IInit
    • Z3_Trans: z-coordinate of the lower-right corner of transformed patch IInit
    • Z4_Trans: z-coordinate of the upper-right corner of transformed patch IInit
    • (...and similarly, coordinates of the 5th-8th corners)
    • Tz: randomly generated translation in z-direction to synthesise the transformed patch IInit
    • AngleDegreeX: randomly generated rotation around X-axis in degrees to synthesise the transformed patch IInit
    • AngleRadX: randomly generated rotation around X-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeY: randomly generated rotation around Y-axis in degrees to synthesise the transformed patch IInit
    • AngleRadY: randomly generated rotation around Y-axis in radian to synthesise the transformed patch IInit
    • AngleDegreeZ: randomly generated rotation around Z-axis in degrees to synthesise the transformed patch IInit
    • AngleRadZ: randomly generated rotation around Z-axis in radian to synthesise the transformed patch IInit

    Naming convention

    • Aerial Data
      •  zh{ID}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: zh5_03_02_R.png indicates the Reference patch of the 3rd row and 2nd column cut from the image with ID zh5.
    • Cytological data
      • {{cellline}_{treatment}_{fieldofview}_{iFrame}}_{iRow}_{iCol}_{ReferenceOrTransformed}.png
      • Example: PNT1A_do_1_f15_02_01_T.png indicates the Transformed…
      
  7. Table 1_Machine learning prediction of anxiety symptoms in social anxiety...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jan 7, 2025
    Cite
    Pack, Seung Pil; Hur, Ji-Won; Jung, Dooyoung; Cho, Chul-Hyun; Park, Jin-Hyun; Lee, Hwamin; Lee, Heon-Jeong; Shin, Yu-Bin (2025). Table 1_Machine learning prediction of anxiety symptoms in social anxiety disorder: utilizing multimodal data from virtual reality sessions.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001283930
    Explore at:
    Dataset updated
    Jan 7, 2025
    Authors
    Pack, Seung Pil; Hur, Ji-Won; Jung, Dooyoung; Cho, Chul-Hyun; Park, Jin-Hyun; Lee, Hwamin; Lee, Heon-Jeong; Shin, Yu-Bin
    Description

    Introduction: Machine learning (ML) is an effective tool for predicting mental states and is a key technology in digital psychiatry. This study aimed to develop ML algorithms to predict the upper tertile group of various anxiety symptoms based on multimodal data from virtual reality (VR) therapy sessions for social anxiety disorder (SAD) patients and to evaluate their predictive performance across each data type.

    Methods: This study included 32 SAD-diagnosed individuals and finalized a dataset of 132 samples from 25 participants. It utilized multimodal (physiological and acoustic) data from VR sessions designed to simulate social anxiety scenarios. The study employed the extended Geneva minimalistic acoustic parameter set for acoustic feature extraction and extracted statistical attributes from time series-based physiological responses. We developed ML models that predict the upper tertile group for various anxiety symptoms in SAD using Random Forest, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) models. The best parameters were explored through grid search or random search, and the models were validated using stratified cross-validation and leave-one-out cross-validation.

    Results: The CatBoost model, using multimodal features, exhibited high performance, particularly for the Social Phobia Scale with an area under the receiver operating characteristic curve (AUROC) of 0.852. It also showed strong performance in predicting cognitive symptoms, with the highest AUROC of 0.866 for the Post-Event Rumination Scale. For generalized anxiety, LightGBM's prediction for the State-Trait Anxiety Inventory-trait led to an AUROC of 0.819. In the same analysis, models using only physiological features had AUROCs of 0.626, 0.744, and 0.671, whereas models using only acoustic features had AUROCs of 0.788, 0.823, and 0.754.

    Conclusions: This study showed that an ML algorithm using integrated multimodal data can predict upper tertile anxiety symptoms in patients with SAD with higher performance than acoustic or physiological data alone obtained during a VR session. The results of this study can be used as evidence for personalized VR sessions and to demonstrate the strength of the clinical use of multimodal data.

  8. MultiBanFakeDetect: Multimodal Bangla Fake News

    • kaggle.com
    zip
    Updated Aug 14, 2025
    Cite
    Mukaffi Moin (2025). MultiBanFakeDetect: Multimodal Bangla Fake News [Dataset]. https://www.kaggle.com/datasets/mukaffimoin/multibanfakedetect-multimodal-bangla-fake-news/code
    Explore at:
    Available download formats: zip (2608129399 bytes)
    Dataset updated
    Aug 14, 2025
    Authors
    Mukaffi Moin
    License

    CDLA Permissive 1.0: https://cdla.io/permissive-1-0/

    Description

    MultiBanFakeDetect Dataset

    The MultiBanFakeDetect dataset consists of 9,600 text–image instances collected from online forums, news websites, and social media. It covers multiple themes — political, social, technology, and entertainment — with a balanced distribution of real and fake instances.

    The dataset is split into:

    • Training: 7,680 instances
    • Testing: 960 instances
    • Validation: 960 instances

    📊 Statistical Overview – Types of Fake News

    Type | Training | Testing | Validation
    Misinformation | 1,288 | 161 | 162
    Rumor | 1,215 | 152 | 151
    Clickbait | 1,337 | 167 | 167
    Non-fake | 3,840 | 480 | 480
    Total | 7,680 | 960 | 960

    🏷️ Distribution by Labels

    Label | Training | Testing | Validation
    1 (Fake) | 3,840 | 480 | 480
    0 (Non-Fake) | 3,840 | 480 | 480
    Total | 7,680 | 960 | 960

    🌍 Statistical Overview – Categories of Fake News

    Category | Training | Testing | Validation
    Entertainment | 640 | 80 | 80
    Sports | 640 | 80 | 80
    Technology | 640 | 80 | 80
    National | 640 | 80 | 80
    Lifestyle | 640 | 80 | 80
    Politics | 640 | 80 | 80
    Education | 640 | 80 | 80
    International | 640 | 80 | 80
    Crime | 640 | 80 | 80
    Finance | 640 | 80 | 80
    Business | 640 | 80 | 80
    Miscellaneous | 640 | 80 | 80
    Total | 7,680 | 960 | 960

    @article{FARIA2025100347,
    title = {MultiBanFakeDetect: Integrating advanced fusion techniques for multimodal detection of Bangla fake news in under-resourced contexts},
    journal = {International Journal of Information Management Data Insights},
    volume = {5},
    number = {2},
    pages = {100347},
    year = {2025},
    issn = {2667-0968},
    doi = {https://doi.org/10.1016/j.jjimei.2025.100347},
    url = {https://www.sciencedirect.com/science/article/pii/S2667096825000291},
    author = {Fatema Tuj Johora Faria and Mukaffi Bin Moin and Zayeed Hasan and Md. Arafat Alam Khandaker and Niful Islam and Khan Md Hasib and M.F. Mridha},
    keywords = {Fake news detection, Multimodal dataset, Textual analysis, Visual analysis, Bangla language, Under-resource, Fusion techniques, Deep learning}}
    
    
  9. MMSci_Table

    • huggingface.co
    Updated Jan 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    bohao yang (2025). MMSci_Table [Dataset]. https://huggingface.co/datasets/yangbh217/MMSci_Table
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 23, 2025
    Authors
    bohao yang
    Description

    MMSci_Table

    Dataset for the paper "Does Table Source Matter? Benchmarking and Improving Multimodal Scientific Table Understanding and Reasoning"

      📑 Paper   Github  
    
    
    
    
    
    
    
      MMSci Dataset Collection
    

    The MMSci dataset collection consists of three complementary datasets designed for scientific multimodal table understanding and reasoning: MMSci-Pre, MMSci-Ins, and MMSci-Eval.

      Dataset Summary
    

    MMSci-Pre: A domain-specific pre-training dataset… See the full description on the dataset page: https://huggingface.co/datasets/yangbh217/MMSci_Table.

  10. Data from: ScientISST MOVE: Annotated Wearable Multimodal Biosignals...

    • data.niaid.nih.gov
    Updated Nov 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Areias Saraiva, João; Abreu, Mariana; Carmo, Ana Sofia; Plácido da Silva, Hugo; Fred, Ana (2023). ScientISST MOVE: Annotated Wearable Multimodal Biosignals recorded during Everyday Life Activities in Naturalistic Environments [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7940438
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Instituto Superior Técnico
    Instituto de Telecomunicações
    Authors
    Areias Saraiva, João; Abreu, Mariana; Carmo, Ana Sofia; Plácido da Silva, Hugo; Fred, Ana
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    A multi-modality, multi-activity, and multi-subject dataset of wearable biosignals.
    • Modalities: ECG, EMG, EDA, PPG, ACC, TEMP
    • Main Activities: Lift object, Greet people, Gesticulate while talking, Jumping, Walking, and Running
    • Cohort: 17 subjects (10 male, 7 female); median age: 24
    • Devices: 2x ScientISST Core + 1x Empatica E4
    • Body Locations: Chest, Abdomen, Left bicep, wrist and index finger
    No filter has been applied to the signals, but the correct transfer functions were applied, so the data is given in the relevant units (mV, uS, g, ºC).

    For more information on background, methods and the acquisition protocol, refer to https://doi.org/10.13026/0ppk-ha30.

    In this repository, two formats are available:

    a) LTBio Biosignal files, which should be opened like x = Biosignal.load(path) (LTBio package: https://pypi.org/project/LongTermBiosignals/). Under the directory biosignal, the following tree structure is found: subject/x.biosignal, where subject is the subject's code and x is any of { acc_chest, acc_wrist, ecg, eda, emg, ppg, temp }. Each file includes the signals recorded from every sensor that acquires the modality after which the file is named, independently of the device. Channels, activities and time intervals can be easily indexed with the indexing operator, and a sneak peek of the signals can be quickly plotted with x.preview.plot(). Any Biosignal can be easily converted to NumPy arrays or DataFrames, if needed.

    b) CSV files, which can be opened like x = pandas.read_csv(path) (Pandas package: https://pypi.org/project/pandas/). These files can be found under the directory csv, named subject.csv, where subject is the subject's code. There is only one file per subject, containing their full session and all biosignal modalities. When read as tables, the time axis is in the first column, each sensor is in one of the middle columns, and the activity labels are in the last column. Each row contains the samples of each sensor, if any, at that timestamp. At a given timestamp, if there is no sample for a sensor, it means the acquisition was interrupted for that sensor, which happens between activities and sometimes for short periods during the running activity. The last column of each row holds one or more activity labels if an activity was taking place at that timestamp; multiple annotations are separated by vertical bars (e.g. 'run | sprint'), and if there are no annotations the column is empty for that timestamp.
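
    A minimal sketch for the CSV format described in b); "subject" is a placeholder for an actual subject code from the csv/ directory.

    import pandas as pd

    # Placeholder path; replace "subject" with a real subject code.
    df = pd.read_csv("csv/subject.csv")

    # The last column holds the activity annotations; simultaneous labels are
    # separated by vertical bars, e.g. 'run | sprint'.
    labels = df.iloc[:, -1].dropna().astype(str).str.split("|").explode().str.strip()
    print(labels.value_counts())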

    In order to provide a tabular format for sensors with different sampling frequencies, the sensors with a sampling frequency lower than 500 Hz were upsampled to 500 Hz. This way, the tables are regularly sampled, i.e., there is a row every 2 ms. If a sensor was not acquiring at a given timestamp, the corresponding cell will be empty. So not only are the segments with samples regularly sampled, but the interruptions are also discretised. This means that if, after an interruption, a sensor starts acquiring at a non-regular timestamp, the first sample will be written on the previous or the following timestamp, by half-up rounding. Naturally, this process cumulatively introduces lags in the table, some of which cancel out. Each individual lag is no longer than half the sampling period (1 ms), hence negligible. The cumulative lags are no longer than 48 ms for all subjects, which is also negligible. Nevertheless, only the LTBio Biosignal format preserves the exact original timestamps (10E-6 precision) of all samples and the original sampling frequencies.

    Both formats include annotations of the activities; however, the LTBio Biosignal files have better time resolution and also include clinical and demographic data.

  11. Text-audio pairs (1 of 4)

    • kaggle.com
    zip
    Updated Jul 15, 2024
    Cite
    Jorvan (2024). Text-audio pairs (1 of 4) [Dataset]. https://www.kaggle.com/datasets/jorvan/text-audio-pairs-1-of-3/data
    Explore at:
    Available download formats: zip (181547102182 bytes)
    Dataset updated
    Jul 15, 2024
    Authors
    Jorvan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the first of the four datasets we have created for audio-text training tasks. They collect pairs of texts and audio clips, based on the audio-image pairs from our datasets [1, 2, 3], and are intended for research purposes only.

    For the conversion, .csv tables were created in which the audio values were separated into 16,000 columns and the images were transformed into texts using the public model BLIP [4]. The original images are also preserved for future reference.

    To allow other researchers a quick evaluation of the potential usefulness of our datasets for their purposes, we have made available a public page where anyone can check 60 random samples that we extracted from all of our data [5].

    [1] Jorge E. León. Image-audio pairs (1 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3.
    [2] Jorge E. León. Image-audio pairs (2 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-2-of-3.
    [3] Jorge E. León. Image-audio pairs (3 of 3). 2024. URL: https://www.kaggle.com/datasets/jorvan/image-audio-pairs-3-of-3.
    [4] Junnan Li et al. "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation". In: ArXiv 2201.12086 (2022).
    [5] Jorge E. León. AVT Multimodal Dataset. 2024. URL: https://jorvan758.github.io/AVT-Multimodal-Dataset/.

  12. Visual-TableQA

    • huggingface.co
    Updated Sep 30, 2025
    Cite
    AI 4 Everyone (2025). Visual-TableQA [Dataset]. https://huggingface.co/datasets/AI-4-Everyone/Visual-TableQA
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    AI 4 Everyone
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🧠 Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

    Welcome to Visual-TableQA, a project designed to generate high-quality synthetic question-answer datasets associated with images of tables. This resource is ideal for training and evaluating models on visually-grounded table understanding tasks such as document QA, table parsing, and multimodal reasoning.

      🚀 Latest Update
    

    We have refreshed the dataset with newly generated QA pairs created by… See the full description on the dataset page: https://huggingface.co/datasets/AI-4-Everyone/Visual-TableQA.

  13. The table presents the evaluation results of the selected formulas using the...

    • plos.figshare.com
    xls
    Updated Oct 31, 2025
    Cite
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen (2025). The table presents the evaluation results of the selected formulas using the R2 metric, which measures the goodness of fit between the predicted values and the actual values. The R2 values for all the selected formulas are listed, providing a clear view of the fitting performance of each formula. [Dataset]. http://doi.org/10.1371/journal.pone.0335221.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zhifeng Wang; Wanxuan Wu; Chunyan Zeng; Jialiang Shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The table presents the evaluation results of the selected formulas using the R² metric, which measures the goodness of fit between the predicted values and the actual values. The R² values for all the selected formulas are listed, providing a clear view of the fitting performance of each formula.

  14. Multimodal Sports Injury Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    Mugiwara_46 (2025). Multimodal Sports Injury Dataset [Dataset]. https://www.kaggle.com/datasets/anjalibhegam/multimodal-sports-injury-dataset
    Explore at:
    Available download formats: zip (2821789 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Mugiwara_46
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Multimodal Sports Injury Prediction Dataset

    📊 Dataset Overview

    This comprehensive dataset contains 15,420 samples collected from 156 athletes over a 6-month monitoring period, designed for predictive modeling of sports injury risk using multimodal sensor data and machine learning techniques.

    🎯 Dataset Purpose

    The dataset enables researchers and data scientists to:
    • Predict sports injury risk using multimodal physiological and biomechanical data
    • Develop real-time athlete monitoring systems for injury prevention
    • Build deep learning models (CNN, LSTM, Transformers) for temporal pattern recognition
    • Analyze pre-injury patterns and early warning indicators
    • Study relationships between training load, fatigue, and injury occurrence

    📁 Dataset Structure

    Basic Information

    • Total Samples: 15,420
    • Number of Athletes: 156
    • Features: 22 multimodal features + 7 metadata columns
    • Target Variable: injury_occurred (3 classes: Healthy, Low Risk, High Risk/Injured)
    • File Format: CSV
    • File Size: ~5 MB
    • Missing Data: 2.97% (realistic missing patterns)

    🏷️ Feature Categories

    1. Physiological Metrics (6 features)

    Feature | Unit | Range | Mean ± SD | Description | Sensor Type
    heart_rate | bpm | 40-180 | 72.4 ± 18.3 | Cardiovascular stress indicator | Chest-strap HR monitor
    body_temperature | °C | 35.8-39.2 | 37.1 ± 0.6 | Core body temperature | Infrared thermometer
    hydration_level | % | 45-100 | 78.3 ± 12.4 | Fluid balance status | Bioimpedance sensor
    sleep_quality | score | 2-10 | 6.8 ± 1.9 | Recovery quality indicator | Wearable sleep tracker
    recovery_score | score | 25-98 | 68.5 ± 15.2 | Overall recovery status | Composite metric
    stress_level | a.u. | 0.1-0.95 | 0.42 ± 0.18 | Physiological stress level | HRV-based estimate

    2. Biomechanical Data (8 features)

    Feature | Unit | Range | Mean ± SD | Description | Sensor Type
    muscle_activity | μV | 10-850 | 245.6 ± 127.3 | Muscle activation level | Surface EMG
    joint_angles | degrees | 45-175 | 112.3 ± 28.4 | Joint range of motion | IMU sensors (9-axis)
    gait_speed | m/s | 0.8-3.5 | 1.85 ± 0.52 | Walking/running speed | Motion capture
    cadence | steps/min | 50-200 | 85.7 ± 22.1 | Step frequency | Accelerometer
    step_count | count | 2000-15000 | 7823 ± 2341 | Total steps per session | Pedometer
    jump_height | meters | 0.15-0.85 | 0.48 ± 0.14 | Vertical jump performance | Force plate
    ground_reaction_force | N | 800-2800 | 1654 ± 387 | Impact force during movement | Force plate
    range_of_motion | degrees | 60-180 | 124.5 ± 23.7 | Joint flexibility | Goniometer

    3. Environmental Factors (4 features)

    Feature | Unit | Range | Mean ± SD | Description
    ambient_temperature | °C | 15-38 | 24.8 ± 5.3 | Training environment temperature
    humidity | % | 30-85 | 58.3 ± 14.2 | Air humidity level
    altitude | meters | 0-1200 | 285 ± 234 | Training location elevation
    playing_surface | categorical | 0-4 | - | Surface type (0=Grass, 1=Turf, 2=Indoor, 3=Track, 4=Other)

    4. Workload Indicators (4 features)

    Feature | Unit | Range | Mean ± SD | Description
    training_intensity | RPE | 2-10 | 6.4 ± 1.8 | Perceived exertion level
    training_duration | minutes | 30-180 | 87.5 ± 28.3 | Session duration
    training_load | a.u. | 150-1800 | 568 ± 287 | Intensity × Duration
    fatigue_index | score | 15-85 | 48.3 ± 18.7 | Cumulative fatigue measure

    5. Metadata Columns (7 features)

    Column | Type | Description
    athlete_id | Integer | Unique athlete identifier (1-156)
    session_id | Integer | Session number per athlete
    sport_type | Categorical | Sport discipline (Soccer, Basketball, Track, Other)
    gender | Categorical | Male (68%), Female (32%)
    age | Integer | Athlete age in years (18-35, Mean: 24.3 ± 4.2)
    bmi | Float | Body Mass Index (18.5-28.3, Mean: 23.1 ± 2.4)
    injury_occurred | Integer | Target variable (see below)

    🎯 Target Variable: injury_occurred

    The dataset includes a 3-class target variable for injury risk prediction:

    Class | Label | Count | Percentage | Description
    0 | Healthy | 9,869 | 64.0% | No injury risk indicators
    1 | Low Risk | 3,238 | 21.0% | Elevated fatigue or training load
    2 | High Risk/Injured | 2,313 | 15.0% | Injury occurred or imminent risk

    Imbalance Ratio: 4.27:1 (Majority:Minority)
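
    A quick sketch for verifying the class distribution once the file is loaded; the CSV file name below is a placeholder, since it is not given in this description.

    import pandas as pd

    # Placeholder file name; use the actual CSV shipped with the dataset.
    df = pd.read_csv("multimodal_sports_injury.csv")

    counts = df["injury_occurred"].value_counts().sort_index()
    print(counts)                       # expected roughly 0: 9869, 1: 3238, 2: 2313
    print(counts.max() / counts.min())  # imbalance ratio, about 4.27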

    Injury Definition: ...

  15. tablebench-tqa

    • huggingface.co
    Updated Oct 21, 2025
    + more versions
    Cite
    multimodal table benchmark (2025). tablebench-tqa [Dataset]. https://huggingface.co/datasets/table-benchmark/tablebench-tqa
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset authored and provided by
    multimodal table benchmark
    Description

    table-benchmark/tablebench-tqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. XAI-FUNGI: Dataset from the user study on comprehensibility of XAI...

    • zenodo.org
    csv, pdf, zip
    Updated Oct 15, 2024
    + more versions
    Cite
    Szymon Bobek; Szymon Bobek; Paloma Korycińska; Paloma Korycińska; Monika Krakowska; Monika Krakowska; Maciej Mozolewski; Maciej Mozolewski; Dorota Rak; Dorota Rak; Magdalena Zych; Magdalena Zych; Magdalena Wójcik; Magdalena Wójcik; Grzegorz J. Nalepa; Grzegorz J. Nalepa (2024). XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms [Dataset]. http://doi.org/10.5281/zenodo.11448395
    Explore at:
    Available download formats: csv, zip, pdf
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Szymon Bobek; Szymon Bobek; Paloma Korycińska; Paloma Korycińska; Monika Krakowska; Monika Krakowska; Maciej Mozolewski; Maciej Mozolewski; Dorota Rak; Dorota Rak; Magdalena Zych; Magdalena Zych; Magdalena Wójcik; Magdalena Wójcik; Grzegorz J. Nalepa; Grzegorz J. Nalepa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    XAI-FUNGI: Dataset from the user study on comprehensibility of XAI algorithms

    We present the dataset which was created during a user study on the evaluation of explainability of artificial intelligence (AI) at the Jagiellonian University, as a collaborative work of the computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore effective explanations of AI model patterns for diverse audiences.

    The dataset contains material collected from 39 participants during the interviews conducted by the Information Sciences research group. The participants were recruited from 149 candidates to form three groups that represented domain experts in the field of mycology (DE), students with data science and visualization background (IT) and students from social sciences and humanities (SSH). Each group was given an explanation of a machine learning model trained to predict edible and non-edible mushrooms and asked to interpret the explanations and answer various questions during the interview. The machine learning model and explanations for its decision were prepared by the computer science research team.

    The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results from thematic analysis, and the original explanations with the modifications suggested by the participants. The dataset is complemented with the source code allowing one to reproduce the initial machine learning model and explanations.

    The general structure of the dataset is described in the following table. The files that contain in their names [RR]_[SS]_[NN] contain the individual results obtained from particular participant. The meaning of the prefix is as follows:

    • RR - initials of the researcher conducting the interview,
    • SS - type of the participant (DE for domain expert, SSH for social sciences and humanities students, or IT for computer science students),
    • NN - number of the participant

    File | Description
    SURVEY.csv | The results from a survey filled in by 149 participants, out of which 39 were selected to form the final group of participants.
    CODEBOOK.csv | The codebook used in thematic analysis and MAXQDA coding.
    QUESTIONS.csv | List of questions that the participants were asked during interviews.
    SLIDES.csv | List of slides used in the study with their interpretation and reference to MAXQDA themes and VISUAL_MODIFICATIONS tables.
    MAXQDA_SUMMARY.csv | Summary of the thematic analysis performed, with the codes used in CODEBOOK for each participant.
    PROBLEMS.csv | List of problems that participants were asked to solve during interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from the explanations.
    PROBLEMS_RESPONSES.csv | The responses of each participant to the problems listed in PROBLEMS.csv.
    VISUALIZATION_MODIFICATIONS.csv | Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested.
    ORIGINAL_VISUZALIZATIONS.pdf | The PDF file containing the visualization of explanations presented to the participants during the interviews.
    VISUALIZATION_MODIFICATIONS.zip | The original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant. Each file is a PDF named with the participant ID, i.e. [RR]_[SS]_[NN].pdf.
    TRANSCRIPTS.zip | The anonymized transcripts of the interviews for each participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv.

    The detailed structure of the files presented in the previous Table is given in the Technical info section.
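
    As a small usage sketch (assuming the transcript files inside TRANSCRIPTS.zip follow the [RR]_[SS]_[NN].csv convention described above), the number of interviews per participant group can be counted like this:

    import re
    import zipfile
    from collections import Counter
    from pathlib import Path

    with zipfile.ZipFile("TRANSCRIPTS.zip") as zf:
        stems = [Path(name).stem for name in zf.namelist() if name.endswith(".csv")]

    # SS in [RR]_[SS]_[NN] encodes the group: DE, IT, or SSH.
    groups = Counter(m.group(1) for s in stems if (m := re.search(r"_(DE|SSH|IT)_", s)))
    print(groups)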

    The source code used to train the ML model and to generate the explanations is available on GitLab.

  17. Data from: UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban...

    • zenodo.org
    zip
    Updated Sep 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Debora Russo; Debora Russo (2025). UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios [Dataset]. http://doi.org/10.5281/zenodo.16748119
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Debora Russo; Debora Russo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🏙️ URMAT: URban MATerials Dataset

    URMAT (Urban Materials Dataset) is a large-scale, multimodal synthetic dataset designed for training and benchmarking material-aware semantic segmentation, scene understanding, and electromagnetic wave simulation tasks in complex urban environments.

    The dataset provides pixel-wise annotated images, depth maps, segmentation masks, physical material metadata, and aligned 3D point clouds, all derived from realistic 3D reconstructions of urban scenes including Trastevere, CityLife, Louvre, Canary Wharf, Bryggen, Siemensstadt, and Eixample.

    🧱 Key Features

    • 14 material classes: Brick, Glass, Steel, Tiles, Limestone, Plaster, Concrete, Wood, Cobblestone, Slate, Asphalt, Plastic, Gravel, Unknown.

    • Multimodal data: RGB, depth, material masks, mesh segmentation

    • Physically annotated metadata: includes permittivity, reflectance, attenuation

    • 7 diverse European city districts, georeferenced and stylistically accurate

    • Precomputed point clouds for 3D analysis or downstream simulation

    • Compatible with Unreal Engine, PyTorch, and MATLAB pipelines

    📁 Dataset Structure

    At the root of the dataset:

    • *_mapping/ folders: mapping files, mesh metadata, camera poses

    • *_pointclouds/ folders: colored 3D point clouds with material labels

    • train/, val/, test/: standard splits for training and evaluation

    Inside each split (train/, val/, test/):

    Folder Name | Description
    rgb/ | RGB images rendered from Unreal Engine
    depth_png/ | Depth maps as grayscale .png (normalized for visualization)
    depth_npy/ | Raw depth arrays saved as .npy
    segmentation_material_png/ | Color-encoded material segmentation masks for visualization
    segmentation_material_npy/ | Material masks in .npy format (integer IDs per pixel, for training)
    segmentation_mesh/ | Optional masks identifying the mesh origin of each pixel
    metadata/ | JSON metadata with material type and physical properties per mesh
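
    A minimal loading sketch under the folder layout above; the individual file names inside each folder are not listed here, so the first sample of the train split is taken arbitrarily.

    import numpy as np
    from pathlib import Path

    split = Path("train")

    # Take an arbitrary sample from each folder; in a real pipeline, pair depth and
    # material files by matching file stems.
    depth = np.load(next((split / "depth_npy").glob("*.npy")))
    materials = np.load(next((split / "segmentation_material_npy").glob("*.npy")))

    # 'materials' holds integer material IDs per pixel (14 classes, incl. Unknown).
    print(depth.shape, materials.shape, np.unique(materials))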

    📦 Recommended Use Cases

    • Material-aware semantic segmentation

    • Scene-level reasoning for 3D reconstruction

    • Ray tracing and wireless signal propagation simulation

    • Urban AI and Smart City research

    • Synthetic-to-real generalization studies

    📜 Citation

    If you use URMAT v2 in your research, please cite the dataset.

    Paper: "UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios" (https://www.researchgate.net/publication/395193944_UR-MAT_A_Multimodal_Material-Aware_Synthetic_Dataset_of_Urban_Scenarios) - to appear in ACM Multimedia 2025, Dataset Track

  18. Global Multimodal AI Models Market Research Report: By Application...

    • wiseguyreports.com
    Updated Aug 23, 2025
    + more versions
    Cite
    (2025). Global Multimodal AI Models Market Research Report: By Application (Healthcare, Finance, Retail, Transportation, Manufacturing), By Deployment Model (Cloud-based, On-premises, Hybrid), By End Use Industry (Automotive, Telecommunications, Education, Entertainment), By Model Type (Vision-Language Models, Audio-Visual Models, Text-Image Models) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/multimodal-ai-models-market
    Explore at:
    Dataset updated
    Aug 23, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR | 2024
    HISTORICAL DATA | 2019 - 2023
    REGIONS COVERED | North America, Europe, APAC, South America, MEA
    REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024 | 4.49 (USD Billion)
    MARKET SIZE 2025 | 5.59 (USD Billion)
    MARKET SIZE 2035 | 50.0 (USD Billion)
    SEGMENTS COVERED | Application, Deployment Model, End Use Industry, Model Type, Regional
    COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS | Technological advancements, Increasing data availability, Rising demand for automation, Enhancing user experience, Competitive landscape growth
    MARKET FORECAST UNITS | USD Billion
    KEY COMPANIES PROFILED | Adobe, OpenAI, Baidu, Microsoft, Google, C3.ai, Meta, Tencent, SAP, IBM, Amazon, Hugging Face, Alibaba, Salesforce, Nvidia
    MARKET FORECAST PERIOD | 2025 - 2035
    KEY MARKET OPPORTUNITIES | Natural language processing integration, Enhanced personalization in services, Advanced healthcare applications, Smart automation in industries, Scalable cloud-based solutions
    COMPOUND ANNUAL GROWTH RATE (CAGR) | 24.5% (2025 - 2035)

  19. Multimodal Defensive Communication Database (DefComm-DB)

    • zenodo.org
    Updated May 30, 2023
    + more versions
    Cite
    Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review (2023). Multimodal Defensive Communication Database (DefComm-DB) [Dataset]. http://doi.org/10.5281/zenodo.7706919
    Explore at:
    Dataset updated
    May 30, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review; Anonymised for review
    Description

    Description: DefComm-DB comprises 261 genuine non-acted dialogues between English-speaking individuals in 'real-world' settings that feature one of the defensive behaviours outlined in Birkenbihl's model of communication failures [1]:

    1. Attacking the conversation partner (class Attack): videos that depict individuals actively attacking verbally, blaming the other person, or shifting the other person's attention to themselves.
    2. Withdrawing from the communication (class Flight): videos where people refuse to respond, withdraw from the conversation or change the topic or focus.
    3. Making oneself greater (class Greater): videos that depict individuals boasting, self-justifying in an aggressive manner, denying accusations, exhibiting a sense of dominance or superiority, or expressing indignation.
    4. Making oneself smaller (class Smaller): videos that display individuals engaging in self-deprecation, self-blame, exhibiting a sense of guilt, apologising, and expressing feelings of vulnerability or worthlessness.

    [1] Birkenbihl, V. (2013). Kommunikationstraining: Zwischenmenschliche Beziehungen erfolgreich gestalten. Schritte 1–6. mvg Verlag.

    Key statistics on the dataset are provided in Table 1. DefComm features a variety of video topics, including interviews with celebrities and professional athletes, political debates, legal trials, TV shows, and video footage obtained by paparazzi, among others. The situations, number of participants, gender, age, and ethnicity vary from scene to scene.

    From each video, we retrieve audio, visual, and textual modalities. In this paper, we focus on the audio modality and the speech transcriptions.

    Table 1: Statistics on Def-Comm: number of video clips, mean duration (μ), standard deviation (σ), minimum, maximum, and total duration of collected videos per class.
    Label | # video clips | μ [s] | σ [s] | min [s] | max [s] | Σ duration [s]
    Attack | 112 | 8 | 9 | 2 | 46 | 949
    Flight | 57 | 9 | 8 | 2 | 62 | 494
    Greater | 45 | 9 | 6 | 2 | 25 | 416
    Smaller | 47 | 12 | 8 | 3 | 49 | 556
    Total | 261 | 9 | 8 | 2 | 62 | 2415

  20. Supplementary Table 2 from Long-term Multimodal Recording Reveals Epigenetic...

    • datasetcatalog.nlm.nih.gov
    • aacr.figshare.com
    Updated May 1, 2024
    Cite
    Canale, Eleonora; Zemlyanskiy, Grigory; Ghirardi, Chiara; Vingiani, Andrea; Bonaldi, Tiziana; Phillips, Henry; Magnani, Luca; Bertolotti, Alessia; James, Chela; Győrffy, Balázs; Lynn, Claire; Noberini, Roberta; Barozzi, Iros; Sofyali, Emre; Rehman, Farah; Dewhurst, Hannah F.; Dhiman, Heena; Heide, Timon; Rosano, Dalia; Li, Tong; Ivanoiu, Diana; Sottoriva, Andrea; Saha, Debjani; Pruneri, Giancarlo; Slaven, Neil; Cresswell, George D. (2024). Supplementary Table 2 from Long-term Multimodal Recording Reveals Epigenetic Adaptation Routes in Dormant Breast Cancer Cells [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001424948
    Explore at:
    Dataset updated
    May 1, 2024
    Authors
    Canale, Eleonora; Zemlyanskiy, Grigory; Ghirardi, Chiara; Vingiani, Andrea; Bonaldi, Tiziana; Phillips, Henry; Magnani, Luca; Bertolotti, Alessia; James, Chela; Győrffy, Balázs; Lynn, Claire; Noberini, Roberta; Barozzi, Iros; Sofyali, Emre; Rehman, Farah; Dewhurst, Hannah F.; Dhiman, Heena; Heide, Timon; Rosano, Dalia; Li, Tong; Ivanoiu, Diana; Sottoriva, Andrea; Saha, Debjani; Pruneri, Giancarlo; Slaven, Neil; Cresswell, George D.
    Description

    Coverage data for patient profiling
