https://choosealicense.com/licenses/openrail/
Dataset Summary
Multimodal-Mind2Web is the multimodal version of Mind2Web, a dataset for developing and evaluating generalist web agents that can follow language instructions to complete complex tasks on any website. In this version, each HTML document is aligned with its corresponding webpage screenshot image from the Mind2Web raw dump, which removes the inconvenience of loading images from the ~300GB Mind2Web raw dump.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web.
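As a rough loading sketch only: the dataset can typically be pulled with the Hugging Face datasets library; the configuration and split names are not stated here, so treat them as assumptions and check the dataset page for the exact ones.
# Minimal loading sketch (assumption: default configuration and standard splits).
from datasets import load_dataset

mind2web = load_dataset("osunlp/Multimodal-Mind2Web")
print(mind2web)  # shows the available splits and their sizes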
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collect almost 248
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
OMEGA Labs Bittensor Subnet: Multimodal Dataset for AGI Research
Introduction
The OMEGA Labs Bittensor Subnet Dataset is a groundbreaking resource for accelerating Artificial General Intelligence (AGI) research and development. This dataset, powered by the Bittensor decentralized network, aims to be the world's largest multimodal dataset, capturing the vast landscape of human knowledge and creation. With over 1 million hours of footage and 30 million+ 2-minute video… See the full description on the dataset page: https://huggingface.co/datasets/omegalabsinc/omega-multimodal.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We present the accompanying dataset to the study "A Multimodal Dataset for Investigating Working Memory in Presence of Music". The experiment was conducted to investigate the viability of music as an intervention to regulate cognitive arousal and performance states. We recorded multimodal physiological signals and behavioral data during a working memory task, the n-back task, while background music was playing. Participants were asked to provide the music, and two types of music were employed: calming and exciting. The calming music was played during the first session of the experiment, and the exciting music during the second session. Each session includes an equal number of 1-back and 3-back task blocks, with 22 trials presented within each block; a total of 16 task blocks were implemented per session (8 blocks of 1-back task and 8 blocks of 3-back task). Eleven participants originally took part, and participants with too few recorded modalities were removed. The recorded signals are skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, the sequence of correct/incorrect responses, and reaction time.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our Multimodal Sentiment Dataset, featuring 100 diverse classes of images and corresponding texts with sentiment labels. Ideal for AI-driven sentiment analysis, image classification, and multimodal fusion tasks.
ABSTRACT: Mixed emotions have attracted increasing interest recently, but existing datasets rarely focus on mixed emotion recognition from multimodal signals, hindering the affective computing of mixed emotions. On this basis, we present a multimodal dataset with four kinds of signals recorded while watching mixed and non-mixed emotion videos. To ensure effective emotion induction, we first implemented a rule-based video filtering step to select the videos that could elicit stronger positive, negative, and mixed emotions. Then, an experiment with 80 participants was conducted, in which the data of EEG, GSR, PPG, and frontal face videos were recorded while they watched the selected video clips. We also recorded the subjective emotional rating on PANAS, VAD, and amusement-disgust dimensions. In total, the dataset consists of multimodal signal data and self-assessment data from 73 participants. We also present technical validations for emotion induction and mixed emotion classification from physiological signals and face videos. The average accuracy of the 3-class classification (i.e., positive, negative, and mixed) can reach 80.96% when using SVM and features from all modalities, which indicates the possibility of identifying mixed emotional states.
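A minimal sketch of the 3-class (positive / negative / mixed) SVM setup described above, assuming features have already been extracted from the physiological and face-video modalities; the feature dimensionality and random labels below are placeholders, not the actual data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))       # placeholder multimodal feature vectors
y = rng.integers(0, 3, size=200)     # 0 = positive, 1 = negative, 2 = mixed
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())  # near chance on random placeholders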
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Multiturn Multimodal
We want to generate synthetic data that can understand the positions of and relationships between multiple images and multiple audio clips; an example is shown below. All notebooks are at https://github.com/mesolitica/malaysian-dataset/tree/master/chatbot/multiturn-multimodal
multi-images
synthetic-multi-images-relationship.jsonl, 100000 rows, 109MB. Images at https://huggingface.co/datasets/mesolitica/translated-LLaVA-Pretrain/tree/main
Example data
{'filename':… See the full description on the dataset page: https://huggingface.co/datasets/mesolitica/synthetic-multiturn-multimodal.
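A small sketch for inspecting the JSONL file listed above after downloading it locally; beyond 'filename', the record schema is not spelled out on this card, so the snippet just prints whatever keys each record exposes.
import json

with open("synthetic-multi-images-relationship.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))
        if i == 2:                   # peek at the first few rows only
            break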
https://www.gnu.org/licenses/gpl-3.0-standalone.html
README.md includes the Mudestreda description and the images Mudestreda.png and Mudestreda_Stage.png. Tool stages: Sharp, Used, Dulled.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scalability and diversity of existing datasets for collaborative trajectories remain limited.
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset
The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, molecular data, and slide images for cancer patients. The dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated with state-of-the-art models such as GatorTron, SeNMo, and UNI.
Curated by: Lab Rasool
Language(s) (NLP): English
Uses
from datasets import load_dataset
clinical_dataset = load_dataset("Lab-Rasool/TCGA", "clinical", split="train")
pathology_report_dataset = load_dataset("Lab-Rasool/TCGA", "pathology_report", split="train")
wsi_dataset = load_dataset("Lab-Rasool/TCGA", "wsi", split="train")
molecular_dataset = load_dataset("Lab-Rasool/TCGA", "molecular", split="train")
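Continuing from the snippet above, a quick way to see what the clinical subset actually exposes; the column names (for example, which fields hold the embeddings) are not documented here, so we simply print them.
print(clinical_dataset.column_names)  # inspect the available fields
print(clinical_dataset[0])            # one raw record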
The lmms-lab/multimodal-open-r1-8k-verified dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.
Disclaimer: We do not own this dataset. DeepFashion dataset is a public dataset which can be accessed through its website. This dataset was used to evaluate Marqo-FashionCLIP and Marqo-FashionSigLIP - see details below.
Marqo-FashionSigLIP Model Card
Marqo-FashionSigLIP leverages Generalised Contrastive Learning (GCL) which allows the model to be trained on not just text descriptions but also categories, style, colors, materials, keywords and fine-details to provide highly relevant… See the full description on the dataset page: https://huggingface.co/datasets/Marqo/deepfashion-multimodal.
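As a rough loading sketch for the dataset linked above; the split name is an assumption, so check the dataset page for the exact configuration.
from datasets import load_dataset

deepfashion = load_dataset("Marqo/deepfashion-multimodal", split="train")  # split name assumed
print(deepfashion)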
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OMuSense-23 is a multimodal dataset for non-contact biometric and breathing analysis.
This database comprises RGBD and mmWave radar data collected from 50 participants.
The data capture process involves participants engaging in four breathing pattern activities
(normal breathing, reading, guided breathing, and breath holding to simulate apnea),
each performed in three distinct static poses: standing (A), sitting (B), and lying down (C).
For citations please refer to the paper:
Manuel Lage Cañellas, Le Nguyen, Anirban Mukherjee, Constantino Álvarez Casado,
Xiaoting Wu, Nhi Nguyen, Praneeth Susarla, Sasan Sharifipour, Dinesh B. Jayagopi, Miguel Bordallo López,
"OmuSense-23: A Multimodal Dataset For Contactless Breathing Pattern Recognition And Biometric Analysis",
arXiv:2407.06137, 2024
A large-scale multi-modal dataset to facilitate research that concentrates on vision-wireless systems. The Vi-Fi dataset consists of vision, wireless, and smartphone motion sensor data from multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. The vision modality includes RGB-D video from a mounted camera, while the wireless modality comprises smartphone data from participants, including WiFi FTM and IMU measurements.
The Vi-Fi dataset facilitates multi-modal system research, especially vision-wireless sensor data fusion, association, and localization.
(Data collection was in accordance with IRB protocols and subject faces have been blurred for subject privacy.)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
acoustic
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset consists of multimodal English-to-Hindi translation examples. Each input comprises an image, a rectangular region within the image, and an English caption; the output is the corresponding Hindi caption.
Dataset Card for websight-5K-multimodal
This dataset has been created with Argilla. It is a subset of 5000 records from the Websight collection, which is used for HTML/CSS code generation from an input image. Below you can see a screenshot of the UI where annotators can work comfortably.
As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/websight-5K-multimodal.
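For the "Load with datasets" route mentioned above, a minimal sketch; the split name is an assumption, so check the dataset page for the exact one.
from datasets import load_dataset

websight_subset = load_dataset("argilla/websight-5K-multimodal", split="train")  # split name assumed
print(websight_subset)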
SWE-bench Multimodal
SWE-bench Multimodal is a dataset of 617 task instances that evaluates language models and AI systems on their ability to resolve real-world GitHub issues. To learn more about the dataset, please visit our website. More updates coming soon!
https://www.rootsanalysis.com/privacy.html
The multimodal AI market size is predicted to rise from $2.36 billion in 2024 to $93.99 billion by 2035, growing at a CAGR of 39.81% from 2024 to 2035.
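A quick back-of-envelope check of the quoted growth rate (this calculation is ours, not from the report itself).
# $2.36B in 2024 growing to $93.99B by 2035 spans 11 years.
start, end, years = 2.36, 93.99, 2035 - 2024
cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.2%}")  # about 39.8%, consistent with the stated 39.81%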