100+ datasets found
  1. Multimodal-Mind2Web

    • huggingface.co
    Cite
    OSU NLP Group, Multimodal-Mind2Web [Dataset]. https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    14 scholarly articles cite this dataset (Google Scholar)
    Dataset authored and provided by
    OSU NLP Group
    License

    OpenRAIL: https://choosealicense.com/licenses/openrail/

    Description

    Dataset Summary

    Multimodal-Mind2Web is the multimodal version of Mind2Web, a dataset for developing and evaluating generalist web agents that can follow language instructions to complete complex tasks on any website. Each HTML document is aligned with its corresponding webpage screenshot image from the Mind2Web raw dump, so this multimodal version removes the inconvenience of loading images from the ~300 GB Mind2Web raw dump directly.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web.
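    For orientation, a minimal loading sketch (not from the dataset card) using the Hugging Face datasets library; the split name "train" is an assumption and should be checked against the dataset page:

    from datasets import load_dataset

    # Assumed split name; verify the available splits and field names on the dataset page.
    mind2web = load_dataset("osunlp/Multimodal-Mind2Web", split="train")
    print(mind2web)            # features and number of rows
    print(mind2web[0].keys())  # fields of one aligned HTML/screenshot record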
    
  2. MultiModal dataset from Instragram

    • ieee-dataport.org
    Updated May 18, 2022
    Cite
    Qi Yang (2022). MultiModal dataset from Instragram [Dataset]. https://ieee-dataport.org/documents/multimodal-dataset-instragram
    Explore at:
    Dataset updated
    May 18, 2022
    Authors
    Qi Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We collect almost 248

  3. omega-multimodal

    • huggingface.co
    Updated Jun 22, 2025
    Cite
    OMEGA Labs, Inc. (2025). omega-multimodal [Dataset]. https://huggingface.co/datasets/omegalabsinc/omega-multimodal
    Explore at:
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    OMEGA Labs, Inc.
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    OMEGA Labs Bittensor Subnet: Multimodal Dataset for AGI Research

      Introduction
    

    The OMEGA Labs Bittensor Subnet Dataset is a groundbreaking resource for accelerating Artificial General Intelligence (AGI) research and development. This dataset, powered by the Bittensor decentralized network, aims to be the world's largest multimodal dataset, capturing the vast landscape of human knowledge and creation. With over 1 million hours of footage and 30 million+ 2-minute video… See the full description on the dataset page: https://huggingface.co/datasets/omegalabsinc/omega-multimodal.

  4. Data from: A Multimodal Dataset for Investigating Working Memory in Presence...

    • physionet.org
    Updated Feb 26, 2025
    Cite
    Saman Khazaei; Srinidhi Parshi; Samiul Alam; Md Rafiul Amin; Rose T Faghih (2025). A Multimodal Dataset for Investigating Working Memory in Presence of Music [Dataset]. http://doi.org/10.13026/6vh4-dk68
    Explore at:
    Dataset updated
    Feb 26, 2025
    Authors
    Saman Khazaei; Srinidhi Parshi; Samiul Alam; Md Rafiul Amin; Rose T Faghih
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    We present the accompanying dataset to the study "A Multimodal Dataset for Investigating Working Memory in Presence of Music". The experiment was conducted to investigate the viability of music as an intervention to regulate cognitive arousal and performance states. We recorded multimodal physiological signals and behavioral data during a working memory task (the n-back task) while background music was playing. We asked the participants to provide the music, and two types of music were employed: calming and exciting. The calming music was played during the first session of the experiment, and the exciting music was presented during the second session. Each session includes an equal number of 1-back and 3-back task blocks, with 22 trials presented within each task block. A total of 16 task blocks are implemented in each session (8 blocks of 1-back task and 8 blocks of 3-back task). In this experiment, 11 participants/subjects originally participated, while participants/subjects with incomplete modality recordings were removed. The recorded signals are skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, the sequence of correct/incorrect responses, and reaction time.
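    As a small bookkeeping check derived only from the numbers quoted above (not from the dataset files), the trial counts per session and per participant work out as follows:

    # 16 task blocks per session, 22 trials per block, 2 sessions per participant.
    blocks_per_session, trials_per_block, sessions = 16, 22, 2
    trials_per_session = blocks_per_session * trials_per_block      # 352
    trials_per_participant = trials_per_session * sessions          # 704
    print(trials_per_session, trials_per_participant)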

  5. Multimodal Sentiment Dataset

    • gts.ai
    json
    Updated Aug 20, 2024
    + more versions
    Cite
    GTS (2024). Multimodal Sentiment Dataset [Dataset]. https://gts.ai/dataset-download/multimodal-sentiment-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore our Multimodal Sentiment Dataset, featuring 100 diverse classes of images and corresponding texts with sentiment labels. Ideal for AI-driven sentiment analysis, image classification, and multimodal fusion tasks.

  6. Data from: A Multimodal Dataset for Mixed Emotion Recognition

    • zenodo.org
    Updated May 25, 2024
    + more versions
    Cite
    Pei Yang; Niqi Liu; Xinge Liu; Yezhi Shu; Wenqi Ji; Ziqi Ren; Jenny Sheng; Minjing Yu; Ran Yi; Dan Zhang; Yong-Jin Liu (2024). A Multimodal Dataset for Mixed Emotion Recognition [Dataset]. http://doi.org/10.5281/zenodo.11194571
    Explore at:
    Dataset updated
    May 25, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pei Yang; Niqi Liu; Xinge Liu; Yezhi Shu; Wenqi Ji; Ziqi Ren; Jenny Sheng; Minjing Yu; Ran Yi; Dan Zhang; Yong-Jin Liu
    Description

    ABSTRACT: Mixed emotions have attracted increasing interest recently, but existing datasets rarely focus on mixed emotion recognition from multimodal signals, hindering the affective computing of mixed emotions. On this basis, we present a multimodal dataset with four kinds of signals recorded while watching mixed and non-mixed emotion videos. To ensure effective emotion induction, we first implemented a rule-based video filtering step to select the videos that could elicit stronger positive, negative, and mixed emotions. Then, an experiment with 80 participants was conducted, in which the data of EEG, GSR, PPG, and frontal face videos were recorded while they watched the selected video clips. We also recorded the subjective emotional rating on PANAS, VAD, and amusement-disgust dimensions. In total, the dataset consists of multimodal signal data and self-assessment data from 73 participants. We also present technical validations for emotion induction and mixed emotion classification from physiological signals and face videos. The average accuracy of the 3-class classification (i.e., positive, negative, and mixed) can reach 80.96% when using SVM and features from all modalities, which indicates the possibility of identifying mixed emotional states.
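    As a rough illustration of the 3-class setup described above (not the authors' code), an SVM over concatenated multimodal features could be sketched as follows; the feature matrix and labels are random placeholders standing in for features extracted from EEG, GSR, PPG, and face video:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Placeholder data: 73 participants x 128 fused features; labels 0 = positive, 1 = negative, 2 = mixed.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(73, 128))
    y = rng.integers(0, 3, size=73)

    # RBF-kernel SVM with feature standardization, evaluated by 5-fold cross-validation.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    print(cross_val_score(clf, X, y, cv=5).mean())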

  7. synthetic-multiturn-multimodal

    • huggingface.co
    Updated Jan 28, 2024
    Cite
    Mesolitica (2024). synthetic-multiturn-multimodal [Dataset]. https://huggingface.co/datasets/mesolitica/synthetic-multiturn-multimodal
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2024
    Dataset authored and provided by
    Mesolitica
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Multiturn Multimodal

    We want to generate synthetic data that is able to understand the position of and relationships between multiple images and multiple audio clips; an example is shown below. All notebooks are at https://github.com/mesolitica/malaysian-dataset/tree/master/chatbot/multiturn-multimodal

      multi-images
    

    synthetic-multi-images-relationship.jsonl, 100000 rows, 109MB. Images at https://huggingface.co/datasets/mesolitica/translated-LLaVA-Pretrain/tree/main

      Example data
    

    {'filename':… See the full description on the dataset page: https://huggingface.co/datasets/mesolitica/synthetic-multiturn-multimodal.
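    A minimal sketch for streaming the JSONL file mentioned above record by record rather than loading all 109 MB at once; the local path is an assumption (download the file from the dataset repository first):

    import json

    path = "synthetic-multi-images-relationship.jsonl"  # hypothetical local path
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            # The truncated example above shows a 'filename' key; other fields vary per record.
            print(record.get("filename"))
            if i == 2:  # inspect only the first few records
                break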

  8. Mudestreda Multimodal Device State Recognition Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf, png, zip
    Updated Jul 11, 2024
    + more versions
    Cite
    Hubert Truchan; Zahra Admadi (2024). Mudestreda Multimodal Device State Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.8238653
    Explore at:
    Available download formats: zip, png, pdf, bin
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hubert Truchan; Zahra Admadi
    License

    GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Time period covered
    Jan 24, 2024
    Description

    Mudestreda Multimodal Device State Recognition Dataset

    The dataset was obtained from a real industrial milling device and provides time series and image data for classification, regression, anomaly detection, remaining useful life (RUL) estimation, signal drift measurement, zero-shot flank tool wear, and feature engineering purposes.


    The official dataset used in the paper "Multimodal Isotropic Neural Architecture with Patch Embedding" (ICONIP 2023).
    Official repository: https://github.com/hubtru/Minape
    Mudestreda (MD) | Size 512 Samples (Instances, Observations) | Modalities 4 | Classes 3 |
    Future research: Regression, Remaining Useful Life (RUL) estimation, Signal Drift detection, Anomaly Detection, Multivariate Time Series Prediction, and Feature Engineering.
    Notice: Tables and images do not render properly.
    Recommended: README.md includes the Mudestreda description and the images Mudestreda.png and Mudestreda_Stage.png.

    Data Overview

    • Task: Uni/Multi-Modal Classification
    • Domain: Industrial Flank Tool Wear of the Milling Machine
    • Input (sample): 4 Images: 1 Tool Image, 3 Spectrograms (X, Y, Z axis)
    • Output: Machine state classes: Sharp, Used, Dulled
    • Evaluation: Accuracy, Precision, Recall, F1-score, ROC curve
    • Each tool's wear is categorized sequentially: Sharp → Used → Dulled.
    • The dataset includes measurements from ten tools: T1 to T10.
    • Data splitting options include random or chronological distribution, without shuffling.
    • Options:

  9. Data from: S3E: A Multi-Robot Multimodal Dataset for Collaborative SLAM

    • ieee-dataport.org
    Updated Aug 12, 2024
    Cite
    Dapeng Feng (2024). S3E: A Multi-Robot Multimodal Dataset for Collaborative SLAM [Dataset]. https://ieee-dataport.org/documents/s3e-multi-robot-multimodal-dataset-collaborative-slam
    Explore at:
    Dataset updated
    Aug 12, 2024
    Authors
    Dapeng Feng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The scalability and diversity of existing datasets for collaborative trajectories remain limited

  10. PanCancer Multimodal Dataset

    • paperswithcode.com
    Updated May 12, 2024
    + more versions
    Cite
    Aakash Tripathi; Asim Waqas; Matthew B. Schabath; Yasin Yilmaz; Ghulam Rasool (2024). PanCancer Multimodal Dataset [Dataset]. https://paperswithcode.com/dataset/pancancer-multimodal
    Explore at:
    Dataset updated
    May 12, 2024
    Authors
    Aakash Tripathi; Asim Waqas; Matthew B. Schabath; Yasin Yilmaz; Ghulam Rasool
    Description

    Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset

    The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, molecular, and slide images for cancer patients. This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models such as GatorTron, SeNMo, and UNI.

    Curated by: Lab Rasool
    Language(s) (NLP): English

    Uses

    from datasets import load_dataset

    # Each TCGA modality is exposed as a separate configuration of the
    # Lab-Rasool/TCGA dataset on the Hugging Face Hub.
    clinical_dataset = load_dataset("Lab-Rasool/TCGA", "clinical", split="train")
    pathology_report_dataset = load_dataset("Lab-Rasool/TCGA", "pathology_report", split="train")
    wsi_dataset = load_dataset("Lab-Rasool/TCGA", "wsi", split="train")
    molecular_dataset = load_dataset("Lab-Rasool/TCGA", "molecular", split="train")
    
  11. multimodal-open-r1-8k-verified

    • huggingface.co
    Updated Jan 28, 2025
    + more versions
    Cite
    LMMs-Lab (2025). multimodal-open-r1-8k-verified [Dataset]. https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    LMMs-Lab
    Description

    The lmms-lab/multimodal-open-r1-8k-verified dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  12. A multimodal dental dataset facilitating machine learning research and...

    • physionet.org
    Updated Oct 11, 2024
    + more versions
    Cite
    Wenjing Liu; Yunyou Huang; Suqin Tang (2024). A multimodal dental dataset facilitating machine learning research and clinic services [Dataset]. http://doi.org/10.13026/h1tt-fc69
    Explore at:
    Dataset updated
    Oct 11, 2024
    Authors
    Wenjing Liu; Yunyou Huang; Suqin Tang
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.

  13. deepfashion-multimodal

    • huggingface.co
    Updated Aug 23, 2024
    + more versions
    Cite
    Marqo (2024). deepfashion-multimodal [Dataset]. https://huggingface.co/datasets/Marqo/deepfashion-multimodal
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2024
    Dataset authored and provided by
    Marqo
    Description

    Disclaimer: We do not own this dataset. The DeepFashion dataset is a public dataset that can be accessed through its website. This dataset was used to evaluate Marqo-FashionCLIP and Marqo-FashionSigLIP; see details below.

      Marqo-FashionSigLIP Model Card
    

    Marqo-FashionSigLIP leverages Generalised Contrastive Learning (GCL) which allows the model to be trained on not just text descriptions but also categories, style, colors, materials, keywords and fine-details to provide highly relevant… See the full description on the dataset page: https://huggingface.co/datasets/Marqo/deepfashion-multimodal.

  14. OMuSense-23: A Multimodal dataset for contactless breathing pattern...

    • zenodo.org
    application/gzip
    Updated Jun 6, 2025
    + more versions
    Cite
    Manuel Lage Cañellas; Le Nguyen; Anirban Mukherjee; Constantino Álvarez Casado; Xiaoting Wu; Nhi Nguyen; Praneeth Susarla; Sasan Sharifipour; Dinesh B. Jayagopi; Miguel Bordallo López (2025). OMuSense-23: A Multimodal dataset for contactless breathing pattern recognition and biometric analysis [Dataset]. http://doi.org/10.5281/zenodo.12705176
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Manuel Lage Cañellas; Le Nguyen; Anirban Mukherjee; Constantino Álvarez Casado; Xiaoting Wu; Nhi Nguyen; Praneeth Susarla; Sasan Sharifipour; Dinesh B. Jayagopi; Miguel Bordallo López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OMuSense-23 is a multimodal dataset for non-contact biometric and breathing analysis.
    This database comprises RGBD and mmWave radar data collected from 50 participants.
    The data capture process involves participants engaging in four breathing pattern activities
    (normal breathing, reading, guided breathing, and breath holding to simulate apnea)
    each one performed in three distinct static poses: standing (A), sitting (B), and lying down (C).

    For citations please refer to the paper:
    Manuel Lage Cañellas, Le Nguyen, Anirban Mukherjee, Constantino Álvarez Casado,
    Xiaoting Wu, Nhi Nguyen, Praneeth Susarla, Sasan Sharifipour, Dinesh B. Jayagopi, Miguel Bordallo López,
    "OmuSense-23: A Multimodal Dataset For Contactless Breathing Pattern Recognition And Biometric Analysis",
    arXiv:2407.06137, 2024

  15. Vi-Fi Multi-modal Dataset

    • paperswithcode.com
    Updated Jan 3, 2023
    Cite
    (2023). Vi-Fi Multi-modal Dataset [Dataset]. https://paperswithcode.com/dataset/vi-fi-multi-modal-dataset
    Explore at:
    Dataset updated
    Jan 3, 2023
    Description

    A large-scale multi-modal dataset to facilitate research and studies that concentrate on vision-wireless systems. The Vi-Fi dataset is a large-scale multi-modal dataset that consists of vision, wireless and smartphone motion sensor data of multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. In Vi-Fi, vision modality includes RGB-D video from a mounted camera. Wireless modality comprises smartphone data from participants including WiFi FTM and IMU measurements.

    The Vi-Fi dataset facilitates and drives innovation in multi-modal systems research, especially vision-wireless sensor data fusion, association, and localization.

    (Data collection was in accordance with IRB protocols and subject faces have been blurred for subject privacy.)

  16. Chinese Multimodal Depression Corpus

    • ieee-dataport.org
    Updated Nov 29, 2022
    Cite
    Bochao Zou (2022). Chinese Multimodal Depression Corpus [Dataset]. https://ieee-dataport.org/open-access/chinese-multimodal-depression-corpus
    Explore at:
    Dataset updated
    Nov 29, 2022
    Authors
    Bochao Zou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    acoustic

  17. WAT 2019 Hindi-English Multimodal Dataset

    • live.european-language-grid.eu
    txt
    Updated Dec 30, 2019
    Cite
    (2019). WAT 2019 Hindi-English Multimodal Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5160
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 30, 2019
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset consists of multimodal English-to-Hindi translation data. Each example takes as input an image, a rectangular region within the image, and an English caption, and outputs a caption in Hindi.

  18. websight-5K-multimodal

    • huggingface.co
    Updated Jan 25, 2024
    Cite
    Argilla (2024). websight-5K-multimodal [Dataset]. https://huggingface.co/datasets/argilla/websight-5K-multimodal
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 25, 2024
    Dataset authored and provided by
    Argilla
    Description

    Dataset Card for websight-5K-multimodal

    This dataset has been created with Argilla. It is a subset of 5000 records from the Websight collection, which is used for HTML/CSS code generation from an input image. Below you can see a screenshot of the UI from where annotators can work comfortably.

    As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/websight-5K-multimodal.
    
  19. SWE-bench_Multimodal

    • huggingface.co
    + more versions
    Cite
    Princeton NLP group, SWE-bench_Multimodal [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_Multimodal
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    SWE-bench Multimodal

    SWE-bench Multimodal is a dataset of 617 task instances that evaluates Language Models and AI Systems on their ability to resolve real-world GitHub issues. To learn more about the dataset, please visit our website. More updates coming soon!

  20. Multimodal AI Market Size, Share, Trends & Insights Report, 2035

    • rootsanalysis.com
    Updated May 15, 2025
    Cite
    Roots Analysis (2025). Multimodal AI Market Size, Share, Trends & Insights Report, 2035 [Dataset]. https://www.rootsanalysis.com/multimodal-ai-market
    Explore at:
    Dataset updated
    May 15, 2025
    Authors
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.html

    Time period covered
    2021 - 2031
    Area covered
    Global
    Description

    The multimodal AI market size is predicted to rise from $2.36 billion in 2024 to $93.99 billion by 2035, growing at a CAGR of 39.81% from 2024 to 2035.
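    As a quick arithmetic check (not taken from the report), the implied compound annual growth rate over the eleven years from 2024 to 2035 can be recomputed from the quoted market sizes:

    # Grow $2.36B (2024) to $93.99B (2035) over 11 years.
    start, end, years = 2.36, 93.99, 2035 - 2024
    cagr = (end / start) ** (1 / years) - 1
    print(f"{cagr:.2%}")  # ~39.8%, consistent with the reported 39.81%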
