https://choosealicense.com/licenses/openrail/
Dataset Summary
Multimodal-Mind2Web is the multimodal version of Mind2Web, a dataset for developing and evaluating generalist web agents that can follow language instructions to complete complex tasks on any website. In this version, each HTML document is aligned with its corresponding webpage screenshot image from the Mind2Web raw dump, which removes the inconvenience of loading images from the ~300GB Mind2Web raw dump.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web.
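As a rough loading sketch only: the dataset can typically be pulled with the Hugging Face datasets library; the configuration and split names are not stated here, so treat them as assumptions and check the dataset page for the exact ones.
# Minimal loading sketch (assumption: default configuration and standard splits).
from datasets import load_dataset

mind2web = load_dataset("osunlp/Multimodal-Mind2Web")
print(mind2web)  # shows the available splits and their sizes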
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collect almost 248
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
OMEGA Labs Bittensor Subnet: Multimodal Dataset for AGI Research
Introduction
The OMEGA Labs Bittensor Subnet Dataset is a groundbreaking resource for accelerating Artificial General Intelligence (AGI) research and development. This dataset, powered by the Bittensor decentralized network, aims to be the world's largest multimodal dataset, capturing the vast landscape of human knowledge and creation. With over 1 million hours of footage and 30 million+ 2-minute video… See the full description on the dataset page: https://huggingface.co/datasets/omegalabsinc/omega-multimodal.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We present the accompanying dataset to the study "A Multimodal Dataset for Investigating Working Memory in Presence of Music". The experiment was conducted to investigate the viability of music as an intervention to regulate cognitive arousal and performance states. We recorded multimodal physiological signals and behavioral data during a working memory task, the n-back task, while background music was playing. Participants were asked to provide the music, and two types of music were employed: calming and exciting. The calming music was played during the first session of the experiment, and the exciting music during the second session. Each session includes an equal number of 1-back and 3-back task blocks, with 22 trials presented within each block; a total of 16 task blocks were implemented per session (8 blocks of 1-back task and 8 blocks of 3-back task). Eleven participants originally took part, and participants with too few recorded modalities were removed. The recorded signals are skin conductance (SC), electrocardiogram (ECG), skin surface temperature (SKT), respiration, photoplethysmography (PPG), functional near-infrared spectroscopy (fNIRS), electromyogram (EMG), de-identified facial expression scores, the sequence of correct/incorrect responses, and reaction time.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our Multimodal Sentiment Dataset, featuring 100 diverse classes of images and corresponding texts with sentiment labels. Ideal for AI-driven sentiment analysis, image classification, and multimodal fusion tasks.
ABSTRACT: Mixed emotions have attracted increasing interest recently, but existing datasets rarely focus on mixed emotion recognition from multimodal signals, hindering the affective computing of mixed emotions. On this basis, we present a multimodal dataset with four kinds of signals recorded while watching mixed and non-mixed emotion videos. To ensure effective emotion induction, we first implemented a rule-based video filtering step to select the videos that could elicit stronger positive, negative, and mixed emotions. Then, an experiment with 80 participants was conducted, in which the data of EEG, GSR, PPG, and frontal face videos were recorded while they watched the selected video clips. We also recorded the subjective emotional rating on PANAS, VAD, and amusement-disgust dimensions. In total, the dataset consists of multimodal signal data and self-assessment data from 73 participants. We also present technical validations for emotion induction and mixed emotion classification from physiological signals and face videos. The average accuracy of the 3-class classification (i.e., positive, negative, and mixed) can reach 80.96% when using SVM and features from all modalities, which indicates the possibility of identifying mixed emotional states.
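A minimal sketch of the 3-class (positive / negative / mixed) SVM setup described above, assuming features have already been extracted from the physiological and face-video modalities; the feature dimensionality and random labels below are placeholders, not the actual data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))       # placeholder multimodal feature vectors
y = rng.integers(0, 3, size=200)     # 0 = positive, 1 = negative, 2 = mixed
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())  # near chance on random placeholders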
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Multiturn Multimodal
We want to generate synthetic data that can understand the positions of and relationships between multiple images and multiple audio clips; an example is shown below. All notebooks are at https://github.com/mesolitica/malaysian-dataset/tree/master/chatbot/multiturn-multimodal
multi-images
synthetic-multi-images-relationship.jsonl, 100000 rows, 109MB. Images at https://huggingface.co/datasets/mesolitica/translated-LLaVA-Pretrain/tree/main
Example data
{'filename':… See the full description on the dataset page: https://huggingface.co/datasets/mesolitica/synthetic-multiturn-multimodal.
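A small sketch for inspecting the JSONL file listed above after downloading it locally; beyond 'filename', the record schema is not spelled out on this card, so the snippet just prints whatever keys each record exposes.
import json

with open("synthetic-multi-images-relationship.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))
        if i == 2:                   # peek at the first few rows only
            break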
https://www.gnu.org/licenses/gpl-3.0-standalone.html
README.md includes the Mudestreda description and the images Mudestreda.png and Mudestreda_Stage.png. Tool stages: Sharp, Used, Dulled.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The scalability and diversity of existing datasets for collaborative trajectories remain limited.
Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset
The Cancer Genome Atlas (TCGA) Multimodal Dataset is a comprehensive collection of clinical data, pathology reports, molecular data, and slide images for cancer patients. The dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated with state-of-the-art models such as GatorTron, SeNMo, and UNI.
Curated by: Lab Rasool
Language(s) (NLP): English
Uses
from datasets import load_dataset
clinical_dataset = load_dataset("Lab-Rasool/TCGA", "clinical", split="train")
pathology_report_dataset = load_dataset("Lab-Rasool/TCGA", "pathology_report", split="train")
wsi_dataset = load_dataset("Lab-Rasool/TCGA", "wsi", split="train")
molecular_dataset = load_dataset("Lab-Rasool/TCGA", "molecular", split="train")
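Continuing from the snippet above, a quick way to see what the clinical subset actually exposes; the column names (for example, which fields hold the embeddings) are not documented here, so we simply print them.
print(clinical_dataset.column_names)  # inspect the available fields
print(clinical_dataset[0])            # one raw record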
The lmms-lab/multimodal-open-r1-8k-verified dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Oral diseases affect nearly 3.5 billion people, with the majority residing in low- and middle-income countries. Due to limited healthcare resources, many individuals are unable to access proper oral healthcare services. Image-based machine learning technology is one of the most promising approaches to improving oral healthcare services and reducing patient costs. Openly accessible datasets play a crucial role in facilitating the development of machine learning techniques. However, existing dental datasets have limitations such as a scarcity of Cone Beam Computed Tomography (CBCT) data, lack of matched multi-modal data, and insufficient complexity and diversity of the data. This project addresses these challenges by providing a dataset that includes 329 CBCT images from 169 patients, multi-modal data with matching modalities, and images representing various oral health conditions.
Disclaimer: We do not own this dataset. DeepFashion dataset is a public dataset which can be accessed through its website. This dataset was used to evaluate Marqo-FashionCLIP and Marqo-FashionSigLIP - see details below.
Marqo-FashionSigLIP Model Card
Marqo-FashionSigLIP leverages Generalised Contrastive Learning (GCL) which allows the model to be trained on not just text descriptions but also categories, style, colors, materials, keywords and fine-details to provide highly relevant… See the full description on the dataset page: https://huggingface.co/datasets/Marqo/deepfashion-multimodal.
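As a rough loading sketch for the dataset linked above; the split name is an assumption, so check the dataset page for the exact configuration.
from datasets import load_dataset

deepfashion = load_dataset("Marqo/deepfashion-multimodal", split="train")  # split name assumed
print(deepfashion)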
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OMuSense-23 is a multimodal dataset for non-contact biometric and breathing analysis.
This database comprises RGBD and mmWave radar data collected from 50 participants.
The data capture process involves participants engaging in four breathing pattern activities
(normal breathing, reading, guided breathing, and breath holding to simulate apnea),
each performed in three distinct static poses: standing (A), sitting (B), and lying down (C).
For citations please refer to the paper:
Manuel Lage Cañellas, Le Nguyen, Anirban Mukherjee, Constantino Álvarez Casado,
Xiaoting Wu, Nhi Nguyen, Praneeth Susarla, Sasan Sharifipour, Dinesh B. Jayagopi, Miguel Bordallo López,
"OmuSense-23: A Multimodal Dataset For Contactless Breathing Pattern Recognition And Biometric Analysis",
arXiv:2407.06137, 2024
A large-scale multi-modal dataset to facilitate research that concentrates on vision-wireless systems. The Vi-Fi dataset consists of vision, wireless, and smartphone motion sensor data from multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. The vision modality includes RGB-D video from a mounted camera, while the wireless modality comprises smartphone data from participants, including WiFi FTM and IMU measurements.
The Vi-Fi dataset facilitates multi-modal system research, especially vision-wireless sensor data fusion, association, and localization.
(Data collection was in accordance with IRB protocols and subject faces have been blurred for subject privacy.)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
acoustic
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset consists of multimodal English-to-Hindi translation examples. Each input comprises an image, a rectangular region within the image, and an English caption; the output is the corresponding Hindi caption.
Dataset Card for websight-5K-multimodal
This dataset has been created with Argilla. It is a subset of 5000 records from the Websight collection, which is used for HTML/CSS code generation from an input image. Below you can see a screenshot of the UI where annotators can work comfortably.
As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/websight-5K-multimodal.
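For the "Load with datasets" route mentioned above, a minimal sketch; the split name is an assumption, so check the dataset page for the exact one.
from datasets import load_dataset

websight_subset = load_dataset("argilla/websight-5K-multimodal", split="train")  # split name assumed
print(websight_subset)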
SWE-bench Multimodal
SWE-bench Multimodal is a dataset of 617 task instances that evaluates language models and AI systems on their ability to resolve real-world GitHub issues. To learn more about the dataset, please visit our website. More updates coming soon!
https://www.rootsanalysis.com/privacy.html
The multimodal AI market size is predicted to rise from $2.36 billion in 2024 to $93.99 billion by 2035, growing at a CAGR of 39.81% from 2024 to 2035.
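A quick back-of-envelope check of the quoted growth rate (this calculation is ours, not from the report itself).
# $2.36B in 2024 growing to $93.99B by 2035 spans 11 years.
start, end, years = 2.36, 93.99, 2035 - 2024
cagr = (end / start) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.2%}")  # about 39.8%, consistent with the stated 39.81%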