ngqtrung/full-modality-data dataset hosted on Hugging Face and contributed by the HF Datasets community
The 2021-2022 School Learning Modalities dataset provides weekly estimates of school learning modality (in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2021-2022 school year and the Fall 2022 semester, from August 2021 – December 2022. These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. They should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Education Statistics (NCES) for 2020-2021.

School learning modality types are defined as follows:
In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely for all students at all available grade levels.
Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

Data Information
School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a hidden Markov model which infers, for each district, the sequence of learning modalities (In-Person, Hybrid, or Remote) that is most likely to produce the modalities reported by these sources. The model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools, and number of students in each district come from NCES [21]. You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021.

The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
Public school district that is NOT a component of a supervisory union
Public school district that is a component of a supervisory union
Independent charter district
“BI” in the state column refers to school districts funded by the Bureau of Indian Education.

Technical Notes
Data from August 1, 2021 to June 24, 2022 correspond to the 2021-2022 school year. During this time frame, data from the AEI/Return to Learn Tracker and most state dashboards were not available. Inferred modalities with a probability below 0.6 were deemed inconclusive and were omitted. During the Fall 2022 semester, modalities for districts with a school closure reported by Burbio were updated to either “Remote”, if the closure spanned the entire week, or “Hybrid”, if the closure spanned 1-4 days of the week.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the raw fMRI data of a preregistered study. The dataset includes:
session pre
1. anat/: anatomical scans (T1-weighted images) for each subject
2. func/: whole-brain EPI data from all task runs (8x single task, 2x dual task, 1x resting state, and 2x localizer task)
3. fmap/: fieldmaps with magnitude1, magnitude2, and phasediff

session post
1. func/: whole-brain EPI data from all task runs (8x single task, 2x dual task)
2. fmap/: fieldmaps with magnitude1, magnitude2, and phasediff
Please note that some participants did not complete the post session. We updated our consent form to obtain explicit permission to publish individual data, but not all participants re-signed the new version. Those participants are excluded here but are part of the t-maps on NeuroVault (compare participants.tsv).
Tasks always involved visual and/or auditory input and required manual and/or vocal responses (visual+manual and auditory+vocal are modality compatible; visual+vocal and auditory+manual are modality incompatible). Tasks were presented either as single tasks or as dual tasks. Prior to the post session, participants completed a practice intervention in which one group worked for 80 minutes outside the scanner on modality-incompatible dual tasks, one group on modality-compatible dual tasks, and the third group paused for 80 minutes.
For exact task descriptions, materials, and scripts, please see the preregistration: https://osf.io/whpz8
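The folder layout described above (anat/, func/, fmap/, plus participants.tsv) suggests the dataset follows the BIDS standard. Below is a minimal sketch of browsing such a dataset with pybids; the local path is hypothetical, and the "pre"/"post" session labels are assumed from the description:

```python
# Minimal sketch: browsing a BIDS-style fMRI dataset with pybids.
# The local path is hypothetical; session labels "pre"/"post" are
# assumed from the dataset description above.
from bids import BIDSLayout

layout = BIDSLayout("/data/preregistered-fmri-study")  # hypothetical path

for sub in layout.get_subjects():
    # Functional runs from the pre session (single/dual task, rest, localizer).
    runs = layout.get(subject=sub, session="pre", suffix="bold",
                      extension=".nii.gz")
    print(sub, len(runs), "pre-session BOLD runs")
```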
The 2020-2021 School Learning Modalities dataset provides weekly estimates of school learning modality (in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2020-2021 school year, from August 2020 – June 2021. These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. They should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Education Statistics (NCES) for 2020-2021.

School learning modality types are defined as follows:
In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.
Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely for all students at all available grade levels.
Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

Data Information
School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a hidden Markov model which infers, for each district, the sequence of learning modalities (In-Person, Hybrid, or Remote) that is most likely to produce the modalities reported by these sources. The model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools, and number of students in each district come from NCES [21]. You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021.

The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:
Public school district that is NOT a component of a supervisory union
Public school district that is a component of a supervisory union
Independent charter district
“BI” in the state column refers to school districts funded by the Bureau of Indian Education.

Technical Notes
Data from September 1, 2020 to June 25, 2021 correspond to the 2020-2021 school year. During this timeframe, all four sources of data were available. Inferred modalities with a probability below 0.75 were deemed inconclusive and were omitted. Data for the month of July may show “In Person” status although most school districts are effectively closed during this time for summer break. Users may wish to exclude July data for this reason where applicable.

Sources
K-12 School Opening Tracker. Burbio 2021; https
https://api.github.com/licenses/mit
Cross-modal retrieval takes data of one modality as a query and retrieves semantically relevant data of another modality. Most existing cross-modal retrieval methods are designed for scenarios with complete modality data; however, in real-world applications incomplete modality data is common, and these methods struggle to handle it effectively. In this paper, we propose a typical-concept-driven, modality-missing deep cross-modal retrieval model. Specifically, we first propose a multi-modal Transformer integrated with multi-modal pretraining networks, which fully captures fine-grained multi-modal semantic interactions in the incomplete modality data, extracts multi-modal fusion semantics, and constructs a cross-modal subspace, while supervising the learning process to generate typical concepts. These typical concepts are then used as the cross-attention keys and values to drive the training of the modality mapping network, so that it can adaptively preserve the implicit multi-modal semantic concepts of the query modality data, generate cross-modal retrieval features, and fully preserve the pre-extracted multi-modal fusion semantics. More information about the source code: https://gitee.com/MrSummer123/CPCMR
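To make the role of the typical concepts concrete, here is a minimal PyTorch sketch in which learned concepts act as the cross-attention keys and values that map a single-modality query into a shared retrieval space. Names and dimensions are illustrative, not the authors' implementation (see the Gitee repository for that):

```python
# Sketch: "typical concepts" as cross-attention keys/values driving a
# modality-mapping network. Illustrative only; not the CPCMR code.
import torch
import torch.nn as nn

class ConceptDrivenMapper(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, query_feats, concepts):
        # query_feats: (batch, seq, dim) features of the available query modality
        # concepts:    (batch, k, dim) learned typical concepts as keys and values
        attended, _ = self.attn(query_feats, concepts, concepts)
        return self.proj(attended.mean(dim=1))  # pooled cross-modal retrieval feature

feats = torch.randn(4, 16, 512)      # e.g., token features from an image-only query
concepts = torch.randn(4, 32, 512)   # typical concepts from the fusion network
retrieval_vec = ConceptDrivenMapper()(feats, concepts)
print(retrieval_vec.shape)           # torch.Size([4, 512])
```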
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of medical imaging files for use in the "Medical Image Processing with Python" lesson, developed by the Netherlands eScience Center.
The dataset includes files representing various medical imaging modalities and formats commonly used in clinical research and practice. They are intended for educational purposes, allowing students to practice image processing techniques, machine learning applications, and statistical analysis of medical images using Python libraries such as scikit-image, pydicom, and SimpleITK.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
During the COVID-19 pandemic, many public schools across the United States shifted from fully in-person learning to alternative learning modalities such as hybrid and fully remote learning. In this study, data from 14,688 unique school districts from August 2020 to June 2021 were collected to track changes in the proportion of schools offering fully in-person, hybrid, and fully remote learning over time. These data were provided by Burbio, MCH Strategic Data, the American Enterprise Institute’s Return to Learn Tracker, and individual state dashboards. Because the modalities reported by these sources were incomplete and occasionally misaligned, a model was needed to combine and deconflict these data to provide a more comprehensive description of modalities nationwide. A hidden Markov model (HMM) was used to infer the most likely learning modality for each district on a weekly basis. This method yielded higher spatiotemporal coverage than any individual data source and higher agreement with three of the four data sources than any other single source. The model output revealed that the percentage of districts offering fully in-person learning rose from 40.3% in September 2020 to 54.7% in June 2021, with increases across 45 states and in both urban and rural districts. This type of probabilistic model can serve as a tool for fusing incomplete and contradictory data sources in order to obtain more reliable data in support of public health surveillance and research efforts.
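The entries above describe fusing noisy weekly modality reports with a hidden Markov model. As a concrete illustration of the underlying inference step, here is a minimal Viterbi decoder over the three modality states; the transition and emission probabilities below are invented for illustration and are not the trained parameters of the CDC model:

```python
# Illustrative Viterbi decoding: infer the most likely weekly modality
# sequence from noisy source reports. All probabilities are made up.
import numpy as np

states = ["In-Person", "Hybrid", "Remote"]
start = np.log([0.4, 0.35, 0.25])
# Districts tend to keep the same modality from week to week.
trans = np.log([[0.90, 0.07, 0.03],
                [0.08, 0.84, 0.08],
                [0.03, 0.07, 0.90]])
# P(a source reports column modality | true row modality) -- illustrative.
emit = np.log([[0.85, 0.10, 0.05],
               [0.15, 0.70, 0.15],
               [0.05, 0.10, 0.85]])

def viterbi(obs):
    """obs: sequence of observed modality indices, one per week."""
    v = start + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = v[:, None] + trans          # (prev_state, next_state)
        back.append(scores.argmax(axis=0))   # best predecessor per state
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return [states[s] for s in reversed(path)]

# One district's weekly reports: 0=In-Person, 1=Hybrid, 2=Remote.
print(viterbi([0, 0, 1, 0, 2, 2, 2, 1, 0]))
```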
Student enrollment data disaggregated, for each mode of instruction, by students from low-income families, students from each racial and ethnic group, students of each gender, English learners, children with disabilities, children experiencing homelessness, children in foster care, and migratory students.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
MixBench: A Benchmark for Mixed Modality Retrieval
MixBench is a benchmark for evaluating retrieval across text, images, and multimodal documents. It is designed to test how well retrieval models handle queries and documents that span different modalities, such as pure text, pure images, and combined image+text inputs. MixBench includes four subsets, each curated from a different data source:
MSCOCO, Google_WIT, VisualNews, and OVEN
Each subset contains:
queries.jsonl: each entry… See the full description on the dataset page: https://huggingface.co/datasets/mixed-modality-search/MixBench.
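A minimal sketch of loading one subset with the Hugging Face datasets library follows; the configuration name "MSCOCO" is assumed from the subset list above, so check the dataset page for the actual config and split names:

```python
# Hypothetical loading sketch; verify config/split names against the
# dataset page linked above.
from datasets import load_dataset

mixbench = load_dataset("mixed-modality-search/MixBench", "MSCOCO")
print(mixbench)
```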
https://researchdata.ntu.edu.sg/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.21979/N9/I0HOYZ
The proliferation of edge devices has generated an unprecedented volume of time series data across different domains, motivating a variety of purpose-built methods. Recently, Large Language Models (LLMs) have emerged as a new paradigm for time series analytics by leveraging the shared sequential nature of textual data and time series. However, a fundamental cross-modality gap exists between time series and LLMs, as LLMs are pre-trained on textual corpora and are not inherently optimized for time series. Many recent proposals are designed to address this issue. In this survey, we provide an up-to-date overview of LLM-based cross-modality modeling for time series analytics. We first introduce a taxonomy that classifies existing approaches into four groups based on the type of textual data employed for time series modeling. We then summarize key cross-modality strategies, e.g., alignment and fusion, and discuss their applications across a range of downstream tasks. Furthermore, we conduct experiments on multimodal datasets from different application domains to investigate effective combinations of textual data and cross-modality strategies for enhancing time series analytics. Finally, we suggest several promising directions for future research. This survey is designed for professionals, researchers, and practitioners interested in LLM-based time series modeling.
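As a concrete example of one cross-modality strategy such work covers, here is a minimal sketch of serializing a numeric series into a text prompt for a pretrained LLM; the formatting scheme is illustrative rather than drawn from any specific paper:

```python
# Minimal sketch: turning a numeric time series into text an LLM can consume.
# The prompt format is an assumption for illustration only.
def series_to_prompt(values, task="Forecast the next 3 values."):
    # Fixed-precision, comma-separated numbers keep tokenization predictable.
    body = ", ".join(f"{v:.2f}" for v in values)
    return f"Time series: {body}\n{task}"

print(series_to_prompt([21.3, 21.7, 22.1, 22.6, 23.0]))
```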
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
anonymous052025/multimodal-modality-conflict-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Algorithms that classify hyper-scale multi-modal datasets, comprising millions of images, into constituent modality types can help researchers quickly retrieve and classify diagnostic imaging data, accelerating clinical outcomes. This research aims to demonstrate that a deep neural network trained on a hyper-scale dataset (4.5 million images) composed of heterogeneous multi-modal data can achieve significant modality classification accuracy (96%). By combining 102 medical imaging datasets, a dataset of 4.5 million images was created. A ResNet-50, a ResNet-18, and a VGG16 were trained to classify these images by the imaging modality used to capture them (Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and X-ray) across many body locations. The classification accuracy of the models was then tested on unseen data. The best performing model achieved a classification accuracy of 96% on unseen data, which is on par with, or exceeds, the accuracy of more complex implementations using EfficientNets or Vision Transformers (ViTs). The model achieved a balanced accuracy of 86%. This research shows it is possible to train Deep Learning (DL) Convolutional Neural Networks (CNNs) with hyper-scale multi-modal datasets composed of millions of images. Such models can find use in real-world applications with volumes of image data in the hyper-scale range, such as medical imaging repositories or national healthcare institutions. Further research can expand this classification capability to include 3D scans.
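A minimal sketch of the training setup described above: a torchvision ResNet-50 with a four-way head for CT, MRI, PET, and X-ray. The random tensors stand in for a real DataLoader over the 4.5-million-image corpus, which is not reproduced here:

```python
# Sketch: fine-tuning a ResNet-50 for 4-way modality classification.
# Data loading is a placeholder; only the model/head setup follows the text.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 4)  # CT, MRI, PET, X-ray

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for a real batch.
print(train_step(torch.randn(8, 3, 224, 224), torch.randint(0, 4, (8,))))
```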
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mean strength ratings (0–5) for the disyllabic words, 95% confidence intervals, and standard deviations per perceptual modality and per dominant perceptual modality.
According to our latest research, the global Multi-Modal Imaging Data-Integration market size reached USD 1.67 billion in 2024. The market is expected to expand at a robust CAGR of 9.5% during the forecast period, reaching a projected value of USD 3.78 billion by 2033. This impressive growth is driven by the increasing demand for integrated imaging solutions in clinical diagnostics and research, as well as technological advancements in imaging modalities and data analytics platforms. As per our detailed analysis, the integration of multiple imaging modalities is revolutionizing the way healthcare professionals diagnose and treat complex diseases, offering comprehensive insights that single-modality imaging cannot provide.
One of the primary growth factors propelling the Multi-Modal Imaging Data-Integration market is the rising prevalence of chronic diseases such as cancer, cardiovascular disorders, and neurological conditions. These diseases often require precise and multifaceted diagnostic approaches, which multi-modal imaging excels at delivering. By combining data from modalities like MRI, CT, PET, and ultrasound, clinicians can achieve a more holistic view of patient pathology, leading to improved treatment planning and patient outcomes. Moreover, the increasing adoption of personalized medicine is further driving the need for integrated imaging data, as tailored therapeutic strategies rely heavily on accurate, multi-dimensional diagnostic information.
Another significant driver is the rapid technological evolution in both imaging hardware and software. Innovations such as artificial intelligence (AI) and machine learning are enabling more effective integration and interpretation of complex imaging datasets. Advanced integration techniques, including software-based and hybrid solutions, are making it feasible to seamlessly combine anatomical, functional, and molecular information from various imaging platforms. This technological leap is not only enhancing diagnostic precision but also reducing the time and cost associated with traditional, single-modality imaging workflows. The ongoing investment in research and development by both public and private sectors is ensuring a steady pipeline of improvements in multi-modal imaging data-integration.
The growing adoption of digital health solutions, including cloud-based imaging data repositories and telemedicine platforms, is also contributing to market expansion. Healthcare institutions are increasingly recognizing the value of integrated imaging data in facilitating remote consultations, multidisciplinary team discussions, and collaborative research. The shift toward value-based care models emphasizes outcomes and efficiency, making multi-modal data-integration an attractive proposition for hospitals, diagnostic centers, and research institutes. Additionally, regulatory support for interoperability and data standardization is gradually lowering barriers to adoption, fostering a more conducive environment for market growth.
From a regional perspective, North America continues to dominate the Multi-Modal Imaging Data-Integration market, accounting for the largest revenue share in 2024. This leadership is attributed to the region’s advanced healthcare infrastructure, high adoption rates of cutting-edge imaging technologies, and significant investments in healthcare IT. Europe follows closely, benefiting from robust government initiatives and a strong focus on research collaborations. The Asia Pacific region is emerging as the fastest-growing market, driven by expanding healthcare access, rising investments in medical technology, and an increasing burden of chronic diseases. Latin America and the Middle East & Africa, while currently holding smaller shares, are expected to witness steady growth due to improving healthcare systems and rising awareness of integrated imaging benefits.
The Imaging Modality segment forms the b
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COVID-19 pandemic and school closures adversely affected adolescents’ mental health and wellbeing, with the weight of evidence indicating worse outcomes for students attending school remotely or in hybrid modality compared to fully in person. We leverage survey data from the Adolescent Brain Cognitive Development (ABCD) Study collected from 6,245 adolescents (mean age = 13.2) during the 2020-2021 school year to investigate the moderating effects of race/ethnicity, household income, and neighborhood disadvantage on the relationship between 2020-2021 school modality and outcomes including perceived stress, sadness, and positive affect. For relatively advantaged students, our results corroborate prior findings that students in remote or hybrid schooling report worse mental health outcomes than students who attended fully in person. However, this pattern between schooling modality and mental health disappears or reverses for relatively disadvantaged students. Given substantial within-group variation, these findings underscore the importance of considering varied student needs in developing mental health supports.
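As an illustration of the moderation analysis described in the abstract, here is a minimal statsmodels sketch regressing a stress outcome on school modality with a modality-by-income interaction; the variable names and toy data are invented, since ABCD data require controlled access:

```python
# Sketch of a moderation (interaction) model: does the effect of school
# modality on stress vary with household income? Toy data only.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "stress":   [2.1, 3.4, 2.8, 3.9, 1.9, 3.1, 2.5, 3.6],
    "modality": ["in_person", "remote", "hybrid", "remote",
                 "in_person", "hybrid", "in_person", "remote"],
    "income":   [110, 35, 60, 28, 95, 50, 120, 40],  # household income, $1000s
})

# C(modality) * income expands to main effects plus interaction terms,
# which test whether the modality effect is moderated by income.
fit = smf.ols("stress ~ C(modality) * income", data=df).fit()
print(fit.summary())
```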
https://www.gnu.org/licenses/gpl-3.0-standalone.html
Here we provide pruned TCGA transcriptomics data from the manuscript "MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention". Code is available on GitHub.
The TCGA [1] transcriptomics data were collected from Xena [2] and preprocessed using the novel pipeline proposed in MIRROR [3]. For the raw transcriptomics data, we first apply RFE [4] with 5-fold cross-validation for each cohort to identify the most performant support set for the subtyping task. To enhance interpretability from a biological perspective, we manually incorporate genes associated with specific cancer subtypes based on the COSMIC database [5], resulting in a one-dimensional transcriptomics feature vector.
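As an illustration of the RFE step, here is a minimal scikit-learn sketch using RFECV (recursive feature elimination with 5-fold cross-validation) on synthetic data standing in for one cohort's expression matrix; the estimator choice and step size are assumptions, not the MIRROR pipeline's exact settings:

```python
# Sketch: RFE with 5-fold CV to select a gene support set for subtyping.
# Synthetic data stands in for a TCGA cohort's expression matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

# 200 samples x 500 "genes", binary subtype labels (synthetic stand-in).
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=30, random_state=0)

selector = RFECV(
    estimator=LinearSVC(max_iter=5000),   # assumed estimator choice
    step=50,                              # features removed per iteration
    cv=StratifiedKFold(n_splits=5),       # 5-fold CV, as in the pipeline
    scoring="accuracy",
)
selector.fit(X, y)
support = np.flatnonzero(selector.support_)  # indices of retained genes
print(f"{selector.n_features_} genes retained")
# COSMIC-curated subtype genes would then be merged into the support manually.
```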
[1] K. Tomczak et al., “The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge,” Contemporary Oncology/Współczesna Onkologia, vol. 2015, no. 1, pp. 68–77, 2015.
[2] M. J. Goldman et al., “Visualizing and interpreting cancer genomics data via the Xena platform,” Nat. Biotechnol., vol. 38, no. 6, pp. 675–678, 2020.
[3] T. Wang et al., “MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention,” arXiv preprint arXiv:2503.00374, 2025.
[4] I. Guyon et al., “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, pp. 389–422, 2002.
[5] Z. Sondka et al., “COSMIC: a curated database of somatic variants and clinical data for cancer,” Nucleic Acids Research, vol. 52, no. D1, pp. D1210–D1217, 2024.
According to a 2023 survey, people enrolled in face-to-face-only educational modalities spent, on average, more on tuition than those in online or mixed modalities. Around 48 percent of online-only students spent less than 2,000 Mexican pesos on tuition.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Linguistic annotation (including modality) of the Gospels in Ancient Greek and Latin
Contents
xml: contains the XML-TEI dataset. There is a file for each language and Gospel. These files contain the linguistic annotation of modal passages (with three verses before and after each occurrence of a modal marker to provide context).
summary: two tabular sheets with a summary of the annotation.
scripts: XQuery code to create the tables presented in summary, and to align the semantic annotation in Greek and Latin with any other language available in the Multilingual Bible Parallel Corpus project.
to-be-aligned: empty folder where you need to save the files downloaded from the Multilingual Bible Parallel Corpus so they can be aligned.
build.xml: Apache Ant build file; read the instructions to run it.
https://www.ine.es/aviso_legal
Rural Tourist Accommodation Occupancy Survey: Employed personnel by modality and month. National.