HealthGPT : A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
Tianwei Lin1, Wenqiao Zhang1, Sijing Li1, Yuqian Yuan1, Binhe Yu2, Haoyuan Li3, Wanggui He3, Hao Jiang3,
Mengze Li4, Xiaohui Song1, Siliang Tang1, Jun Xiao1, Hui Lin1, Yueting Zhuang1, Beng Chin Ooi5
1Zhejiang University, 2University of Electronic Science and Technology of China, 3Alibaba, 4The Hong Kong University of Science and Technology, 5National… See the full description on the dataset page: https://huggingface.co/datasets/acthuan/LLaVA-Med.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
BUAADreamer/llava-med-zh-eval dataset hosted on Hugging Face and contributed by the HF Datasets community
LLaVA-Med-60K-IM-text
This dataset is a plain-text (paragraph) version of llava_med_instruct_60k_inline_mention.json. We built it with Meta-Llama-3-70B-Instruct using the following instruction: "Rewrite the question-answer pairs into a paragraph format (Do not use the words 'question' and 'answer' in your responses):".
PMC articles that failed to download are excluded, and non-medical images (e.g., diagrams) are excluded automatically. Despite these efforts, this dataset is not… See the full description on the dataset page: https://huggingface.co/datasets/myeongkyunkang/LLaVA-Med-60K-IM-text.
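As a rough illustration of the rewriting step described above, the sketch below converts question-answer conversations into paragraphs with Meta-Llama-3-70B-Instruct via the Hugging Face transformers pipeline. The LLaVA-style layout of the JSON (a list of records with a "conversations" field) and the generation settings are assumptions, not the dataset authors' actual script.

```python
# Hypothetical sketch: rewrite QA conversations into paragraph text with
# Meta-Llama-3-70B-Instruct, using the instruction quoted in the dataset card.
import json
from transformers import pipeline

INSTRUCTION = ("Rewrite the question-answer pairs into a paragraph format "
               "(Do not use the words 'question' and 'answer' in your responses):")

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    device_map="auto",  # the 70B model needs multiple GPUs or offloading
)

with open("llava_med_instruct_60k_inline_mention.json") as f:
    records = json.load(f)  # assumed LLaVA-style instruction records

paragraphs = []
for record in records[:10]:  # small slice for illustration
    # Flatten the multi-turn conversation into plain text before rewriting.
    qa_text = "\n".join(turn["value"] for turn in record["conversations"])
    messages = [{"role": "user", "content": f"{INSTRUCTION}\n\n{qa_text}"}]
    out = generator(messages, max_new_tokens=512, do_sample=False)
    # In chat mode the pipeline returns the full message list; keep the reply.
    paragraphs.append(out[0]["generated_text"][-1]["content"])
```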
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The Augmented Reality Large Language Model Medical Teaching (ARLMT) system integrates augmented reality with LLaVA-Med, a medical multimodal large language model based on LLaVA and designed specifically for biomedical applications, and applies QLoRA fine-tuning to advance medical education. Deployed on resource-constrained AR devices such as INMO Air2 glasses, ARLMT overlays real-time visual annotations and textual feedback on medical scenarios to create an immersive, interactive learning environment. Key advancements include a 66% reduction in memory footprint (from 15.2 GB to 5.1 GB) through QLoRA, enabling efficient operation without compromising performance, and an average response time of 1.009 seconds across various medical imaging categories, surpassing the GPT-4 baseline in both speed and accuracy. The system achieves 98.3% diagnostic accuracy, demonstrating its reliability in real-time applications. By combining visual and textual elements, ARLMT enhances comprehension of complex medical concepts, providing a scalable, real-time solution that bridges technological innovation and pedagogical needs in medical training.
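The memory reduction reported above comes from QLoRA-style fine-tuning: base weights are loaded in 4-bit NF4 precision and only small LoRA adapters are trained. The sketch below shows a generic QLoRA configuration with transformers, bitsandbytes, and peft; the backbone model name and hyperparameters are placeholders, not values taken from the ARLMT paper.

```python
# Minimal QLoRA setup sketch (assumed backbone and hyperparameters).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in QLoRA
    bnb_4bit_use_double_quant=True,         # second quantization of the scales
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
)

base = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",                 # placeholder language backbone
    quantization_config=bnb_config,
    device_map="auto",
)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```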
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
LLaVA-Rad MIMIC-CXR provides more accurate section extractions from MIMIC-CXR free-text radiology reports. Traditionally, rule-based methods were used to extract sections such as the reason for exam, findings, and impression; however, these approaches often fail due to inconsistencies in report structure and clinical language. In this work, we leverage GPT-4 to extract these sections more reliably, adding 237,073 image-text pairs to the training split and 1,952 pairs to the validation split. This enhancement enabled the development and fine-tuning of LLaVA-Rad, a multimodal large language model tailored for radiology applications, which achieves improved performance on report-generation tasks.
This resource is provided to support reproducibility and for the benefit of the research community, enabling further exploration in vision–language modeling. For more details, please refer to the accompanying paper [1].
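For intuition, here is a minimal sketch of GPT-4-based section extraction in the spirit of the pipeline described above, using the OpenAI Python client. The prompt wording and JSON output schema are illustrative assumptions, not the authors' actual prompt.

```python
# Hypothetical GPT-4 section extraction for a free-text radiology report.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract the 'reason for exam', 'findings', and 'impression' sections from "
    "the radiology report below. Return a JSON object with keys "
    "'reason_for_exam', 'findings', and 'impression'; use null for any missing "
    "section.\n\nReport:\n{report}"
)

def extract_sections(report_text: str) -> dict:
    """Return the three report sections as a dict (sketch, no retry/validation)."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(report=report_text)}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```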
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
🫁 CXR-10K Reasoning Dataset
A dataset of 10,000 chest X-ray images paired with step-by-step clinical reasoning and radiology impression summaries, curated for training and evaluating medical vision-language models such as MedGemma and LLaVA-Med.
📂 Dataset Structure
This dataset is saved in Arrow format and was built using the Hugging Face datasets library. Each sample includes:
image: Chest X-ray image (PNG or JPEG)
reasoning: Step-wise radiological reasoning in… See the full description on the dataset page: https://huggingface.co/datasets/Manusinhh/cxr-10k-reasoning-dataset.
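Since the dataset is stored in Arrow format and built with the Hugging Face datasets library, it can be loaded directly with load_dataset, as in the sketch below. The split name and the exact field set beyond image and reasoning are assumptions based on the description above.

```python
# Load the dataset from the Hub and inspect one sample (field names assumed).
from datasets import load_dataset

ds = load_dataset("Manusinhh/cxr-10k-reasoning-dataset", split="train")

sample = ds[0]
print(sample.keys())              # e.g. dict_keys(['image', 'reasoning', ...])
print(sample["image"])            # PIL image decoded from the stored PNG/JPEG
print(sample["reasoning"][:200])  # first part of the step-wise reasoning
```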
http://www.companywall.rs/Home/Licence
This dataset includes financial statements, accounts and account blockages, and real estate. The data cover revenues, expenses, profit, assets, liabilities, and information on real estate owned by the company. Financial data, financial summary, company summary, entrepreneur, craftsman, association, business entities.