13 datasets found
  1. GeoQA-train-Vision-R1-cot-rewrite

    • huggingface.co
    Updated Apr 21, 2025
    + more versions
    Cite
    Fanxing Bu (2025). GeoQA-train-Vision-R1-cot-rewrite [Dataset]. https://huggingface.co/datasets/LoadingBFX/GeoQA-train-Vision-R1-cot-rewrite
    Explore at:
    Dataset updated
    Apr 21, 2025
    Authors
    Fanxing Bu
    Description

    Dataset Card for GeoQA-train-Vision-R1-cot-rewrite

    This dataset provides a rewritten version of the CoT (Chain-of-Thought) annotations for the GeoQA subset of the Vision-R1-cold dataset. It is designed to support efficient and structured multimodal reasoning with large language models.

      Dataset Details

      Dataset Description

    The original Vision-R1 dataset, introduced in the paper Vision-R1: Reflective Multimodal Reasoning with Aha Moments, features detailed and… See the full description on the dataset page: https://huggingface.co/datasets/LoadingBFX/GeoQA-train-Vision-R1-cot-rewrite.
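
    As a minimal sketch (not part of the dataset card), this dataset can presumably be loaded with the Hugging Face datasets library; the split and column names below are assumptions that should be verified on the dataset page:

    from datasets import load_dataset

    # Load the rewritten GeoQA CoT data from the Hugging Face Hub.
    # The "train" split name is an assumption; check the dataset page for the actual splits.
    ds = load_dataset("LoadingBFX/GeoQA-train-Vision-R1-cot-rewrite", split="train")

    # Column names are not documented in this excerpt, so inspect the schema
    # rather than assuming specific field names.
    print(ds.column_names)
    print(ds[0])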

  2. Vision-COT

    • huggingface.co
    Updated Nov 29, 2024
    Cite
    Sathish Kumar R (2024). Vision-COT [Dataset]. https://huggingface.co/datasets/pt-sk/Vision-COT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 29, 2024
    Authors
    Sathish Kumar R
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The pt-sk/Vision-COT dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  3. 3D-CoT

    • huggingface.co
    Updated Feb 27, 2025
    Cite
    Yanjun CHEN (2025). 3D-CoT [Dataset]. https://huggingface.co/datasets/Battam/3D-CoT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 27, 2025
    Authors
    Yanjun CHEN
    Description

    3D-CoT Benchmark: Chain-of-Thought Datasets for 3D Point Cloud-Language Models

      Overview
    

    The 3D-CoT Benchmark is a structured reasoning dataset designed explicitly to facilitate the systematic study of Chain-of-Thought (CoT)'s impact on 3D vision-language alignment. By extending existing 3D datasets with carefully structured reasoning annotations, this benchmark enables rigorous exploration of multimodal reasoning capabilities, significantly enhancing interpretability and… See the full description on the dataset page: https://huggingface.co/datasets/Battam/3D-CoT.

  4. ViT-CoT with different architecture sizes.

    • plos.figshare.com
    xls
    Updated Dec 17, 2024
    Cite
    Lalit Pandey; Donsuk Lee; Samantha M. W. Wood; Justin N. Wood (2024). ViT-CoT with different architecture sizes. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012600.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lalit Pandey; Donsuk Lee; Samantha M. W. Wood; Justin N. Wood
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    How do newborns learn to see? We propose that visual systems are space-time fitters, meaning visual development can be understood as a blind fitting process (akin to evolution) in which visual systems gradually adapt to the spatiotemporal data distributions in the newborn’s environment. To test whether space-time fitting is a viable theory for learning how to see, we performed parallel controlled-rearing experiments on newborn chicks and deep neural networks (DNNs), including CNNs and transformers. First, we raised newborn chicks in impoverished environments containing a single object, then simulated those environments in a video game engine. Second, we recorded first-person images from agents moving through the virtual animal chambers and used those images to train DNNs. Third, we compared the viewpoint-invariant object recognition performance of the chicks and DNNs. When DNNs received the same visual diet (training data) as the chicks, the models developed the same object recognition skills as the chicks. DNNs that used time as a teaching signal—space-time fitters—also showed the same patterns of successes and failures across the test viewpoints as the chicks. Thus, DNNs can learn object recognition in the same impoverished environments as newborn animals. We argue that space-time fitters can serve as formal scientific models of newborn visual systems, providing image-computable models for studying how newborns learn to see from raw visual experiences.

  5. LISA_Plus_COT

    • huggingface.co
    Updated Dec 30, 2024
    Cite
    Senqiao Yang (2024). LISA_Plus_COT [Dataset]. https://huggingface.co/datasets/Senqiao/LISA_Plus_COT
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2024
    Authors
    Senqiao Yang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model

    🤗Data | 📄Paper | 🚀Code | 💻Model | 🔥Citation

      Dataset Details
    

    Dataset type: The LISA++ COT dataset is a QA dataset designed to train MLLM models for Visual COT and reasoning segmentation, enhancing the model's global understanding ability. It is based on the COCO2017 dataset. Where to send questions or comments about the dataset: https://github.com/dvlab-research/LISA. Paper:… See the full description on the dataset page: https://huggingface.co/datasets/Senqiao/LISA_Plus_COT.

  6. WeThink-Multimodal-Reasoning-120K

    • huggingface.co
    Updated May 28, 2025
    Cite
    WeThink (2025). WeThink-Multimodal-Reasoning-120K [Dataset]. https://huggingface.co/datasets/WeThink/WeThink-Multimodal-Reasoning-120K
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    WeThink
    Description

    WeThink-Multimodal-Reasoning-120K

      Image Type
    

    Image data can be accessed from https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k

    Image Type              Source Dataset    Images
    General Images          COCO              25,344
                            SAM-1B            18,091
                            Visual Genome      4,441
                            GQA                3,251
                            PISC                 835
                            LLaVA                134
    Text-Intensive Images   TextVQA           25,483
                            ShareTextVQA         538
                            DocVQA             4,709
                            OCR-VQA            5,142
                            ChartQA           21,781
    Scientific & Technical  GeoQA+             4,813
                            ScienceQA          4,990
                            AI2D               1,812
                            CLEVR-Math           677

    … See the full description on the dataset page: https://huggingface.co/datasets/WeThink/WeThink-Multimodal-Reasoning-120K.
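
    As a hedged illustration (not part of the dataset card), the image repository referenced above could presumably be fetched with the huggingface_hub library; the repo's internal folder layout is not documented in this excerpt:

    from huggingface_hub import snapshot_download

    # Download the image data referenced above (Xkev/LLaVA-CoT-100k) to a local folder.
    # The internal folder structure is an assumption; inspect local_dir after download.
    local_dir = snapshot_download(repo_id="Xkev/LLaVA-CoT-100k", repo_type="dataset")
    print("Image data downloaded to:", local_dir)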

  7. GameQA-140K

    • huggingface.co
    Updated May 29, 2025
    Cite
    Game-RL (2025). GameQA-140K [Dataset]. https://huggingface.co/datasets/Code2Logic/GameQA-140K
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Game-RL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    1. Overview

    GameQA is a large-scale, diverse, and challenging multimodal reasoning dataset designed to enhance the general reasoning capabilities of Vision Language Models (VLMs). Generated using the innovative Code2Logic framework, it leverages game code to synthesize high-quality visual-language Chain-of-Thought (CoT) data. The dataset addresses the scarcity of multimodal reasoning data, critical for advancing complex multi-step reasoning in VLMs. Each sample includes visual game… See the full description on the dataset page: https://huggingface.co/datasets/Code2Logic/GameQA-140K.

  8. ViC-Bench

    • huggingface.co
    Updated May 31, 2025
    Cite
    LongCat (2025). ViC-Bench [Dataset]. https://huggingface.co/datasets/meituan-longcat/ViC-Bench
    Explore at:
    Dataset updated
    May 31, 2025
    Dataset authored and provided by
    LongCat
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    ViC-Bench

      About ViC-Bench
    

    Visual-Interleaved Chain-of-Thought (VI-CoT) enables MLLMs to continually update their understanding and decisions based on step-wise intermediate visual states (IVS), much like a human would, and has shown impressive success across a variety of tasks, spurring advances in related benchmarks. Despite this promising progress, current benchmarks provide models with relatively fixed IVS rather than free-style IVS, which might forcibly… See the full description on the dataset page: https://huggingface.co/datasets/meituan-longcat/ViC-Bench.

  9. SSRBench

    • huggingface.co
    Updated Jul 7, 2025
    Cite
    Yang Liu (2025). SSRBench [Dataset]. https://huggingface.co/datasets/yliu-cs/SSRBench
    Explore at:
    Dataset updated
    Jul 7, 2025
    Authors
    Yang Liu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    SSR-CoT and SSRBench: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

    Paper: https://arxiv.org/abs/2505.12448 | Project Page: https://yliu-cs.github.io/SSR/ | Code: https://github.com/yliu-cs/SSR

      Abstract
    

    Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing methods for integrating spatial cues, such as point clouds or depth… See the full description on the dataset page: https://huggingface.co/datasets/yliu-cs/SSRBench.

  10. Athenea-VL

    • huggingface.co
    Updated Dec 2, 2025
    Cite
    Aquiles-ai (2025). Athenea-VL [Dataset]. https://huggingface.co/datasets/Aquiles-ai/Athenea-VL
    Explore at:
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    Aquiles-ai
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Athenea-VL: Advanced Multimodal Reasoning Dataset (Aquiles-ai/Athenea-VL)

      Dataset Description
    

    Athenea-VL is a comprehensive multimodal reasoning dataset designed for training vision-language models on complex scientific and analytical tasks. This dataset combines high-quality visual content with Chain-of-Thought (CoT) reasoning, making it ideal for developing models capable of step-by-step problem-solving across multiple domains.

      Key Features
    

    20,913… See the full description on the dataset page: https://huggingface.co/datasets/Aquiles-ai/Athenea-VL.

  11. UniVG-R1-data

    • huggingface.co
    Updated May 26, 2025
    Cite
    AMAP-ML (2025). UniVG-R1-data [Dataset]. https://huggingface.co/datasets/GD-ML/UniVG-R1-data
    Explore at:
    Dataset updated
    May 26, 2025
    Dataset authored and provided by
    AMAP-ML
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    UniVG-R1 Model Card

      Model details
    

    We propose UniVG-R1, a reasoning-guided MLLM for universal visual grounding, which leverages reinforcement learning to enhance reasoning across complex multi-image and multi-modal scenarios.

      Dataset details
    

    We provide three JSON files as follows:

    revised_MIG_bench.json: contains our revised version of the MIG_bench.
    stage1_cotsft.json: contains the CoT-SFT data required for stage 1.
    stage2_rl.json: which… See the full description on the dataset page: https://huggingface.co/datasets/GD-ML/UniVG-R1-data.
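
    As a rough sketch (not from the dataset card), one of these files could presumably be fetched and parsed with the huggingface_hub library; the in-repo path and record schema are assumptions:

    import json
    from huggingface_hub import hf_hub_download

    # Download one of the three JSON files listed above from the dataset repository.
    # The file is assumed to sit at the repo root; adjust the filename if it lives in a subfolder.
    path = hf_hub_download(
        repo_id="GD-ML/UniVG-R1-data",
        filename="stage1_cotsft.json",
        repo_type="dataset",
    )

    with open(path, "r", encoding="utf-8") as f:
        stage1_cot_sft = json.load(f)

    # The record schema is not documented in this excerpt; inspect the loaded object.
    print(len(stage1_cot_sft) if isinstance(stage1_cot_sft, list) else type(stage1_cot_sft))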

  12. ReasonGen-R1-RL-Geneval-12k

    • huggingface.co
    Updated Jun 5, 2025
    + more versions
    Cite
    Franklin Zhang (2025). ReasonGen-R1-RL-Geneval-12k [Dataset]. https://huggingface.co/datasets/Franklin0/ReasonGen-R1-RL-Geneval-12k
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Franklin Zhang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the RL dataset for the paper: "ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL". ReasonGen-R1 is a two-stage framework that imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales. It then refines its outputs using Group Relative Policy Optimization (GRPO). This dataset contains the model-crafted rationales paired with visual prompts… See the full description on the dataset page: https://huggingface.co/datasets/Franklin0/ReasonGen-R1-RL-Geneval-12k.

  13. Grammer_dataset

    • huggingface.co
    Updated Oct 9, 2025
    + more versions
    Cite
    Shawn (2025). Grammer_dataset [Dataset]. https://huggingface.co/datasets/csfufu/Grammer_dataset
    Explore at:
    Dataset updated
    Oct 9, 2025
    Authors
    Shawn
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🌟 ReVisual-R1 (7B) — Open-Source Multimodal Reasoner

    One cold-start, two RL stages, endless reasoning power.

      🔑 Highlights
    

    SOTA on 9 tough benchmarks covering visual–math + text reasoning.

    Three-Stage SRO Training

    Text Cold-Start — seed deep reflection
    Multimodal RL — align vision & logic
    Text RL — polish fluency & brevity

    PAD (Prioritized Advantage Distillation) keeps gradients alive.

    Efficient-Length Reward = concise, self-reflective CoT.

      📚… See the full description on the dataset page: https://huggingface.co/datasets/csfufu/Grammer_dataset.
    
