7 datasets found
  1. MMVP

    • huggingface.co
    Updated Jan 14, 2024
    Cite
    MultiModal Visual Patterns (2024). MMVP [Dataset]. https://huggingface.co/datasets/MMVP/MMVP
    Dataset authored and provided by
    MultiModal Visual Patterns
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MMVP Benchmark Datacard

      Basic Information
    

    Title: MMVP Benchmark

    Description: The MMVP (Multimodal Visual Patterns) Benchmark focuses on identifying “CLIP-blind pairs” – images that are perceived as similar by CLIP despite having clear visual differences. MMVP benchmarks the performance of state-of-the-art systems, including GPT-4V, across nine basic visual patterns. It highlights the challenges these systems face in answering straightforward questions, often leading to… See the full description on the dataset page: https://huggingface.co/datasets/MMVP/MMVP.
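    As a rough illustration of how such pairs can be mined, the sketch below embeds a candidate image pair with CLIP and with a vision-only encoder (DINOv2) and flags pairs that CLIP sees as near-identical while the vision-only encoder still separates them. The checkpoints and thresholds are illustrative assumptions, not the authors' exact settings.

    ```python
    # Hypothetical sketch of "CLIP-blind pair" mining; checkpoints and
    # thresholds are assumptions, not the benchmark's exact recipe.
    import torch
    from transformers import AutoImageProcessor, AutoModel, CLIPModel, CLIPProcessor

    clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
    dino = AutoModel.from_pretrained("facebook/dinov2-base")
    dino_proc = AutoImageProcessor.from_pretrained("facebook/dinov2-base")

    @torch.no_grad()
    def is_clip_blind(img_a, img_b, clip_thresh=0.95, vision_thresh=0.6):
        # Embed both images with CLIP's vision tower.
        ca = clip.get_image_features(**clip_proc(images=img_a, return_tensors="pt"))
        cb = clip.get_image_features(**clip_proc(images=img_b, return_tensors="pt"))
        # Embed both images with the vision-only encoder.
        da = dino(**dino_proc(images=img_a, return_tensors="pt")).pooler_output
        db = dino(**dino_proc(images=img_b, return_tensors="pt")).pooler_output
        cos = torch.nn.functional.cosine_similarity
        # CLIP-blind: high CLIP similarity, low vision-only similarity.
        return cos(ca, cb).item() > clip_thresh and cos(da, db).item() < vision_thresh
    ```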

  2. MMVP-VLM Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Feb 19, 2024
    Cite
    Shengbang Tong; Zhuang Liu; Yuexiang Zhai; Yi Ma; Yann LeCun; Saining Xie (2024). MMVP-VLM Dataset [Dataset]. https://paperswithcode.com/dataset/mmvp-vlm
    Authors
    Shengbang Tong; Zhuang Liu; Yuexiang Zhai; Yi Ma; Yann LeCun; Saining Xie
    Description

    The MMVP-VLM (Multimodal Visual Patterns - Visual Language Models) Benchmark is designed to systematically evaluate how well recent CLIP-based models understand and process visual patterns.

    Purpose: The MMVP-VLM Benchmark aims to assess how well CLIP models can match image-text combinations that represent distinct visual patterns. It distills a subset of questions from the original MMVP benchmark into simpler language descriptions, categorizing them into different visual patterns. Each visual pattern is represented by 15 text-image pairs.

    Dataset Composition:

    Text-Image Pairs: The benchmark includes a balanced number of questions for each visual pattern, with each pattern represented by 15 pairs. These pairs are a subset of the MMVP benchmark, supplemented with additional questions for balance.

    Visual Patterns: The questions cover various visual patterns, allowing evaluation of CLIP models' ability to understand and process these patterns.

    Insights and Limitations: By assessing whether CLIP models can accurately match the provided image-text combinations, the MMVP-VLM Benchmark provides insights into the capabilities and limitations of these models.
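    As a concrete reading of this matching protocol, the sketch below scores one benchmark item under the assumption that it supplies two images and two captions and that CLIP must assign each caption to its own image; the checkpoint and data layout are assumptions, not the official harness.

    ```python
    # Sketch of MMVP-VLM-style image-text matching with CLIP; the pairing
    # layout (2 images x 2 captions) is an assumption from the description.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    @torch.no_grad()
    def pair_matched(images, texts):
        """True if CLIP matches both captions to their own image."""
        inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image  # shape: (2 images, 2 texts)
        # Correct only when each image scores highest with its own caption.
        return bool((logits.argmax(dim=1) == torch.tensor([0, 1])).all())
    ```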

  3. MMVP Dataset

    • paperswithcode.com
    Updated Aug 26, 2023
    Cite
    Shengbang Tong; Zhuang Liu; Yuexiang Zhai; Yi Ma; Yann LeCun; Saining Xie (2023). MMVP Dataset [Dataset]. https://paperswithcode.com/dataset/mmvp
    Authors
    Shengbang Tong; Zhuang Liu; Yuexiang Zhai; Yi Ma; Yann LeCun; Saining Xie
    Description

    The MMVP (Multimodal Visual Patterns) Benchmark focuses on identifying "CLIP-blind pairs" – images that appear similar to the CLIP model despite having clear visual differences. These pairs highlight the challenges such systems face in answering straightforward questions, often leading to incorrect responses and hallucinated explanations.
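    A common way to score such a benchmark is at the pair level: a CLIP-blind pair counts as solved only when the questions for both images in the pair are answered correctly. The accounting below is a minimal sketch under that assumption; the data layout is hypothetical.

    ```python
    # Pair-level scoring sketch; a pair is solved only if *both* of its
    # questions are correct (data layout is a hypothetical assumption).
    def pair_accuracy(results):
        """results: (pair_id, is_correct) tuples, one per question."""
        pairs = {}
        for pair_id, ok in results:
            pairs[pair_id] = pairs.get(pair_id, True) and ok
        return sum(pairs.values()) / len(pairs)

    # Pair "A" misses one of its two questions, so only "B" counts.
    print(pair_accuracy([("A", True), ("A", False), ("B", True), ("B", True)]))  # 0.5
    ```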

  4. MMVP

    • huggingface.co
    Cite
    Mao Song, MMVP [Dataset]. https://huggingface.co/datasets/MaoSong2022/MMVP
    Authors
    Mao Song
    Description

    MMVP Benchmark

    Refactor of MMVP to support VLMEvalKit.

      Benchmark Information
    

    Number of questions: 300
    Question type: multiple choice question (MCQ)
    Question format: image + text

      Reference
    

    VLMEvalKit
    MMVP
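    A minimal loading sketch with the Hugging Face datasets library is shown below; the split name and column layout are assumptions to verify against the dataset card.

    ```python
    # Loading sketch; split and column names are assumptions -- check
    # https://huggingface.co/datasets/MaoSong2022/MMVP before relying on them.
    from datasets import load_dataset

    ds = load_dataset("MaoSong2022/MMVP", split="train")
    print(len(ds))       # expected: 300 multiple-choice questions
    print(ds[0].keys())  # inspect the actual column names first
    ```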

  5. RSMMVP

    • huggingface.co
    Updated Feb 27, 2025
    Cite
    Image and Video Understanding Lab (2025). RSMMVP [Dataset]. https://huggingface.co/datasets/IVUlab/RSMMVP
    Dataset authored and provided by
    Image and Video Understanding Lab
    Description

    Remote Sensing MMVP

    This dataset follows a procedure similar to the original MMVP benchmark on natural images, but is directed towards the remote sensing domain. Challenging visual patterns are identified based on CLIP-blind pairs, accompanied by the corresponding questions, options, and ground-truth answers.

      Authors:
    

    Abduljaleel Adejumo*, Faegheh Yeganli*, Clifford Broni-Bediako, Aoran Xiao, Naoto Yokoya**, Mennatullah Siam**

    * denotes equal contribution. **… See the full description on the dataset page: https://huggingface.co/datasets/IVUlab/RSMMVP.

  6. CV-Bench Dataset

    • library.toponeai.link
    • paperswithcode.com
    Updated Jun 25, 2024
    Cite
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann LeCun; Saining Xie (2024). CV-Bench Dataset [Dataset]. https://library.toponeai.link/dataset/cv-bench
    Authors
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann LeCun; Saining Xie
    Description

    The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.

    Motivation and Content Summary:

    CV-Bench repurposes standard vision benchmarks such as ADE20K, COCO, and Omni3D to assess models on classic vision tasks within a multimodal context. Leveraging the rich ground truth annotations from these benchmarks, natural language questions are formulated to probe the fundamental 2D and 3D understanding of models.

    Potential Use Cases:

    Evaluating the spatial relationship and object counting capabilities of models (2D understanding).
    Assessing the depth order and relative distance understanding of models (3D understanding).
    Benchmarking the performance of multimodal models in both vision-specific and cross-modal tasks.

    Dataset Characteristics:

    2D Understanding Tasks:

    Spatial Relationship: Determine the relative position of an object with respect to the anchor object, considering left-right or top-bottom relationships.

    Object Count: Determine the number of instances present in the image.

    3D Understanding Tasks:

    Depth Order: Determine which of the two distinct objects is closer to the camera.

    Relative Distance: Determine which of the two distinct objects is closer to the anchor object.

    Type | Task                 | Description                                                                  | Sources      | # Samples
    2D   | Spatial Relationship | Determine the relative position of an object w.r.t. the anchor object.      | ADE20K, COCO | 650
    2D   | Object Count         | Determine the number of instances present in the image.                     | ADE20K, COCO | 788
    3D   | Depth Order          | Determine which of the two distinct objects is closer to the camera.        | Omni3D       | 600
    3D   | Relative Distance    | Determine which of the two distinct objects is closer to the anchor object. | Omni3D       | 600

    Curation Process:

    Questions for each task are programmatically constructed and then manually inspected to ensure clarity and accuracy. Any unclear, ambiguous, or erroneous questions are removed to maintain the benchmark's reliability.
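    The sketch below illustrates what such programmatic construction could look like for the spatial-relationship task, starting from detection-style annotations; the annotation schema and question wording are illustrative assumptions, not the authors' pipeline.

    ```python
    # Illustrative question construction from detection-style annotations;
    # the schema and wording are assumptions, not CV-Bench's actual pipeline.
    def spatial_relationship_question(anchor, other):
        """anchor/other: dicts with 'name' and 'bbox' = (x, y, w, h)."""
        answer = "left" if other["bbox"][0] < anchor["bbox"][0] else "right"
        question = (f"Is the {other['name']} to the left or to the right "
                    f"of the {anchor['name']}?")
        return question, answer

    q, a = spatial_relationship_question(
        {"name": "chair", "bbox": (200, 120, 80, 150)},
        {"name": "lamp", "bbox": (40, 100, 30, 90)},
    )
    print(q, "->", a)  # Is the lamp to the left or to the right of the chair? -> left
    ```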

  7. VStar_Bench

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Shukang Yin (2025). VStar_Bench [Dataset]. https://huggingface.co/datasets/xjtupanda/VStar_Bench
    Authors
    Shukang Yin
    Description

    V* Benchmark

    Refactor of the V* Benchmark to add support for VLMEvalKit.

      Benchmark Information
    

    Number of questions: 191
    Question type: MCQ (multiple choice question)
    Question format: image + text

      Reference
    

    VLMEvalKit
    V* Benchmark
    MMVP
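    For benchmarks in this image + text MCQ format, a generic scoring loop might look like the sketch below; the answer-letter parsing is a simple heuristic, not VLMEvalKit's exact matching logic.

    ```python
    # Generic MCQ scoring sketch; the regex-based answer extraction is a
    # heuristic assumption, not VLMEvalKit's actual matching logic.
    import re

    def extract_choice(response):
        """Pull the first standalone option letter (A-D) from a reply."""
        match = re.search(r"\b([A-D])\b", response.strip().upper())
        return match.group(1) if match else None

    def mcq_accuracy(predictions, answers):
        """predictions: raw model replies; answers: gold letters like 'B'."""
        hits = sum(extract_choice(p) == a for p, a in zip(predictions, answers))
        return hits / len(answers)

    print(mcq_accuracy(["The answer is B.", "(C)"], ["B", "D"]))  # 0.5
    ```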

