6 datasets found
  1. realworldqa

    • huggingface.co
    Updated Apr 13, 2024
    Cite
    Nirajan Dhakal (2024). realworldqa [Dataset]. https://huggingface.co/datasets/nirajandhakal/realworldqa
    92 scholarly articles cite this dataset (view in Google Scholar)
    Authors
    Nirajan Dhakal
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0), https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    Real World QA Dataset

    This is a benchmark dataset released by xAI under the CC BY-ND 4.0 license alongside the Grok-1.5 Vision announcement. The benchmark is designed to evaluate the basic real-world spatial understanding capabilities of multimodal models. While many of the examples in the current benchmark are relatively easy for humans, they often pose a challenge for frontier models. This release of RealWorldQA consists of 765 images, each with a question and an easily verifiable answer for… See the full description on the dataset page: https://huggingface.co/datasets/nirajandhakal/realworldqa.
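
    For orientation only: a minimal loading sketch using the Hugging Face datasets library, assuming this mirror follows the usual layout (the "test" split and column names are guesses, not confirmed by the dataset page):

        # Sketch: load the RealWorldQA mirror via the `datasets` library.
        # The split name and column names are assumptions; check the dataset
        # page before relying on them.
        from datasets import load_dataset

        ds = load_dataset("nirajandhakal/realworldqa", split="test")
        print(len(ds))        # expected to be 765 per the description above
        print(ds[0].keys())   # e.g. image / question / answer columns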

  2. RealWorldQA

    • huggingface.co
    Updated Oct 4, 2024
    Cite
    LMMs-Lab (2024). RealWorldQA [Dataset]. https://huggingface.co/datasets/lmms-lab/RealWorldQA
    Dataset authored and provided by
    LMMs-Lab
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    lmms-lab/RealWorldQA dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. realworldqa-subquestions

    • huggingface.co
    Cite
    Pratham, realworldqa-subquestions [Dataset]. https://huggingface.co/datasets/yobro4619/realworldqa-subquestions
    Authors
    Pratham
    Description

    yobro4619/realworldqa-subquestions dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. realworldqa-incorrect-samples

    • huggingface.co
    Updated Apr 13, 2024
    Cite
    Pratham (2024). realworldqa-incorrect-samples [Dataset]. https://huggingface.co/datasets/yobro4619/realworldqa-incorrect-samples
    Authors
    Pratham
    Description

    yobro4619/realworldqa-incorrect-samples dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. CV-Bench Dataset

    • paperswithcode.com
    Updated Jun 23, 2024
    Cite
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann LeCun; Saining Xie (2024). CV-Bench Dataset [Dataset]. https://paperswithcode.com/dataset/cv-bench
    Authors
    Shengbang Tong; Ellis Brown; Penghao Wu; Sanghyun Woo; Manoj Middepogu; Sai Charitha Akula; Jihan Yang; Shusheng Yang; Adithya Iyer; Xichen Pan; Ziteng Wang; Rob Fergus; Yann LeCun; Saining Xie
    Description

    The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.

    Motivation and Content Summary:

    CV-Bench repurposes standard vision benchmarks such as ADE20K, COCO, and Omni3D to assess models on classic vision tasks within a multimodal context. Leveraging the rich ground truth annotations from these benchmarks, natural language questions are formulated to probe the fundamental 2D and 3D understanding of models.

    Potential Use Cases:

    • Evaluating the spatial relationship and object counting capabilities of models (2D understanding).
    • Assessing the depth order and relative distance understanding of models (3D understanding).
    • Benchmarking the performance of multimodal models in both vision-specific and cross-modal tasks.

    Dataset Characteristics:

    2D Understanding Tasks:

    • Spatial Relationship: Determine the relative position of an object with respect to the anchor object, considering left-right or top-bottom relationships.
    • Object Count: Determine the number of instances present in the image.

    3D Understanding Tasks:

    • Depth Order: Determine which of the two distinct objects is closer to the camera.
    • Relative Distance: Determine which of the two distinct objects is closer to the anchor object.

    Type | Task                 | Description                                                                  | Sources      | # Samples
    2D   | Spatial Relationship | Determine the relative position of an object w.r.t. the anchor object.      | ADE20K, COCO | 650
    2D   | Object Count         | Determine the number of instances present in the image.                     | ADE20K, COCO | 788
    3D   | Depth Order          | Determine which of the two distinct objects is closer to the camera.        | Omni3D       | 600
    3D   | Relative Distance    | Determine which of the two distinct objects is closer to the anchor object. | Omni3D       | 600

    Curation Process:

    Questions for each task are programmatically constructed and then manually inspected to ensure clarity and accuracy. Any unclear, ambiguous, or erroneous questions are removed to maintain the benchmark's reliability.
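
    As a toy illustration of that construction step (not CV-Bench's actual code; the function name, question wording, and box format are assumptions), a 2D spatial-relationship question could be generated from COCO-style annotations like this:

        # Sketch: build a CV-Bench-style 2D spatial-relationship question from
        # two COCO-style boxes (x, y, width, height). Purely illustrative.
        def spatial_relationship_question(obj, anchor):
            obj_cx = obj["bbox"][0] + obj["bbox"][2] / 2          # horizontal center
            anchor_cx = anchor["bbox"][0] + anchor["bbox"][2] / 2
            answer = "left" if obj_cx < anchor_cx else "right"
            question = (f"Is the {obj['name']} to the left or to the right "
                        f"of the {anchor['name']}?")
            return {"question": question, "choices": ["left", "right"], "answer": answer}

        q = spatial_relationship_question(
            {"name": "dog", "bbox": [40, 120, 80, 60]},
            {"name": "bench", "bbox": [220, 100, 150, 90]},
        )
        print(q["question"], "->", q["answer"])  # ... -> left

    A real pipeline would then drop ambiguous cases (e.g. near-equal centers), consistent with the manual inspection described above.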

  6. object-detection-bench

    • huggingface.co
    Updated May 28, 2025
    Cite
    JigsawStack (2025). object-detection-bench [Dataset]. https://huggingface.co/datasets/JigsawStack/object-detection-bench
    Dataset authored and provided by
    JigsawStack
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object Detection Bench

    This dataset is a customized version of the RealWorldQA dataset, specifically tailored for object detection and segmentation benchmarking tasks.

    Dataset Description

    This benchmark dataset contains real-world images with questions, answers, and custom prompts designed for evaluating object detection and segmentation models. Each sample includes:

    • Image: Real-world photographs
    • Question: Original question about the image content
    • Answer: Ground truth… See the full description on the dataset page: https://huggingface.co/datasets/JigsawStack/object-detection-bench.
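
    A minimal loading sketch, assuming the standard Hugging Face datasets API; the split name and exact column names below are guesses, not confirmed by the dataset page:

        # Sketch: iterate over a few object-detection-bench samples. Split and
        # field names are assumptions based on the description above.
        from datasets import load_dataset

        ds = load_dataset("JigsawStack/object-detection-bench", split="train")
        for sample in ds.select(range(3)):
            print(sample["question"], "->", sample["answer"])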
