31 datasets found
  1. RefCOCO

    • huggingface.co
    Updated Jun 17, 2024
    + more versions
    Cite
    LMMs-Lab (2024). RefCOCO [Dataset]. https://huggingface.co/datasets/lmms-lab/RefCOCO
    Explore at: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Jun 17, 2024
    Dataset authored and provided by
    LMMs-Lab
    Description

    Large-scale Multi-modality Models Evaluation Suite

    Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval

    🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets

      This Dataset
    

    This is a formatted version of RefCOCO. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @inproceedings{kazemzadeh-etal-2014-referitgame, title = "{R}efer{I}t{G}ame: Referring to Objects in… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/RefCOCO.
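
    For a quick look at the formatted data outside the lmms-eval pipeline, it can also be pulled directly with the Hugging Face datasets library. A minimal sketch, assuming the default configuration loads without an explicit config name (check the dataset page for the exact splits it exposes):

    from datasets import load_dataset

    # Load the lmms-lab formatted RefCOCO from the Hugging Face Hub and list
    # its splits; split/config names are whatever the dataset page defines.
    ds = load_dataset("lmms-lab/RefCOCO")
    print(ds)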

  2. ref_coco

    • tensorflow.org
    • opendatalab.com
    Updated May 31, 2024
    Cite
    (2024). ref_coco [Dataset]. https://www.tensorflow.org/datasets/catalog/ref_coco
    Dataset updated
    May 31, 2024
    Description

    A collection of 3 referring expression datasets based on images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets are collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.

    RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, which they enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016, and has richer descriptions of objects than RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.

    Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation split, but the objects being referred to in the image will be different between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test split. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".

    Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):

    dataset     partition   split   refs    images
    refcoco     google      train   40000   19213
    refcoco     google      val     5000    4559
    refcoco     google      test    5000    4527
    refcoco     unc         train   42404   16994
    refcoco     unc         val     3811    1500
    refcoco     unc         testA   1975    750
    refcoco     unc         testB   1810    750
    refcoco+    unc         train   42278   16992
    refcoco+    unc         val     3805    1500
    refcoco+    unc         testA   1975    750
    refcoco+    unc         testB   1798    750
    refcocog    google      train   44822   24698
    refcocog    google      val     5000    4650
    refcocog    umd         train   42226   21899
    refcocog    umd         val     2573    1300
    refcocog    umd         test    5023    2600

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the default RefCoco configuration and print a few examples.
    ds = tfds.load('ref_coco', split='train')
    for ex in ds.take(4):
      print(ex)

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png
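
    Each dataset/partition pair in the table above corresponds to a named TFDS configuration, so a specific partition and split can be requested directly. A minimal sketch, assuming the configuration name follows the catalog's naming (it also appears in the visualization file name above, ref_coco-refcoco_unc-1.1.0); verify it against tfds.list_builders() or the catalog page:

    import tensorflow_datasets as tfds

    # Request only the people-only "testA" split of the UNC partition of RefCoco.
    # The config name 'ref_coco/refcoco_unc' is assumed from the catalog naming.
    test_a = tfds.load('ref_coco/refcoco_unc', split='testA')
    for ex in test_a.take(1):
      print(list(ex.keys()))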

  3. refcoco-m

    • huggingface.co
    Updated Nov 18, 2025
    Cite
    moondream (2025). refcoco-m [Dataset]. https://huggingface.co/datasets/moondream/refcoco-m
    Dataset updated
    Nov 18, 2025
    Dataset authored and provided by
    moondream
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    RefCOCO-M: Refined Referring Expression Segmentation

    RefCOCO has long been a standard benchmark for referring expression segmentation, but it has two major issues: poor mask quality and harmful referring expressions. Modern models now produce masks that are more accurate than the ground-truth annotations, which makes RefCOCO an imprecise measure of segmentation quality. RefCOCO-M is a cleaned version of the RefCOCO (UNC) validation split. We replace the original instance masks with… See the full description on the dataset page: https://huggingface.co/datasets/moondream/refcoco-m.

  4. RefCOCO - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 2, 2024
    Cite
    (2024). RefCOCO - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/refcoco
    Dataset updated
    Dec 2, 2024
    Description

    The dataset used in the paper is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.

  5. refcoco

    • huggingface.co
    Updated Aug 2, 2024
    + more versions
    Cite
    Linkangheng (2024). refcoco [Dataset]. https://huggingface.co/datasets/Kangheng/refcoco
    Explore at: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Aug 2, 2024
    Authors
    Linkangheng
    Description

    The Kangheng/refcoco dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  6. RefCOCO-M

    • huggingface.co
    Updated Nov 18, 2025
    Cite
    Voxel51 (2025). RefCOCO-M [Dataset]. https://huggingface.co/datasets/Voxel51/RefCOCO-M
    Dataset updated
    Nov 18, 2025
    Dataset authored and provided by
    Voxel51
    Description

    Dataset Card for RefCOCO-M

    This is a FiftyOne dataset with 1190 samples.

      Installation
    

    If you haven't already, install FiftyOne: pip install -U fiftyone

      Usage
    

    import fiftyone as fo
    from fiftyone.utils.huggingface import load_from_hub

    # Load the dataset
    # Note: other available arguments include 'max_samples', etc.
    dataset = load_from_hub("Voxel51/RefCOCO-M")

    # Launch the App
    session = fo.launch_app(dataset)

      Dataset Details

    Dataset… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/RefCOCO-M.
    
  7. RefCOCO dataset - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 2, 2024
    Cite
    (2024). RefCOCO dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/refcoco-dataset
    Dataset updated
    Dec 2, 2024
    Description

    The authors used the RefCOCO dataset, a large-scale dataset for object detection and scene understanding, to train and evaluate their models.

  8. RefCOCO

    • huggingface.co
    Updated Oct 9, 2025
    + more versions
    Cite
    PaDT (2025). RefCOCO [Dataset]. https://huggingface.co/datasets/PaDT-MLLM/RefCOCO
    Dataset updated
    Oct 9, 2025
    Authors
    PaDT
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs

    [🔗 Released Code] [🤗 Datasets] [🤗 Checkpoints] [📄 Tech Report] [🤗 Paper]

    Figure A. PaDT pipeline.

      🌟 Introduction
    

    We are pleased to introduce Patch-as-Decodable Token (PaDT), a unified paradigm that enables multimodal large language models (MLLMs) to directly generate both textual and visual outputs. At the core of PaDT are Visual Reference Tokens (VRTs). Unlike conventional MLLMs that represent… See the full description on the dataset page: https://huggingface.co/datasets/PaDT-MLLM/RefCOCO.

  9. RefCOCO, RefCOCO+, and RefCOCOg

    • resodate.org
    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    Yucheng Suo; Linchao Zhu; Yi Yang (2024). RefCOCO, RefCOCO+, and RefCOCOg [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvcmVmY29jby0tcmVmY29jby0tLWFuZC1yZWZjb2NvZw==
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Yucheng Suo; Linchao Zhu; Yi Yang
    Description

    Visual Grounding is a task that aims to locate a target object according to a natural language expression. The dataset used in this paper is RefCOCO, RefCOCO+, and RefCOCOg.

  10. refcoco

    • huggingface.co
    Updated Aug 28, 2024
    Cite
    VDebugger (2024). refcoco [Dataset]. https://huggingface.co/datasets/VDebugger/refcoco
    Explore at: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2024
    Dataset authored and provided by
    VDebugger
    Description

    The VDebugger/refcoco dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  11. RefCOCO, RefCOCO+, Flickr30k

    • resodate.org
    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    Ayush Jain; Nikolaos Gkanatsios; Ishita Mediratta; Katerina Fragkiadaki (2024). RefCOCO, RefCOCO+, Flickr30k [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvcmVmY29jby0tcmVmY29jby0tLWZsaWNrcjMwaw==
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Ayush Jain; Nikolaos Gkanatsios; Ishita Mediratta; Katerina Fragkiadaki
    Description

    RefCOCO, RefCOCO+, Flickr30k

  12. refcoco-m-metadata

    • huggingface.co
    Updated Nov 20, 2025
    Cite
    Quentin Lhoest (2025). refcoco-m-metadata [Dataset]. https://huggingface.co/datasets/lhoestq/refcoco-m-metadata
    Dataset updated
    Nov 20, 2025
    Authors
    Quentin Lhoest
    Description

    The lhoestq/refcoco-m-metadata dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  13. refcoco

    • huggingface.co
    Updated Oct 3, 2025
    Cite
    Voyage_Wang (2025). refcoco [Dataset]. https://huggingface.co/datasets/VoyageWang/refcoco
    Dataset updated
    Oct 3, 2025
    Authors
    Voyage_Wang
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The VoyageWang/refcoco dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  14. Experimental results of TBKIN on RefCOCOg.

    • figshare.com
    xls
    Updated Jun 10, 2025
    + more versions
    Cite
    Zihan Guo; Xiang Shen; Chongqing Chen (2025). Experimental results of TBKIN on RefCOCOg. [Dataset]. http://doi.org/10.1371/journal.pone.0325543.t005
    Available download formats: xls
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zihan Guo; Xiang Shen; Chongqing Chen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vision-language models aim to seamlessly integrate visual and linguistic information for multi-modal tasks, demanding precise semantic alignments between image-text pairs while minimizing the influence of irrelevant data. While existing methods leverage intra-modal and cross-modal knowledge to enhance alignments, they often fall short in sufficiently reducing interference, which ultimately constrains model performance. To address this gap, we propose a novel vision-language model, the threshold-based knowledge integration network (TBKIN), designed to effectively capture intra-modal and cross-modal knowledge while systematically mitigating the impact of extraneous information. TBKIN employs unified scene graph structures and advanced masking strategies to strengthen semantic alignments and introduces a fine-tuning strategy based on threshold selection to eliminate noise. Comprehensive experimental evaluations demonstrate the efficacy of TBKIN, with our best model achieving state-of-the-art accuracy of 73.90% on the VQA 2.0 dataset and 84.60% on the RefCOCO dataset. Attention visualization and detailed result analysis further validate the robustness of TBKIN in tackling vision-language tasks. The model’s ability to reduce interference while enhancing semantic alignments underscores its potential for advancing multi-modal learning. Extensive experiments across four widely-used benchmark datasets confirm its superior performance on two typical vision-language tasks, offering a practical and effective solution for real-world applications.

  15. RefCOCO-Triplets

    • huggingface.co
    Updated Aug 12, 2024
    Cite
    Zeyu Han (2024). RefCOCO-Triplets [Dataset]. https://huggingface.co/datasets/CresCat01/RefCOCO-Triplets
    Explore at: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    Aug 12, 2024
    Authors
    Zeyu Han
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for RefCOCO Triplets

    This dataset contains annotations derived from using ChatGPT to decompose the referring expressions (captions) of the RefCOCO/+/g dataset into triples (subject, predicate, object).
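
    As a purely illustrative sketch of the (subject, predicate, object) structure, the snippet below shows what a decomposed expression could look like; the expression and field names are hypothetical, not taken from the dataset files:

    # Hypothetical example of a decomposed referring expression; the text and
    # field names here are illustrative only, not drawn from the actual dataset.
    example = {
        "referring_expression": "the man in a red shirt to the left of the dog",
        "triples": [
            ("man", "wearing", "red shirt"),
            ("man", "to the left of", "dog"),
        ],
    }
    print(example["triples"])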

      Dataset Details

      Dataset Description

    Curated by: Zeyu Han. Language(s) (NLP): English. License: cc-by-4.0

      Dataset Sources

    Repository: https://github.com/Show-han/Zeroshot_REC
    Paper: Zero-shot Referring Expression Comprehension via… See the full description on the dataset page: https://huggingface.co/datasets/CresCat01/RefCOCO-Triplets.

  16. Experimental results of TBKIN on RefCOCO+.

    • datasetcatalog.nlm.nih.gov
    Updated Jun 10, 2025
    Cite
    The citation is currently not available for this dataset.
    Dataset updated
    Jun 10, 2025
    Authors
    Chen, Chongqing; Shen, Xiang; Guo, Zihan
    Description

    Vision-language models aim to seamlessly integrate visual and linguistic information for multi-modal tasks, demanding precise semantic alignments between image-text pairs while minimizing the influence of irrelevant data. While existing methods leverage intra-modal and cross-modal knowledge to enhance alignments, they often fall short in sufficiently reducing interference, which ultimately constrains model performance. To address this gap, we propose a novel vision-language model, the threshold-based knowledge integration network (TBKIN), designed to effectively capture intra-modal and cross-modal knowledge while systematically mitigating the impact of extraneous information. TBKIN employs unified scene graph structures and advanced masking strategies to strengthen semantic alignments and introduces a fine-tuning strategy based on threshold selection to eliminate noise. Comprehensive experimental evaluations demonstrate the efficacy of TBKIN, with our best model achieving state-of-the-art accuracy of 73.90% on the VQA 2.0 dataset and 84.60% on the RefCOCO dataset. Attention visualization and detailed result analysis further validate the robustness of TBKIN in tackling vision-language tasks. The model’s ability to reduce interference while enhancing semantic alignments underscores its potential for advancing multi-modal learning. Extensive experiments across four widely-used benchmark datasets confirm its superior performance on two typical vision-language tasks, offering a practical and effective solution for real-world applications.

  17. RAS-refcoco

    • huggingface.co
    Updated Oct 21, 2025
    Cite
    Shengcao Cao (2025). RAS-refcoco [Dataset]. https://huggingface.co/datasets/Shengcao1006/RAS-refcoco
    Dataset updated
    Oct 21, 2025
    Authors
    Shengcao Cao
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The Shengcao1006/RAS-refcoco dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  18. Comparisons with state-of-the-art models on VQA and REC.

    • plos.figshare.com
    xls
    Updated Jun 10, 2025
    Cite
    Zihan Guo; Xiang Shen; Chongqing Chen (2025). Comparisons with state-of-the-art models on VQA and REC. [Dataset]. http://doi.org/10.1371/journal.pone.0325543.t001
    Available download formats: xls
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Zihan Guo; Xiang Shen; Chongqing Chen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparisons with state-of-the-art models on VQA and REC.

  19. refcoco_plus

    • kaggle.com
    zip
    Updated Mar 6, 2024
    Cite
    kenji0901 (2024). refcoco_plus [Dataset]. https://www.kaggle.com/kenji0901/refcoco-plus
    Available download formats: zip (8915774737 bytes)
    Dataset updated
    Mar 6, 2024
    Authors
    kenji0901
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by kenji0901

    Released under MIT

    Contents

  20. refcoco-coco2017

    • huggingface.co
    + more versions
    Cite
    junhyungkwak, refcoco-coco2017 [Dataset]. https://huggingface.co/datasets/jhkwak-bp/refcoco-coco2017
    Authors
    junhyungkwak
    Description

    refcoco with COCO 2017 Image Paths

    This dataset is a version of the original refcoco dataset that uses COCO 2017 image paths instead of COCO 2014.

      Changes from Original
    

    Image paths updated from COCO 2014 format to COCO 2017 format
    Images loaded from COCO 2017 directory structure
    All other annotations remain unchanged
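
    For context, this follows the standard difference in COCO file naming between the 2014 and 2017 releases. A small illustrative sketch of the generic convention (not of fields inspected from this dataset):

    # Illustrative only: generic COCO image file naming.
    # 2014 file names embed the split; 2017 file names are just the zero-padded image id.
    coco2014_name = "COCO_train2014_000000391895.jpg"
    coco2017_name = "000000391895.jpg"

    # Dropping the 2014 prefix recovers the 2017-style file name.
    assert coco2014_name.split("_")[-1] == coco2017_name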

      Usage
    

    from datasets import load_dataset

    dataset = load_dataset("jhkwak-bp/refcoco-coco2017")

      Citation
    

    Please cite the original… See the full description on the dataset page: https://huggingface.co/datasets/jhkwak-bp/refcoco-coco2017.
