77 datasets found
  1. COCO-Text V2.0: A Large-Scale Scene Text Dataset, Based on MSCOCO

    • kaggle.com
    Updated Aug 25, 2022
    Cite
    Wei-Ming Lin (2022). COCO-Text V2.0 [Dataset]. https://www.kaggle.com/datasets/c7934597/cocotext-v20
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wei-Ming Lin
    Description

    Source from COCO-TEXT V2.0: https://bgshih.github.io/cocotext/

    The bounding boxes in the dataset's .txt files are in YOLO format; the annotations have already been converted from COCO format to YOLO format, so any YOLO-style framework can be trained on them for text detection.

    If you also need to recognize the text content, there are two common approaches: 1. after detecting the text, run a text-recognition model on the detected regions; 2. use a one-step OCR method that detects and recognizes text jointly.
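    The COCO-to-YOLO conversion mentioned above can be sketched as follows (a minimal illustration, not the author's preprocessing script): COCO boxes are [x_min, y_min, width, height] in absolute pixels, while YOLO boxes are [x_center, y_center, width, height] normalized by the image size.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO bbox [x_min, y_min, w, h] (absolute pixels) to a
    YOLO bbox [x_center, y_center, w, h] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / img_w,  # normalized box-center x
        (y + h / 2) / img_h,  # normalized box-center y
        w / img_w,            # normalized width
        h / img_h,            # normalized height
    ]
```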

  2. COCO-Text

    • huggingface.co
    Updated Jan 18, 2010
    Cite
    howard-hou (2010). COCO-Text [Dataset]. https://huggingface.co/datasets/howard-hou/COCO-Text
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2010
    Authors
    howard-hou
    Description

    Dataset Card for "COCO-Text"

    More Information needed

  3. coco-text

    • kaggle.com
    zip
    Updated Jun 23, 2025
    + more versions
    Cite
    Glen Louis (2025). coco-text [Dataset]. https://www.kaggle.com/datasets/glenlouis/coco-text/code
    Explore at:
    Available download formats: zip (12296651 bytes)
    Dataset updated
    Jun 23, 2025
    Authors
    Glen Louis
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Glen Louis

    Released under Apache 2.0


  4. coco dataset

    • kaggle.com
    zip
    Updated Jul 5, 2025
    Cite
    ProgramerSalar (2025). coco dataset [Dataset]. https://www.kaggle.com/datasets/salargamer/coco-dataset
    Explore at:
    Available download formats: zip (20043918455 bytes)
    Dataset updated
    Jul 5, 2025
    Authors
    ProgramerSalar
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The COCO dataset is a foundational large-scale benchmark for object detection, segmentation, captioning, and keypoint analysis. Created by Microsoft, it features complex everyday scenes with common objects in their natural contexts. With over 330,000 images and 2.5 million labeled instances, it has become the gold standard for training and evaluating computer vision models.

    File Information

    images/
    Contains 2 subdirectories split by usage:
    train2017/: Main training set (118K images)
    val2017/: Validation set (5K images)
    File Naming: 000000000009.jpg (12-digit zero-padded IDs)
    Formats: JPEG images with varying resolutions (average 640×480)
    
    annotations/
    Contains task-specific JSON files with consistent naming:
    captions_*.json: 5 human-generated descriptions per image
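    The captions_*.json files follow the standard COCO annotation layout (top-level "images" and "annotations" lists), so they can be read with nothing but the standard library. A minimal sketch, assuming that standard layout:

```python
import json

# Read COCO-style caption annotations: each entry in "annotations" carries
# an "image_id" and a "caption" field (standard COCO captions layout).
def load_captions(ann_path):
    with open(ann_path) as f:
        coco = json.load(f)
    captions = {}  # image_id -> list of caption strings
    for ann in coco["annotations"]:
        captions.setdefault(ann["image_id"], []).append(ann["caption"])
    return captions
```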
    
  5. Text Dataset

    • universe.roboflow.com
    zip
    Updated Apr 12, 2025
    Cite
    Testing (2025). Text Dataset [Dataset]. https://universe.roboflow.com/testing-1jsew/text-4crgg
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 12, 2025
    Dataset authored and provided by
    Testing
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Text DWEI Bounding Boxes
    Description

    Based on COCO-Text V2.0 https://bgshih.github.io/cocotext/

  6. COCO 2017

    • opendatalab.com
    • huggingface.co
    zip
    Updated Sep 30, 2017
    + more versions
    Cite
    Microsoft (2017). COCO 2017 [Dataset]. https://opendatalab.com/OpenDataLab/COCO_2017
    Explore at:
    Available download formats: zip (49105147630 bytes)
    Dataset updated
    Sep 30, 2017
    Dataset provided by
    Microsoft
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

    • Object segmentation
    • Recognition in context
    • Superpixel stuff segmentation
    • 330K images (>200K labeled)
    • 1.5 million object instances
    • 80 object categories
    • 91 stuff categories
    • 5 captions per image
    • 250,000 people with keypoints

  7. COCO image-text pair

    • kaggle.com
    zip
    Updated Jul 26, 2023
    Cite
    SHARATH KRISHNA A H 231 (2023). COCO image-text pair [Dataset]. https://www.kaggle.com/datasets/sharathkrishnaah231/coco-image-text-pair
    Explore at:
    Available download formats: zip (12734119 bytes)
    Dataset updated
    Jul 26, 2023
    Authors
    SHARATH KRISHNA A H 231
    Description

    Dataset

    This dataset was created by SHARATH KRISHNA A H 231


  8. Manga Dataset - Text Detection

    • kaggle.com
    zip
    Updated Sep 16, 2024
    Cite
    Naufal Ahnaf (2024). Manga Dataset - Text Detection [Dataset]. https://www.kaggle.com/datasets/naufalahnaf17/manga-text-detection
    Explore at:
    Available download formats: zip (30722528 bytes)
    Dataset updated
    Sep 16, 2024
    Authors
    Naufal Ahnaf
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Manga Text Object Detection Dataset is a meticulously curated collection of manga pages designed to support and advance research in text detection and recognition within comic book and graphic novel contexts. This dataset leverages the COCO (Common Objects in Context) format, making it readily compatible with a range of object detection frameworks and tools.

    The dataset consists of three folders:

    • trains
    • valid
    • test

    Each folder contains a collection of images and one file called _annotations.coco.json.

    You can use torchvision's CocoDetection class in PyTorch to load and preprocess the data.

  9. COCO, LVIS, Open Images V4 classes mapping

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Oct 13, 2022
    Cite
    Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo (2022). COCO, LVIS, Open Images V4 classes mapping [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7194299
    Explore at:
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    ISTI-CNR
    Authors
    Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.

    COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.

    We built a mapping of these classes using a semi-automatic procedure in order to obtain a unique final list of 1460 classes. We also generated a hierarchy for each class using WordNet.

    This repository contains the following files:

    coco_classes_map.txt, contains the mapping for the 80 COCO classes

    lvis_classes_map.txt, contains the mapping for the 1460 LVIS classes

    openimages_classes_map.txt, contains the mapping for the 601 Open Images V4 classes

    classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy

    all-classnames.xlsx, contains a side-by-side view of all classes considered

    This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022], a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For object detection, VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021], Mask R-CNN [He et al. 2017], and a Faster R-CNN + Inception ResNet (trained on Open Images V4).

    This repository is released under a Creative Commons Attribution license; please cite the following paper if you use it in your work in any form:

    @article{amato2021visione,
      title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval},
      author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio},
      journal={Journal of Imaging},
      volume={7},
      number={5},
      pages={76},
      year={2021},
      publisher={Multidisciplinary Digital Publishing Institute}
    }

    References:

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52

    [Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76.

    [Gupta et al. 2019] Gupta, A., Dollar, P. and Girshick, R., 2019. LVIS: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364).

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  10. Micro COCO2017

    • kaggle.com
    zip
    Updated Jul 11, 2025
    Cite
    Hao Hoang (2025). Micro COCO2017 [Dataset]. https://www.kaggle.com/datasets/haohoangofficial/micro-coco2017
    Explore at:
    Available download formats: zip (4918265726 bytes)
    Dataset updated
    Jul 11, 2025
    Authors
    Hao Hoang
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MicroCOCO2017 is a curated subset of the COCO 2017 dataset, designed for lightweight experimentation with object detection and segmentation models. It includes:

    • 25,000 images from the train2017 split
    • 5,000 images from the val2017 split
    • Full COCO-style annotations (bounding boxes, categories, segmentation masks)

    This dataset is ideal for faster training and prototyping while maintaining the diversity of the original COCO dataset. 📁 Forked from: github.com/giddyyupp/coco-minitrain

  11. COCO-style geographically unbiased image dataset for computer vision applications

    • dataverse.ird.fr
    pdf, txt, zip
    Updated Jan 13, 2023
    Cite
    Theophile Bayet; Theophile Bayet (2023). COCO-style geographically unbiased image dataset for computer vision applications [Dataset]. http://doi.org/10.23708/N2UY4C
    Explore at:
    Available download formats: zip (176316624 bytes), zip (218991 bytes), pdf (57252 bytes), txt (1731 bytes), pdf (83345 bytes), zip (308454 bytes)
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    DataSuds
    Authors
    Theophile Bayet; Theophile Bayet
    License

    Custom license: https://dataverse.ird.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.23708/N2UY4C

    Time period covered
    Jan 1, 2022 - Apr 1, 2022
    Description

    There are already a lot of datasets linked to computer vision tasks (ImageNet, MS COCO, Pascal VOC, OpenImages, and numerous others), but they all suffer from important biases. One bias of significance for us is data origin: most datasets are composed of data coming from developed countries. Facing this situation, and the need for data with local context in developing countries, we try here to adapt a common data-generation process to inclusive data, meaning data drawn from locations and cultural contexts that are unseen or poorly represented. We chose to replicate MS COCO's data-generation process, as it is well documented and easy to implement. Data was collected from January to April 2022 through the Flickr platform. This dataset contains the results of our data collection process, as follows:

    • 23 text files containing comma-separated URLs for each of the 23 geographic zones identified in the UN M49 norm. These text files are named according to the names of the geographic zones they cover.
    • Annotations for 400 images per geographic zone. The annotations are COCO-style, and indicate the presence or absence of 91 categories of objects or concepts in the images. They are shared in JSON format.
    • Licenses for the 400 annotations per geographic zone, based on the original licenses of the data and specified per image. These licenses are shared in CSV format.
    • A document explaining the objectives and methodology underlying the data collection, also describing the different components of the dataset.

  12. SPEECH-COCO

    • live.european-language-grid.eu
    audio wav
    Updated Dec 10, 2023
    + more versions
    Cite
    (2023). SPEECH-COCO [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7686
    Explore at:
    Available download formats: audio wav
    Dataset updated
    Dec 10, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction: Our corpus is an extension of the MS COCO image recognition and captioning dataset. MS COCO comprises images paired with a set of five captions, yet it does not include any speech. Therefore, we used Voxygen's text-to-speech system to synthesise the available captions. The addition of speech as a new modality enables MS COCO to be used for research in language acquisition, unsupervised term discovery, keyword spotting, and semantic embedding using speech and vision. Our corpus is licensed under a Creative Commons Attribution 4.0 License.

    Data Set: This corpus contains 616,767 spoken captions from MS COCO's val2014 and train2014 subsets (414,113 for train2014 and 202,654 for val2014). We used 8 different voices: 4 with a British accent (Paul, Bronwen, Judith, and Elizabeth) and 4 with an American accent (Phil, Bruce, Amanda, and Jenny). To make the captions sound more natural, we used the SoX tempo command, which changes the speed without changing the pitch: 1/3 of the captions are 10% slower than the original pace, 1/3 are 10% faster, and the last third was kept untouched. We also modified approximately 30% of the original captions, adding disfluencies such as "um", "uh", and "er" so that the captions would sound more natural.

    Each WAV file is paired with a JSON file containing various information: the timecode of each word in the caption, the name of the speaker, the name of the WAV file, etc. The JSON files have the following data structure: {"duration": float, "speaker": string, "synthesisedCaption": string, "timecode": list, "speed": float, "wavFilename": string, "captionID": int, "imgID": int, "disfluency": list}. On average, each caption comprises 10.79 tokens, disfluencies included, and the WAV files are on average 3.52 seconds long.
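    Given the JSON structure documented above, each caption's metadata can be read with the standard library alone. A minimal sketch (the selection of fields returned here is illustrative, not part of the corpus):

```python
import json

# Read one SPEECH-COCO sidecar JSON file; every key accessed below comes
# from the documented schema ("duration", "speaker", "synthesisedCaption",
# "speed", "wavFilename", ...).
def read_caption_metadata(json_path):
    with open(json_path) as f:
        meta = json.load(f)
    return {
        "wav": meta["wavFilename"],
        "speaker": meta["speaker"],
        "duration_s": meta["duration"],
        "caption": meta["synthesisedCaption"],
        "speed": meta["speed"],
    }
```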

  13. ref_coco

    • tensorflow.org
    • opendatalab.com
    Updated May 31, 2024
    Cite
    (2024). ref_coco [Dataset]. https://www.tensorflow.org/datasets/catalog/ref_coco
    Explore at:
    Dataset updated
    May 31, 2024
    Description

    A collection of 3 referring expression datasets based off images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets are collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.

    RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016 and has richer descriptions of objects than RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.

    Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation split, but the objects being referred to in the image will be different between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test split. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".

    Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):

    dataset    partition  split  refs   images
    refcoco    google     train  40000  19213
    refcoco    google     val    5000   4559
    refcoco    google     test   5000   4527
    refcoco    unc        train  42404  16994
    refcoco    unc        val    3811   1500
    refcoco    unc        testA  1975   750
    refcoco    unc        testB  1810   750
    refcoco+   unc        train  42278  16992
    refcoco+   unc        val    3805   1500
    refcoco+   unc        testA  1975   750
    refcoco+   unc        testB  1798   750
    refcocog   google     train  44822  24698
    refcocog   google     val    5000   4650
    refcocog   umd        train  42226  21899
    refcocog   umd        val    2573   1300
    refcocog   umd        test   5023   2600

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('ref_coco', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png

  14. COCO

    • huggingface.co
    • datasets.activeloop.ai
    Updated Feb 6, 2023
    + more versions
    Cite
    HuggingFaceM4 (2023). COCO [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/COCO
    Explore at:
    Dataset updated
    Feb 6, 2023
    Dataset authored and provided by
    HuggingFaceM4
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.

  15. COCO-Counterfactuals

    • huggingface.co
    Updated Feb 3, 2024
    Cite
    Intel (2024). COCO-Counterfactuals [Dataset]. https://huggingface.co/datasets/Intel/COCO-Counterfactuals
    Explore at:
    Dataset updated
    Feb 3, 2024
    Dataset authored and provided by
    Intel (http://intel.com/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    COCO-Counterfactuals is a high-quality synthetic dataset for multimodal vision-language model evaluation and for training-data augmentation. Each COCO-Counterfactuals example is a pair of image-text pairs, one a counterfactual variation of the other: the two captions are identical except for a noun subject, and the two corresponding synthetic images differ only in that altered subject. In our accompanying paper, we showed that the COCO-Counterfactuals dataset is challenging for existing pre-trained multimodal models and significantly increases the difficulty of the zero-shot image-text retrieval and image-text matching tasks. Our experiments also demonstrate that augmenting training data with COCO-Counterfactuals improves OOD generalization on multiple downstream tasks.

  16. COCO Caption 2014

    • kaggle.com
    Updated Aug 3, 2025
    Cite
    ChuLiJ (2025). COCO Caption 2014 [Dataset]. https://www.kaggle.com/datasets/chulij/cocodatasets
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 3, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ChuLiJ
    Description

    This dataset consists of the training and validation splits of COCO Caption 2014, containing only the images. The corresponding captions can be obtained from 'https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json' and 'https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json', which are the files used by the official BLIP code. The official dataset link is http://cocodataset.org/


  17. License Plate persian coco

    • kaggle.com
    zip
    Updated Jul 12, 2024
    Cite
    marjan shahchera (2024). License Plate persian coco [Dataset]. https://www.kaggle.com/datasets/marshah/license-plate-persian-coco
    Explore at:
    Available download formats: zip (23589320 bytes)
    Dataset updated
    Jul 12, 2024
    Authors
    marjan shahchera
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    License-plate location detection and Persian (Farsi) text recognition for Iranian license plates of different types, with images of cars on streets, roads, parking lots, and urban areas, including multiple cars on busy streets.

  18. laion-coco-aesthetic-text-only

    • huggingface.co
    Updated Feb 15, 2019
    Cite
    nirmalendu prakash (2019). laion-coco-aesthetic-text-only [Dataset]. https://huggingface.co/datasets/nirmalendu01/laion-coco-aesthetic-text-only
    Explore at:
    Dataset updated
    Feb 15, 2019
    Authors
    nirmalendu prakash
    Description

    The nirmalendu01/laion-coco-aesthetic-text-only dataset, hosted on Hugging Face and contributed by the HF Datasets community.

  19. coco-fastvlm-2k-val2017

    • huggingface.co
    Updated Aug 31, 2025
    Cite
    Riddhiman Rana (2025). coco-fastvlm-2k-val2017 [Dataset]. https://huggingface.co/datasets/riddhimanrana/coco-fastvlm-2k-val2017
    Explore at:
    Dataset updated
    Aug 31, 2025
    Authors
    Riddhiman Rana
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset Card for COCO FastVLM 2K Val2017 Structured Captions

    This dataset contains 2,000 high-quality image-text pairs generated from the COCO 2017 validation set using a FastVLM-based vision-language model with structured prompt engineering and automated distillation. Each caption follows a structured 7-point format to describe the visual content in detail, enabling high-fidelity fine-tuning of multimodal models.

    Dataset Details

    Dataset Description

    This… See the full description on the dataset page: https://huggingface.co/datasets/riddhimanrana/coco-fastvlm-2k-val2017.

  20. Vietnamese COCO 2017 image caption dataset

    • kaggle.com
    zip
    Updated Mar 4, 2023
    Cite
    dinhanhx (2023). Vietnamese COCO 2017 image caption dataset [Dataset]. https://www.kaggle.com/datasets/inhanhv/vietnamese-coco-2017-image-caption-dataset/code
    Explore at:
    Available download formats: zip (41152129 bytes)
    Dataset updated
    Mar 4, 2023
    Authors
    dinhanhx
    Description

    The dataset is firstly introduced in dinhanhx/VisualRoBERTa.

    We used VinAI tools to translate the COCO 2017 image captions (2017 Train/Val annotations) from English to Vietnamese, then merged the UIT-ViIC dataset into it.

    To load the dataset, one can take a look at this code in VisualRoBERTa.
