66 datasets found
  1. Flickr30k

    • datasets.activeloop.ai
    • opendatalab.com
    deeplake
    Updated Mar 30, 2022
    Cite
    Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier (2022). Flickr30k [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/flickr30k-dataset/
    Available download formats: deeplake
    Dataset updated
    Mar 30, 2022
    Authors
    Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    A dataset of about 30,000 images with 5 captions per image. The dataset was created by researchers at the University of Illinois at Urbana-Champaign and is used for research in machine learning and natural language processing tasks such as image captioning and visual question answering.

  2. Flickr30K Entities Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Jan 3, 2024
    Cite
    Bryan A. Plummer; Li-Wei Wang; Chris M. Cervantes; Juan C. Caicedo; Julia Hockenmaier; Svetlana Lazebnik (2024). Flickr30K Entities Dataset [Dataset]. https://paperswithcode.com/dataset/flickr30k-entities
    Dataset updated
    Jan 3, 2024
    Authors
    Bryan A. Plummer; Li-Wei Wang; Chris M. Cervantes; Juan C. Caicedo; Julia Hockenmaier; Svetlana Lazebnik
    Description

    The Flickr30K Entities dataset is an extension to the Flickr30K dataset. It augments the original 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. This is used to define a new benchmark for localization of textual entity mentions in an image.

  3. flickr30k

    • huggingface.co
    Updated Oct 4, 2024
    Cite
    LMMs-Lab (2024). flickr30k [Dataset]. https://huggingface.co/datasets/lmms-lab/flickr30k
    Available format: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 4, 2024
    Dataset authored and provided by
    LMMs-Lab
    Description

    Large-scale Multi-modality Models Evaluation Suite

    Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval

      This Dataset
    

    This is a formatted version of flickr30k. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @article{young-etal-2014-image, title = "From image descriptions to visual denotations: New similarity… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/flickr30k.

  4. flickr30k-captions

    • huggingface.co
    Updated Apr 30, 2024
    Cite
    Sentence Transformers (2024). flickr30k-captions [Dataset]. https://huggingface.co/datasets/sentence-transformers/flickr30k-captions
    Available format: Croissant
    Dataset updated
    Apr 30, 2024
    Dataset authored and provided by
    Sentence Transformers
    Description

    Dataset Card for Flickr30k Captions

    This dataset is a collection of caption pairs given to the same image, collected from Flickr30k. See Flickr30k for additional information. This dataset can be used directly with Sentence Transformers to train embedding models. Note that two captions for the same image do not strictly have the same semantic meaning.

    Dataset Subsets

    pair subset

    Columns: "caption1", "caption2". Column types: str, str. Examples: {… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/flickr30k-captions.
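A minimal sketch of how caption pairs of this kind could be derived from the five captions of one image. It assumes pairs are unordered combinations of captions; the dataset card excerpt does not state the exact pairing procedure. The function name `caption_pairs` and the sample captions are hypothetical.

```python
from itertools import combinations


def caption_pairs(captions):
    """Return every unordered pair of captions written for one image."""
    return [
        {"caption1": first, "caption2": second}
        for first, second in combinations(captions, 2)
    ]


# Hypothetical captions for a single image (not taken from the dataset).
captions = [
    "A dog runs across a field.",
    "A brown dog is running on grass.",
    "The dog sprints through the park.",
    "A pet plays outside.",
    "An animal runs outdoors.",
]

pairs = caption_pairs(captions)
print(len(pairs))  # five captions give C(5, 2) = 10 pairs
```

As the card notes, two captions of the same image are related but not strictly equivalent in meaning, so pairs like these are weak positives rather than paraphrases.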

  5. flickr30k_captions_quintets

    • huggingface.co
    Updated Sep 25, 2022
    Cite
    Embedding Training Data (2022). flickr30k_captions_quintets [Dataset]. https://huggingface.co/datasets/embedding-data/flickr30k_captions_quintets
    Available format: Croissant
    Dataset updated
    Sep 25, 2022
    Dataset authored and provided by
    Embedding Training Data
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for "flickr30k-captions"

      Dataset Summary
    

    We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations… See the full description on the dataset page: https://huggingface.co/datasets/embedding-data/flickr30k_captions_quintets.

  6. Flickr30k-CNA Dataset

    • paperswithcode.com
    Updated Jul 28, 2022
    Cite
    Chunyu Xie; Heng Cai; Jincheng Li; Fanjing Kong; Xiaoyu Wu; Jianfei Song; Henrique Morimitsu; Lin Yao; Dexin Wang; Xiangzheng Zhang; Dawei Leng; Baochang Zhang; Xiangyang Ji; Yafeng Deng (2022). Flickr30k-CNA Dataset [Dataset]. https://paperswithcode.com/dataset/flickr30k-cna
    Dataset updated
    Jul 28, 2022
    Authors
    Chunyu Xie; Heng Cai; Jincheng Li; Fanjing Kong; Xiaoyu Wu; Jianfei Song; Henrique Morimitsu; Lin Yao; Dexin Wang; Xiangzheng Zhang; Dawei Leng; Baochang Zhang; Xiangyang Ji; Yafeng Deng
    Description

    The former Flickr30k-CN translated the training and validation sets of Flickr30k using machine translation and manually translated the test set. We checked the machine-translated results and found two kinds of problems: (1) some sentences have language problems and translation errors; (2) some sentences have poor semantics. In addition, the different translation methods used for the training and test sets prevent the model from achieving accurate performance. We gathered 6 professional English and Chinese linguists to meticulously re-translate all Flickr30k data and double-check each sentence.

  7. Flickr30K-Noisy Dataset

    • paperswithcode.com
    Updated May 26, 2024
    Cite
    Zhenyu Huang; Guocheng Niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng (2024). Flickr30K-Noisy Dataset [Dataset]. https://paperswithcode.com/dataset/flickr30k-20-nc-1k-test
    Dataset updated
    May 26, 2024
    Authors
    Zhenyu Huang; Guocheng Niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng
    Description

    This dataset, based on Flickr30K, is introduced in "Learning with Noisy Correspondence for Cross-modal Matching". Noisy correspondence is simulated by randomly shuffling the captions of a specific percentage of training images, denoted by the noise ratio.
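The shuffling idea described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it keeps one caption per image for simplicity, and the function name `inject_noise`, the `seed` parameter, and the toy captions are hypothetical.

```python
import random


def inject_noise(captions, noise_ratio, seed=0):
    """Shuffle the captions of a random `noise_ratio` fraction of images.

    `captions[i]` is the caption paired with image i. Returns the noisy
    caption list and the sorted indices that took part in the shuffle.
    """
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be between 0 and 1")
    rng = random.Random(seed)
    count = int(round(noise_ratio * len(captions)))
    selected = rng.sample(range(len(captions)), count)
    targets = selected[:]
    rng.shuffle(targets)  # a random permutation of the selected indices
    noisy = list(captions)
    for source, target in zip(selected, targets):
        noisy[target] = captions[source]
    return noisy, sorted(selected)


captions = [f"caption for image {i}" for i in range(10)]
noisy, touched = inject_noise(captions, noise_ratio=0.2)
print(touched)
```

A random permutation can leave some selected captions in their original position, so the fraction of truly mismatched pairs may be slightly below the nominal noise ratio.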

  8. Flickr30K-EE - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    Cite
    (2024). Flickr30K-EE - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/flickr30k-ee
    Dataset updated
    Dec 3, 2024
    Description

    Explicit Caption Editing (ECE) — refining reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE) — has raised significant attention due to its explainable and human-like nature.
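To make the idea of explicit edit operations concrete, here is a minimal sketch that applies one operation per reference token. It covers only the two operations named above (KEEP and DELETE); any further operations the task defines are omitted. The function name `apply_edits` and the example caption are hypothetical.

```python
def apply_edits(tokens, ops):
    """Apply one edit operation per reference token and return the kept tokens."""
    if len(tokens) != len(ops):
        raise ValueError("need exactly one operation per token")
    edited = []
    for token, op in zip(tokens, ops):
        if op == "KEEP":
            edited.append(token)
        elif op == "DELETE":
            continue  # drop this token from the refined caption
        else:
            raise ValueError(f"unsupported operation: {op}")
    return edited


# Hypothetical reference caption and edit sequence.
reference = "a small brown dog runs on the green grass".split()
ops = ["KEEP", "DELETE", "KEEP", "KEEP", "KEEP", "KEEP", "KEEP", "DELETE", "KEEP"]
print(" ".join(apply_edits(reference, ops)))  # a brown dog runs on the grass
```

Because every change is an explicit, per-token decision, the edit sequence itself documents how the reference caption was refined.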

  9. Flickr30k

    • huggingface.co
    Updated May 11, 2025
    Cite
    James Love (2025). Flickr30k [Dataset]. https://huggingface.co/datasets/jimmeylove/Flickr30k
    Dataset updated
    May 11, 2025
    Authors
    James Love
    Description

    jimmeylove/Flickr30k dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. MS COCO + Flickr30k + Personal Dataset + Captions

    • kaggle.com
    zip
    Updated May 11, 2025
    Cite
    Safia Faiz (2025). MS COCO + Flickr30k + Personal Dataset + Captions [Dataset]. https://www.kaggle.com/datasets/safiafaiz/images
    Available download formats: zip (0 bytes)
    Dataset updated
    May 11, 2025
    Authors
    Safia Faiz
    Description

    Dataset

    This dataset was created by Safia Faiz


  11. Flickr30k-TFRecords[512x512]

    • kaggle.com
    Updated Jan 14, 2021
    Cite
    Ashish Goswami (2021). Flickr30k-TFRecords[512x512] [Dataset]. https://www.kaggle.com/ashish2001/flickr30ktfrecords512x512/code
    Available format: Croissant
    Dataset updated
    Jan 14, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ashish Goswami
    Description

    Original Flickr-30K dataset in TFRecord format for faster computations on a TPU.

    1. Original dataset contained 5 captions against each image.
    2. Each .tfrec file contains all 30k images but only one caption per image; the original caption number is indicated by the set number in the filename.
    3. Therefore, Caption #1 for image XXXXXX.jpg will be in complete_30K_set1.tfrec and so on.
    4. Each TFRecord contains the following features:
    ```
    feature = {
        'image': _bytes_feature(feature),    # JPEG image
        'name': _bytes_feature(name),        # filename
        'comment': _bytes_feature(comment),  # caption
    }
    ```
    5. Each caption is encapsulated with
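The file naming described in item 3 can be sketched as a small helper. The pattern `complete_30K_set<N>.tfrec` is taken from the description above; the function name `tfrec_filename` and the assumption that sets are numbered 1 through 5 (one per caption) are illustrative.

```python
CAPTIONS_PER_IMAGE = 5  # assumed from "5 captions against each image"


def tfrec_filename(caption_number):
    """Return the .tfrec file expected to hold the given caption number."""
    if not 1 <= caption_number <= CAPTIONS_PER_IMAGE:
        raise ValueError("caption_number must be between 1 and 5")
    return f"complete_30K_set{caption_number}.tfrec"


all_files = [tfrec_filename(n) for n in range(1, CAPTIONS_PER_IMAGE + 1)]
print(all_files[0])  # complete_30K_set1.tfrec
```

Reading all five files and joining records on the `name` feature would recover the full set of captions for an image.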

  12. Word2Vec Flickr30k

    • kaggle.com
    Updated Dec 11, 2019
    Cite
    Oleh Onyshchak (2019). Word2Vec Flickr30k [Dataset]. https://www.kaggle.com/jacksoncrow/word2vec-flickr30k/metadata
    Available format: Croissant
    Dataset updated
    Dec 11, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Oleh Onyshchak
    Description
  13. Flickr 30k Dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Flickr 30k Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/flickr-30k-dataset
    Dataset updated
    Dec 16, 2024
    Description

    The Flickr 30k dataset is a large-scale image captioning dataset containing about 30,000 images with 5 captions each.

  14. flickr30k-augmented-caption

    • huggingface.co
    Updated Aug 4, 2023
    Cite
    Re:cast AI (2023). flickr30k-augmented-caption [Dataset]. https://huggingface.co/datasets/recastai/flickr30k-augmented-caption
    Available format: Croissant
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    Re:cast AI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    recastai/flickr30k-augmented-caption dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. flickr30k

    • huggingface.co
    Updated Nov 30, 2024
    Cite
    Dev Gupta (2024). flickr30k [Dataset]. https://huggingface.co/datasets/Dev24910/flickr30k
    Dataset updated
    Nov 30, 2024
    Authors
    Dev Gupta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dev24910/flickr30k dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Manual Inspection and Correction.

    • plos.figshare.com
    xls
    Updated Jun 2, 2025
    Cite
    Rimsha Muzaffar; Syed Yasser Arafat; Junaid Rashid; Jungeun Kim; Usman Naseem (2025). Manual Inspection and Correction. [Dataset]. http://doi.org/10.1371/journal.pone.0320701.t004
    Available download formats: xls
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Rimsha Muzaffar; Syed Yasser Arafat; Junaid Rashid; Jungeun Kim; Usman Naseem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancements in deep learning have revolutionized numerous real-world applications, including image recognition, visual question answering, and image captioning. Among these, image captioning has emerged as a critical area of research, with substantial progress achieved in Arabic, Chinese, Uyghur, Hindi, and predominantly English. However, despite Urdu being a morphologically rich and widely spoken language, research in Urdu image captioning remains underexplored due to a lack of resources. This study creates a new Urdu Image Captioning Dataset (UCID) called UC-23-RY to fill in the gaps in Urdu image captioning. The Flickr30k dataset inspired the 159,816 Urdu captions in the dataset. Additionally, it suggests deep learning architectures designed especially for Urdu image captioning, including NASNetLarge-LSTM and ResNet-50-LSTM. The NASNetLarge-LSTM and ResNet-50-LSTM models achieved notable BLEU-1 scores of 0.86 and 0.84, respectively, as demonstrated through evaluation in this study assessing the models' impact on caption quality. Additionally, it provides useful datasets and shows how well-suited sophisticated deep learning models are for improving automatic Urdu image captioning.
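For context on the BLEU-1 figures quoted above, here is a minimal sketch of the standard BLEU-1 definition: clipped unigram precision multiplied by a brevity penalty. It is not the study's evaluation code, and the function name `bleu1` and the toy sentences are illustrative assumptions.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Compute BLEU-1 for one tokenized candidate against tokenized references."""
    if not candidate:
        return 0.0
    # Clip each candidate token count by its maximum count in any reference.
    max_ref_counts = Counter()
    for reference in references:
        for token, count in Counter(reference).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    clipped = sum(
        min(count, max_ref_counts[token])
        for token, count in Counter(candidate).items()
    )
    precision = clipped / len(candidate)
    # Brevity penalty uses the reference length closest to the candidate length.
    cand_len = len(candidate)
    ref_len = min(
        (len(reference) for reference in references),
        key=lambda length: (abs(length - cand_len), length),
    )
    penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return penalty * precision


reference = "the cat sat on the mat".split()
print(bleu1("the cat sat on the mat".split(), [reference]))
```

Clipping prevents a candidate from scoring well by repeating a common word, and the brevity penalty discourages very short outputs.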

  17. Flickr Image

    • opendatalab.com
    zip
    Updated Sep 13, 2022
    Cite
    University of Illinois Urbana-Champaign (2022). Flickr Image [Dataset]. https://opendatalab.com/OpenDataLab/Flickr_Image
    Available download formats: zip (8859405232 bytes)
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    University of Illinois Urbana-Champaign
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects.

  18. Karpathy splits for Image Captioning

    • kaggle.com
    Updated Dec 2, 2020
    Cite
    Shravan Kumar (2020). Karpathy splits for Image Captioning [Dataset]. https://www.kaggle.com/shtvkumar/karpathy-splits/activity
    Available format: Croissant
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shravan Kumar
    Description

    The splits were created by Andrej Karpathy and are predominantly used for image captioning. They contain captions for the Flickr8k, Flickr30k, and MSCOCO datasets, each divided into train, validation, and test splits.

    Source: http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip
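A minimal sketch of grouping captions by split. It assumes the layout commonly seen in the Karpathy JSON files: a top-level `images` list whose entries carry `split`, `filename`, and a `sentences` list with a `raw` caption string. Verify this against the downloaded files. The sample record and the function name `captions_by_split` are hypothetical.

```python
from collections import defaultdict


def captions_by_split(dataset):
    """Map each split name to a {filename: [raw captions]} dictionary."""
    splits = defaultdict(dict)
    for image in dataset["images"]:
        raw_captions = [sentence["raw"] for sentence in image["sentences"]]
        splits[image["split"]][image["filename"]] = raw_captions
    return dict(splits)


# Hypothetical in-memory sample mimicking the assumed JSON layout.
sample = {
    "images": [
        {
            "filename": "0001.jpg",
            "split": "train",
            "sentences": [{"raw": "A dog runs."}, {"raw": "A dog is running."}],
        },
        {
            "filename": "0002.jpg",
            "split": "test",
            "sentences": [{"raw": "Two people talk."}],
        },
    ]
}

grouped = captions_by_split(sample)
print(sorted(grouped))  # ['test', 'train']
```

With a real file, the `sample` dictionary would be replaced by the result of `json.load` on the extracted JSON.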

  19. English language corpora.

    • plos.figshare.com
    xls
    Updated Jun 2, 2025
    Cite
    Rimsha Muzaffar; Syed Yasser Arafat; Junaid Rashid; Jungeun Kim; Usman Naseem (2025). English language corpora. [Dataset]. http://doi.org/10.1371/journal.pone.0320701.t002
    Available download formats: xls
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Rimsha Muzaffar; Syed Yasser Arafat; Junaid Rashid; Jungeun Kim; Usman Naseem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancements in deep learning have revolutionized numerous real-world applications, including image recognition, visual question answering, and image captioning. Among these, image captioning has emerged as a critical area of research, with substantial progress achieved in Arabic, Chinese, Uyghur, Hindi, and predominantly English. However, despite Urdu being a morphologically rich and widely spoken language, research in Urdu image captioning remains underexplored due to a lack of resources. This study creates a new Urdu Image Captioning Dataset (UCID) called UC-23-RY to fill in the gaps in Urdu image captioning. The Flickr30k dataset inspired the 159,816 Urdu captions in the dataset. Additionally, it suggests deep learning architectures designed especially for Urdu image captioning, including NASNetLarge-LSTM and ResNet-50-LSTM. The NASNetLarge-LSTM and ResNet-50-LSTM models achieved notable BLEU-1 scores of 0.86 and 0.84, respectively, as demonstrated through evaluation in this study assessing the models' impact on caption quality. Additionally, it provides useful datasets and shows how well-suited sophisticated deep learning models are for improving automatic Urdu image captioning.

  20. manga-photo-60k

    • kaggle.com
    Updated May 26, 2023
    Cite
    Sergey Chernykh (2023). manga-photo-60k [Dataset]. http://doi.org/10.34740/kaggle/dsv/5784611
    Available format: Croissant
    Dataset updated
    May 26, 2023
    Dataset provided by
    Kaggle
    Authors
    Sergey Chernykh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains images of realistic-style manga pages and photos of people. Each image is 460x660 pixels. The purpose of this dataset is to explore the possibility of style transfer from manga to photo and vice versa. I've used CycleGAN for this purpose.
