Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
A dataset of roughly 30,000 images with 5 captions per image. The dataset was created by researchers at the University of Illinois at Urbana-Champaign and is used for research in machine learning and natural language processing tasks such as image captioning and visual question answering.
The Flickr30K Entities dataset is an extension of the Flickr30K dataset. It augments the original 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image and associating them with 276k manually annotated bounding boxes. These annotations define a new benchmark for localization of textual entity mentions in an image.
Large-scale Multi-modality Models Evaluation Suite
Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval
🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets
This Dataset
This is a formatted version of flickr30k. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models.
@article{young-etal-2014-image, title = "From image descriptions to visual denotations: New similarity… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/flickr30k.
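As a minimal sketch of pulling this formatted copy for inspection (the dataset ID comes from the URL above; split and column names are not assumed here and are printed instead):

```python
# Sketch: download the lmms-lab formatted Flickr30k copy and inspect its layout.
# The dataset ID is taken from the page URL above; splits/columns are printed
# rather than assumed, since the card does not list them explicitly.
from datasets import load_dataset

ds = load_dataset("lmms-lab/flickr30k")
print(ds)  # shows the available splits and the column names/types
```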
Dataset Card for Flickr30k Captions
This dataset is a collection of caption pairs given to the same image, collected from Flickr30k. See Flickr30k for additional information. This dataset can be used directly with Sentence Transformers to train embedding models. Note that two captions for the same image do not strictly have the same semantic meaning.
Dataset Subsets
pair subset
Columns: "caption1", "caption2" Column types: str, str Examples:{… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/flickr30k-captions.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "flickr30k-captions"
Dataset Summary
We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations… See the full description on the dataset page: https://huggingface.co/datasets/embedding-data/flickr30k_captions_quintets.
The earlier Flickr30k-CN translated the training and validation sets of Flickr30k with machine translation and translated the test set manually. We checked the machine-translated results and found two kinds of problems: (1) some sentences have language problems and translation errors; (2) some sentences have poor semantics. In addition, the different translation procedures for the training and test sets prevent models from achieving accurate performance. We therefore gathered six professional English-Chinese linguists to meticulously re-translate all of the Flickr30k data and double-check each sentence.
This dataset, based on Flickr30K, is introduced in Learning with Noisy Correspondence for Cross-modal Matching. Noisy correspondence is simulated by randomly shuffling the captions of a given percentage of the training images; this percentage is referred to as the noise ratio.
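The shuffling step described above is straightforward to reproduce. A minimal sketch follows (the function and variable names are illustrative, not from the paper's code):

```python
# Sketch: simulate noisy correspondence by permuting the captions of a random
# subset of training images, where the subset size is controlled by noise_ratio.
import random

def corrupt_captions(captions, noise_ratio, seed=0):
    """Return a copy of `captions` (index i pairs with image i) in which a
    `noise_ratio` fraction of entries has been permuted among themselves,
    producing mismatched image-caption pairs."""
    rng = random.Random(seed)
    out = list(captions)
    n_noisy = int(len(out) * noise_ratio)
    noisy_idx = rng.sample(range(len(out)), n_noisy)
    permuted = noisy_idx[:]
    rng.shuffle(permuted)
    originals = [out[i] for i in noisy_idx]      # snapshot before reassigning
    for cap, new_pos in zip(originals, permuted):
        out[new_pos] = cap                        # some entries may stay in place
    return out

caps = ["a dog runs", "a child smiles", "two men talk", "a red car"]
print(corrupt_captions(caps, noise_ratio=0.5))
```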
Explicit Caption Editing (ECE), which refines reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE), has attracted significant attention due to its explainable and human-like nature.
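To make the edit-operation idea concrete, here is a toy illustration of applying a KEEP/DELETE sequence to a reference caption; only the two operations named above are shown, and the example sentence is invented for illustration:

```python
# Toy sketch: apply a per-token sequence of explicit edit operations
# (KEEP / DELETE only) to a tokenized reference caption.
def apply_edits(tokens, operations):
    """Keep or drop each token according to its paired operation."""
    assert len(tokens) == len(operations)
    return [tok for tok, op in zip(tokens, operations) if op == "KEEP"]

reference = ["a", "large", "brown", "dog", "jumps", "over", "a", "fence"]
ops       = ["KEEP", "DELETE", "KEEP", "KEEP", "KEEP", "KEEP", "KEEP", "KEEP"]
print(" ".join(apply_edits(reference, ops)))  # -> "a brown dog jumps over a fence"
```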
jimmeylove/Flickr30k dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Safia Faiz
Original Flickr-30K dataset in TFRecord format for faster computations on a TPU.
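For the TFRecord copy, a minimal parsing sketch with tf.data is given below; note that the feature keys, dtypes, and file path are hypothetical placeholders, since the record schema is not documented here, and must be checked against the actual files:

```python
# Sketch: read TFRecord shards in a TPU-friendly tf.data pipeline.
# NOTE: the feature names ("image", "caption") and the file path are
# hypothetical placeholders; inspect the real files for the actual schema.
import tensorflow as tf

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),    # assumed: JPEG-encoded bytes
    "caption": tf.io.FixedLenFeature([], tf.string),  # assumed: UTF-8 caption text
}

def parse(example_proto):
    ex = tf.io.parse_single_example(example_proto, feature_spec)
    image = tf.io.decode_jpeg(ex["image"], channels=3)
    return image, ex["caption"]

files = tf.io.gfile.glob("gs://your-bucket/flickr30k/*.tfrecord")  # hypothetical path
ds = tf.data.TFRecordDataset(files).map(parse, num_parallel_calls=tf.data.AUTOTUNE)
```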
Uploaded from https://github.com/danieljf24/w2vv
The Flickr30k dataset is a large-scale image captioning dataset containing roughly 30,000 images with 5 captions each.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
recastai/flickr30k-augmented-caption dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dev24910/flickr30k dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advancements in deep learning have revolutionized numerous real-world applications, including image recognition, visual question answering, and image captioning. Among these, image captioning has emerged as a critical area of research, with substantial progress achieved in Arabic, Chinese, Uyghur, Hindi, and predominantly English. However, despite Urdu being a morphologically rich and widely spoken language, research in Urdu image captioning remains underexplored due to a lack of resources. This study creates a new Urdu Image Captioning Dataset (UCID), called UC-23-RY, to fill this gap. The dataset's 159,816 Urdu captions are based on the Flickr30k dataset. The study also proposes deep learning architectures designed specifically for Urdu image captioning, including NASNetLarge-LSTM and ResNet-50-LSTM. The NASNetLarge-LSTM and ResNet-50-LSTM models achieved notable BLEU-1 scores of 0.86 and 0.84, respectively, as demonstrated through an evaluation assessing each model's impact on caption quality. The work thus provides a useful dataset and shows how well suited sophisticated deep learning models are to improving automatic Urdu image captioning.
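For context on the reported metric, BLEU-1 is BLEU restricted to unigram precision. A minimal sketch of computing it with NLTK follows (the reference and candidate sentences are invented for illustration and are not dataset content):

```python
# Sketch: BLEU-1 (unigram-only weights) for a single caption, using NLTK.
# The token lists below are made-up examples, not taken from any dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["a", "man", "rides", "a", "horse"]]   # list of reference token lists
candidate  = ["a", "man", "is", "riding", "a", "horse"]

bleu1 = sentence_bleu(references, candidate,
                      weights=(1.0, 0, 0, 0),        # unigram precision only
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-1: {bleu1:.2f}")
```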
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects.
The splits were created by Andrej Karpathy and are primarily used for image captioning. They provide captions for the Flickr8k, Flickr30k, and MSCOCO datasets, with each dataset divided into train, validation, and test splits.
Source: http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip
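A minimal sketch of reading these splits once the zip above is extracted; the key names ("images", "split", "sentences", "raw", "filename") follow the layout commonly used for the Karpathy split files, but should be verified against the JSON itself:

```python
# Sketch: group Flickr30k captions by split from dataset_flickr30k.json
# (one of the files inside the zip linked above). Verify the key names
# against the file before relying on this layout.
import json
from collections import defaultdict

with open("dataset_flickr30k.json") as f:
    data = json.load(f)

captions_by_split = defaultdict(list)
for img in data["images"]:
    for sent in img["sentences"]:
        captions_by_split[img["split"]].append((img["filename"], sent["raw"]))

print({split: len(caps) for split, caps in captions_by_split.items()})
```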
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains images of realistic-style manga pages and photos of people. Each image is 460×660 pixels. The purpose of this dataset is to explore the possibility of transferring style from manga to photo and vice versa. I used CycleGAN for this purpose.