Update: OCT-2023
Added v2 with the recent SoTA SwinV2 classifier, providing both soft- and hard-label visual_caption_cosine_score_v2 with person-label thresholds (0.2, 0.3, and 0.4).
Introduction
Modern image captioning relies heavily on extracting knowledge from images, such as objects, to capture the concept of a static story in the image. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available COCO Caption dataset (Lin et al., 2014) has been… See the full description on the dataset page: https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset.
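For orientation, here is a minimal sketch of loading this dataset with the Hugging Face `datasets` library and filtering on the v2 cosine score mentioned in the update note. The split name and the `visual_caption_cosine_score_v2` column are assumptions inferred from that note, not verified against the dataset card.

```python
# Minimal sketch: load the dataset and keep examples whose visual-context /
# caption similarity clears one of the thresholds listed above (0.2, 0.3, 0.4).
# The split and column names are assumptions, not verified against the card.
from datasets import load_dataset

ds = load_dataset("AhmedSSabir/Textual-Image-Caption-Dataset", split="train")

threshold = 0.3
filtered = ds.filter(lambda ex: float(ex["visual_caption_cosine_score_v2"]) >= threshold)
print(f"{len(filtered)} of {len(ds)} examples pass the {threshold} threshold")
```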
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Unlock detailed insights into road scenes with our Vehicle Image Captioning Dataset, featuring over 1,000 high-resolution images.
Dataset Card for "my-image-captioning-dataset"
More Information needed
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Gujarati Language Image Captioning Dataset! This is a collection of images with associated text captions, built to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.
This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.
Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.
Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:
The Image Captioning Dataset serves various applications across different domains:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Norwegian Language Image Captioning Dataset! This is a collection of images with associated text captions, built to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.
This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.
Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.
Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:
The Image Captioning Dataset serves various applications across different domains:
Dataset Card for "kag100-image-captioning-dataset"
More Information needed
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Soumyajit9979/my-image-caption-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the German Language Image Captioning Dataset! This is a collection of images with associated text captions, built to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.
This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.
Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.
Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:
The Image Captioning Dataset serves various applications across different domains:
tuy20212521/my-image-captioning-dataset-test dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SwaFlickr8k dataset is an extension of the well-known Flickr8k dataset, specifically designed for image captioning tasks. It includes a collection of images and corresponding captions written in Swahili. With 8,091 unique images and 40,455 captions, this dataset provides a valuable resource for research and development in image understanding and language processing, particularly for the Swahili language.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An image captioning dataset of humans performing various activities: walking, running, sleeping, swimming, sitting, jumping, riding, climbing, drinking, and reading.
This dataset was created by Malavika Shamesh
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Remote sensing imagery offers intricate and nuanced data, emphasizing the need for a profound understanding of the relationships among varied geographical elements and events. In this study, we explore transitions from the image domain to the text domain by employing four state-of-the-art image captioning algorithms, namely BLIP, mPLUG, OFA, and X-VLM. Specifically, we investigate 1) the stability of these algorithms for remote sensing image captioning, 2) the preservation of similarity between images and their corresponding captions, and 3) the characteristics of their caption embedding spaces. The results suggest moderate consistency across captions generated by the different models, with observable variations contingent upon the urban entities presented. In addition, a dynamic relationship emerges between the image space and the corresponding caption space, evidenced by their fluctuating correlation coefficient. Most importantly, patterns within the caption embedding space align with the observed land cover and land use in the image patches, reaffirming the potential of our pilot work as an impactful analytical approach for future remote sensing analytics. We advocate that integrating image captioning techniques with remote sensing imagery paves the way for an innovative data extraction and interpretation approach with diverse applications. This dataset contains the data and code to reproduce the study.
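As a concrete illustration of the first step in this pipeline, the sketch below generates a caption for a single image patch with BLIP, one of the four models studied. The public base checkpoint and the `patch.png` path are stand-ins; the study's exact checkpoints and data layout may differ.

```python
# Hedged sketch: caption one remote sensing image patch with BLIP via the
# Hugging Face transformers API. Checkpoint and file path are placeholders.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("patch.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```

Repeating this over all patches, for each of the four captioning models, yields the caption sets whose consistency and embedding-space structure the study analyzes.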
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Finnish Language Image Captioning Dataset! This is a collection of images with associated text captions, built to facilitate the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.
This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.
Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.
Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:
The Image Captioning Dataset serves various applications across different domains:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
SafeVision:IMAGE CAPTIONING is a dataset for vision-language (multimodal) tasks; it contains Safety annotations for 314 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
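As a starting point, the snippet below is a minimal sketch of pulling the dataset with the `roboflow` Python package; the workspace and project slugs, the version number, and the export format are placeholders to be replaced with the values shown on this dataset's Roboflow page.

```python
# Minimal sketch: download the dataset via the roboflow package. The workspace
# and project slugs and the version number below are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("safevision-image-captioning")
dataset = project.version(1).download("coco")
print(dataset.location)  # local folder containing images and annotations
```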
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Nabi Nabiyev
It contains the following files:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Image captioning in Bengali. The images are collected from the Flickr8k and Banglalekha datasets, and the captions were generated using Google Translate.
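Since the construction method is machine translation of English captions, it is straightforward to reproduce on new text. The sketch below uses the deep-translator package as a stand-in; the dataset authors cite Google Translate but do not specify their tooling.

```python
# Hedged sketch: translate an English Flickr8k-style caption into Bengali with
# deep-translator, a public wrapper around Google Translate. Illustrative only;
# the dataset authors' exact tooling is not specified.
from deep_translator import GoogleTranslator

translator = GoogleTranslator(source="en", target="bn")  # "bn" = Bengali
print(translator.translate("A dog is running across a grassy field."))
```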
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
A dataset of 30,000 images with 5 captions per image. The dataset was created by researchers at Stanford University and is used for research in machine learning and natural language processing tasks such as image captioning and visual question answering.
10,000 image-caption data of gestures, mainly from young and middle-aged people. The collection environments include indoor and outdoor scenes across various seasons and collection angles. The description language is English, mainly describing hand characteristics such as hand movements, gestures, image acquisition angle, gender, and age.