100+ datasets found
  1. Textual-Image-Caption-Dataset

    • huggingface.co
    Updated Sep 21, 2022
    Cite
    Ahmed Sabir (2022). Textual-Image-Caption-Dataset [Dataset]. https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2022
    Authors
    Ahmed Sabir
    Description

    Update: OCT-2023

    Added v2 with the recent SoTA swinV2 classifier for both soft/hard-label visual_caption_cosine_score_v2, with person label (0.2, 0.3, and 0.4)

      Introduction
    

    Modern image captioning relies heavily on extracting knowledge from images, such as objects, to capture the concept of a static story in the image. In this paper, we propose a textual visual context dataset for captioning, where the publicly available dataset COCO caption (Lin et al., 2014) has been… See the full description on the dataset page: https://huggingface.co/datasets/AhmedSSabir/Textual-Image-Caption-Dataset.
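    The soft/hard-label distinction above can be read as thresholding a visual-caption cosine score. The sketch below illustrates that reading; the function name and the exact thresholding rule are assumptions for illustration, not documented behavior of the dataset.

    ```python
    def hard_label(cosine_score: float, threshold: float) -> int:
        """Binarize a visual-caption cosine similarity at a threshold.

        Hypothetical reading of the dataset's soft/hard labels: the soft
        label is the raw score, the hard label is the score thresholded
        at 0.2, 0.3, or 0.4.
        """
        return 1 if cosine_score >= threshold else 0

    # A score of 0.35 passes the 0.2 and 0.3 thresholds but not 0.4.
    labels = {t: hard_label(0.35, t) for t in (0.2, 0.3, 0.4)}
    print(labels)  # {0.2: 1, 0.3: 1, 0.4: 0}
    ```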

  2. Vehicle Image Captioning Dataset

    • gts.ai
    json
    Updated Jul 11, 2024
    Cite
    GTS (2024). Vehicle Image Captioning Dataset [Dataset]. https://gts.ai/dataset-download/vehicle-image-captioning-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Unlock detailed insights into road scenes with our Vehicle Image Captioning Dataset, featuring over 1,000 high-resolution images.

  3. my-image-captioning-dataset

    • huggingface.co
    Updated Apr 11, 2023
    + more versions
    Cite
    Park (2023). my-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/SKyu/my-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2023
    Authors
    Park
    Description

    Dataset Card for "my-image-captioning-dataset"

    More Information needed

  4. Gujarati Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Gujarati Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/gujarati-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Gujarati Language Image Captioning Dataset: a collection of images with associated text captions, intended to facilitate the development of AI models capable of generating high-quality captions for images. This dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    • Sources: Images are sourced from public databases and proprietary collections.
    • Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    • Copyright: All selected images are free from copyright restrictions, allowing unrestricted use in research and development.
    • Format: Images are available in various formats, such as JPEG, PNG, and HEIC.
    • Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
      • Daily Life: Household objects, activities, and daily routines.
      • Nature and Environment: Natural scenes, plants, animals, and weather.
      • Technology and Gadgets: Electronic devices, tools, and machinery.
      • Human Activities: People, their actions, professions, and interactions.
      • Geography and Landmarks: Specific locations, landmarks, and geographic features.
      • Food and Dining: Different foods, meals, and dining settings.
      • Education: Educational settings, materials, and activities.
      • Sports and Recreation: Various sports, games, and recreational activities.
      • Transportation: Vehicles, travel methods, and transportation infrastructure.
      • Cultural and Historical: Cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    • Caption Details:
      • Human Generated: Each caption is written by native Gujarati speakers.
      • Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
      • Contextual Relevance: Captions are written with attention to the visual details depicted in the images, such as objects, scenes, actions, and settings.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    • Image File Name
    • Category
    • Caption
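    The three metadata fields above map naturally onto one row per image-caption pair. A minimal sketch of that layout (the file names and captions are made up, and written in English here rather than Gujarati):

    ```python
    import csv
    import io

    # Hypothetical rows matching the metadata fields listed above.
    rows = [
        {"image_file_name": "img_0001.jpg", "category": "Daily Life",
         "caption": "A kitchen counter with a kettle and two cups."},
        {"image_file_name": "img_0002.jpg", "category": "Transportation",
         "caption": "A crowded bus stop on a rainy street."},
    ]

    # Serialize the pairs to CSV, one header row plus one row per pair.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["image_file_name", "category", "caption"])
    writer.writeheader()
    writer.writerows(rows)
    print(buf.getvalue())
    ```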

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    • Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and…

  5. Norwegian Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Norwegian Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/norwegian-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Norwegian Language Image Captioning Dataset: a collection of images with associated text captions, intended to facilitate the development of AI models capable of generating high-quality captions for images. This dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    • Sources: Images are sourced from public databases and proprietary collections.
    • Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    • Copyright: All selected images are free from copyright restrictions, allowing unrestricted use in research and development.
    • Format: Images are available in various formats, such as JPEG, PNG, and HEIC.
    • Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
      • Daily Life: Household objects, activities, and daily routines.
      • Nature and Environment: Natural scenes, plants, animals, and weather.
      • Technology and Gadgets: Electronic devices, tools, and machinery.
      • Human Activities: People, their actions, professions, and interactions.
      • Geography and Landmarks: Specific locations, landmarks, and geographic features.
      • Food and Dining: Different foods, meals, and dining settings.
      • Education: Educational settings, materials, and activities.
      • Sports and Recreation: Various sports, games, and recreational activities.
      • Transportation: Vehicles, travel methods, and transportation infrastructure.
      • Cultural and Historical: Cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    • Caption Details:
      • Human Generated: Each caption is written by native Norwegian speakers.
      • Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
      • Contextual Relevance: Captions are written with attention to the visual details depicted in the images, such as objects, scenes, actions, and settings.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    • Image File Name
    • Category
    • Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    • Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and…

  6. kag100-image-captioning-dataset

    • huggingface.co
    Updated Aug 23, 2023
    Cite
    Pavan Kumar Jonnakuti (2023). kag100-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/jpawan33/kag100-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2023
    Authors
    Pavan Kumar Jonnakuti
    Description

    Dataset Card for "kag100-image-captioning-dataset"

    More Information needed

  7. Dataset for Image Captioning System (in Bangla)

    • ieee-dataport.org
    Updated Aug 2, 2021
    Cite
    Toshiba Kamruzzaman (2021). Dataset for Image Captioning System (in Bangla) [Dataset]. https://ieee-dataport.org/documents/dataset-image-captioning-system-bangla
    Explore at:
    Dataset updated
    Aug 2, 2021
    Authors
    Toshiba Kamruzzaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    etc.

  8. my-image-caption-dataset

    • huggingface.co
    Updated May 11, 2025
    + more versions
    Cite
    Sen Sharma (2025). my-image-caption-dataset [Dataset]. https://huggingface.co/datasets/Soumyajit9979/my-image-caption-dataset
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Sen Sharma
    Description

    Soumyajit9979/my-image-caption-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. German Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). German Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/german-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the German Language Image Captioning Dataset: a collection of images with associated text captions, intended to facilitate the development of AI models capable of generating high-quality captions for images. This dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    • Sources: Images are sourced from public databases and proprietary collections.
    • Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    • Copyright: All selected images are free from copyright restrictions, allowing unrestricted use in research and development.
    • Format: Images are available in various formats, such as JPEG, PNG, and HEIC.
    • Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
      • Daily Life: Household objects, activities, and daily routines.
      • Nature and Environment: Natural scenes, plants, animals, and weather.
      • Technology and Gadgets: Electronic devices, tools, and machinery.
      • Human Activities: People, their actions, professions, and interactions.
      • Geography and Landmarks: Specific locations, landmarks, and geographic features.
      • Food and Dining: Different foods, meals, and dining settings.
      • Education: Educational settings, materials, and activities.
      • Sports and Recreation: Various sports, games, and recreational activities.
      • Transportation: Vehicles, travel methods, and transportation infrastructure.
      • Cultural and Historical: Cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    • Caption Details:
      • Human Generated: Each caption is written by native German speakers.
      • Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
      • Contextual Relevance: Captions are written with attention to the visual details depicted in the images, such as objects, scenes, actions, and settings.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    • Image File Name
    • Category
    • Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    • Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and…

  10. my-image-captioning-dataset-test

    • huggingface.co
    Updated Oct 31, 2024
    + more versions
    Cite
    Lyllah (2024). my-image-captioning-dataset-test [Dataset]. https://huggingface.co/datasets/tuy20212521/my-image-captioning-dataset-test
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2024
    Authors
    Lyllah
    Description

    tuy20212521/my-image-captioning-dataset-test dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Swahili Image Captioning Dataset

    • zenodo.org
    csv
    Updated Jun 29, 2023
    Cite
    Grevians Onula; Shikali Casper; Omala Andrew; Grevians Onula; Shikali Casper; Omala Andrew (2023). Swahili Image Captioning Dataset [Dataset]. http://doi.org/10.5281/zenodo.8087852
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Grevians Onula; Shikali Casper; Omala Andrew; Grevians Onula; Shikali Casper; Omala Andrew
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SwaFlickr8k dataset is an extension of the well-known Flickr8k dataset, specifically designed for image captioning tasks. It includes a collection of images with corresponding captions written in Swahili. With 8,091 unique images and 40,455 captions, this dataset provides a valuable resource for research and development in image understanding and language processing, particularly for the Swahili language.
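    The counts above are consistent with the Flickr8k convention of five captions per image; a quick arithmetic check:

    ```python
    images, captions = 8091, 40455

    # 40,455 captions over 8,091 images divides evenly.
    assert captions % images == 0
    print(captions // images)  # 5 captions per image
    ```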

  12. Image captioning dataset for human activities

    • zenodo.org
    jpeg, txt
    Updated Jan 21, 2021
    Cite
    TU; TU (2021). Image captioning dataset for human activities [Dataset]. http://doi.org/10.5281/zenodo.4453525
    Explore at:
    Available download formats: jpeg, txt
    Dataset updated
    Jan 21, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    TU; TU
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An image captioning dataset of humans performing various activities. The images cover the following activities: walking, running, sleeping, swimming, sitting, jumping, riding, climbing, drinking, and reading.

  13. Image captioning dataset

    • kaggle.com
    Updated Apr 26, 2023
    Cite
    Malavika Shamesh (2023). Image captioning dataset [Dataset]. https://www.kaggle.com/datasets/malavikashamesh/image-captioning-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Malavika Shamesh
    Description

    Dataset

    This dataset was created by Malavika Shamesh

    Contents

  14. Data and codes to reproduce the study, titled "Understanding remote sensing imagery like reading a text document: What can remote sensing image captioning offer?"

    • figshare.com
    zip
    Updated Nov 30, 2023
    Cite
    Anonymous Contributor (2023). Data and codes to reproduce the study, titled:" Understanding remote sensing imagery like reading a text document: What can remote sensing image captioning offer?" [Dataset]. http://doi.org/10.6084/m9.figshare.24665631.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anonymous Contributor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Remote sensing imagery offers intricate and nuanced data, emphasizing the need for a profound understanding of the relationships among varied geographical elements and events. In this study, we explore the transitions from the image domain to the text domain by employing four state-of-the-art image captioning algorithms, namely BLIP, mPLUG, OFA, and X-VLM. Specifically, we investigate (1) the stability of these image captioning algorithms for remote sensing image captioning, (2) the preservation of similarity between images and their corresponding captions, and (3) the characteristics of their caption embedding spaces. The results suggest a moderate consistency across captions generated by different image captioning models, with observable variations contingent on the urban entities presented. In addition, a dynamic relationship emerges between image space and the corresponding caption space, evidenced by their fluctuating correlation coefficient. Most importantly, patterns within the caption embedding space align with the observed land cover and land use in the image patches, reaffirming the potential of our pilot work as an impactful analytical approach in future remote sensing analytics. We advocate that integrating image captioning techniques with remote sensing imagery paves the way for an innovative data extraction and interpretation approach with diverse applications. This dataset contains the data and code to reproduce this study.
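    The similarity-preservation question the study investigates reduces to comparing pairwise similarities in the image and caption embedding spaces. A minimal stdlib sketch with made-up 3-dimensional embeddings (the study itself uses features from BLIP, mPLUG, OFA, and X-VLM, not these toy vectors):

    ```python
    import math

    def cosine(u, v):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    # Hypothetical embeddings for two image patches and their captions.
    image_embs = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.2)]
    caption_embs = [(0.8, 0.2, 0.1), (0.2, 0.9, 0.1)]

    # Similarity is "preserved" when each image is closest to its own caption.
    for i, img in enumerate(image_embs):
        sims = [cosine(img, cap) for cap in caption_embs]
        print(i, sims.index(max(sims)))
    ```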

  15. Finnish Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Finnish Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/finnish-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    FutureBeeAI AI Data License Agreement: https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Finnish Language Image Captioning Dataset: a collection of images with associated text captions, intended to facilitate the development of AI models capable of generating high-quality captions for images. This dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    • Sources: Images are sourced from public databases and proprietary collections.
    • Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    • Copyright: All selected images are free from copyright restrictions, allowing unrestricted use in research and development.
    • Format: Images are available in various formats, such as JPEG, PNG, and HEIC.
    • Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
      • Daily Life: Household objects, activities, and daily routines.
      • Nature and Environment: Natural scenes, plants, animals, and weather.
      • Technology and Gadgets: Electronic devices, tools, and machinery.
      • Human Activities: People, their actions, professions, and interactions.
      • Geography and Landmarks: Specific locations, landmarks, and geographic features.
      • Food and Dining: Different foods, meals, and dining settings.
      • Education: Educational settings, materials, and activities.
      • Sports and Recreation: Various sports, games, and recreational activities.
      • Transportation: Vehicles, travel methods, and transportation infrastructure.
      • Cultural and Historical: Cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    • Caption Details:
      • Human Generated: Each caption is written by native Finnish speakers.
      • Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
      • Contextual Relevance: Captions are written with attention to the visual details depicted in the images, such as objects, scenes, actions, and settings.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    • Image File Name
    • Category
    • Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    • Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and…

  16. Safevision:image Captioning Dataset

    • universe.roboflow.com
    zip
    Updated May 8, 2025
    Cite
    VLMBLIP (2025). Safevision:image Captioning Dataset [Dataset]. https://universe.roboflow.com/vlmblip/safevision-image-captioning/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    VLMBLIP
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Safety Descriptions
    Description

    SafeVision:IMAGE CAPTIONING

    ## Overview
    
    SafeVision:IMAGE CAPTIONING is a dataset for vision language (multimodal) tasks - it contains Safety annotations for 314 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  17. Image Captioning Dataset (without links)

    • kaggle.com
    zip
    Updated Sep 7, 2020
    + more versions
    Cite
    Nabi Nabiyev (2020). Image Captioning Dataset(without links) [Dataset]. https://www.kaggle.com/nabiyevnabi/image-captioning-datasetwithout-links
    Explore at:
    Available download formats: zip (33,855,315 bytes)
    Dataset updated
    Sep 7, 2020
    Authors
    Nabi Nabiyev
    Description

    Dataset

    This dataset was created by Nabi Nabiyev

    Contents

    It contains the following files:

  18. HybridDataset

    • data.mendeley.com
    Updated May 9, 2022
    Cite
    Safayet Masum (2022). HybridDataset [Dataset]. http://doi.org/10.17632/9drtzfjpbf.1
    Explore at:
    Dataset updated
    May 9, 2022
    Authors
    Safayet Masum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Image captioning in Bengali. The images are collected from the Flickr8k and BanglaLekha datasets. The captions were generated using Google Translate.

  19. Flickr30k

    • datasets.activeloop.ai
    • opendatalab.com
    • +1more
    deeplake
    Updated Mar 30, 2022
    Cite
    Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier (2022). Flickr30k [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/flickr30k-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Mar 30, 2022
    Authors
    Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    A dataset of 30,000 images with 5 captions per image, created by researchers at the University of Illinois at Urbana-Champaign and used for research in machine learning and natural language processing tasks such as image captioning and visual question answering.

  20. 10,000 Image Caption Data of Gestures

    • m.nexdata.ai
    • nexdata.ai
    Updated Oct 3, 2023
    Cite
    Nexdata (2023). 10,000 Image Caption Data of Gestures [Dataset]. https://m.nexdata.ai/datasets/llm/1287
    Explore at:
    Dataset updated
    Oct 3, 2023
    Dataset authored and provided by
    Nexdata
    Variables measured
    Data size, Data format, Text length, Accuracy rate, Age distribution, Race distribution, Gender distribution, Collecting diversity, Description language, Collecting environment, and 1 more
    Description

    10,000 image-caption records of gestures, collected mainly from young and middle-aged people. The collection environments span indoor and outdoor scenes, various seasons, and various capture angles. The description language is English, mainly describing hand characteristics such as hand movements, gestures, image acquisition angle, gender, and age.
