100+ datasets found
  1. COCO Image Captioning Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2023
    Cite
    Nikhil Narayan (2023). COCO Image Captioning Dataset [Dataset]. https://www.kaggle.com/datasets/nikhil7280/coco-image-caption
    Explore at:
    Available download formats: zip (14747667844 bytes)
    Dataset updated
    Jun 25, 2023
    Authors
    Nikhil Narayan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Basic Info

    The COCO dataset (Common Objects in Context) is a large-scale dataset of images and annotations for object detection, segmentation, and captioning. It is one of the most popular datasets in computer vision research, and has been used to train and evaluate many state-of-the-art models.

    The COCO dataset is a valuable resource for researchers working on object detection, segmentation, and captioning. It is a large, challenging dataset that provides a wide variety of images and annotations. The COCO dataset is also well-organized and easy to use.

    For more information about the dataset, you can visit its website.

    The uploaded data consists of images and their respective annotations for use in image captioning.

    2014 Data

    The 2014 data is used as training and validation data for the image captioning task.

    2017 Data

    The 2017 data is used as test data.
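    For orientation, the 2014 caption annotations can be read with the standard pycocotools COCO API. The sketch below assumes the usual COCO annotation file names (e.g. annotations/captions_train2014.json); adjust the paths to this upload's actual directory layout.

    # Minimal sketch for browsing COCO captions with pycocotools.
    # Assumes the standard annotation layout; paths may differ in this upload.
    from pycocotools.coco import COCO

    coco = COCO("annotations/captions_train2014.json")

    img_id = coco.getImgIds()[0]             # pick the first image id
    ann_ids = coco.getAnnIds(imgIds=img_id)  # caption annotations for that image
    for ann in coco.loadAnns(ann_ids):
        print(ann["caption"])                # COCO provides ~5 captions per image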

  2. Vehicle Image Captioning Dataset

    • gts.ai
    json
    Updated Jul 11, 2024
    + more versions
    Cite
    GTS (2024). Vehicle Image Captioning Dataset [Dataset]. https://gts.ai/dataset-download/vehicle-image-captioning-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Unlock detailed insights into road scenes with our Vehicle Image Captioning Dataset, featuring over 1,000 high-resolution images.

  3. RSICD Image Caption Dataset

    • kaggle.com
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). RSICD Image Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/rsicd-image-caption-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    RSICD Image Caption Dataset

    By Arto (From Huggingface) [source]

    About this dataset

    The train.csv file contains a list of image filenames, captions, and the actual images used for training the image captioning models. Similarly, the test.csv file includes a separate set of image filenames, captions, and images specifically designated for testing the accuracy and performance of the trained models.

    Furthermore, the valid.csv file contains a unique collection of image filenames with their respective captions and images that serve as an independent validation set to evaluate the models' capabilities accurately.

    Each entry in these CSV files includes a filename string that identifies an image file stored in another location or directory. Additionally, each entry provides a list (or multiple rows) of strings representing written descriptions or captions for the respective image.

    Given this structure, the dataset can be immensely valuable to researchers, developers, and enthusiasts building computer vision algorithms such as automatic text generation from visual content, whether that means training machine learning models to generate relevant captions for new, unseen images or evaluating existing systems against diverse criteria.

    Stay current with cutting-edge research trends by leveraging this comprehensive dataset, which contains not only captions but also corresponding images across sets designed for varied computer vision tasks.

    How to use the dataset

    Overview of the Dataset

    The dataset consists of three primary files: train.csv, test.csv, and valid.csv. These files contain information about image filenames and their respective captions. Each file includes multiple captions for each image to support diverse training techniques.

    Understanding the Files

    • train.csv: This file contains filenames (filename column) and their corresponding captions (captions column) for training your image captioning model.
    • test.csv: The test set is included in this file, which has the same structure as train.csv. Its purpose is to evaluate your trained models on unseen data.
    • valid.csv: This validation set provides images with their respective filenames (filename) and captions (captions). It allows you to fine-tune your models based on performance during evaluation.
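
    As a quick illustration, the three splits can be inspected with pandas. This is a minimal sketch assuming the column names described above (filename, captions) and that the CSV files sit in the working directory.

    import pandas as pd

    # Sketch: load the RSICD splits and peek at their structure.
    # Assumes 'filename' and 'captions' columns, as described above.
    train = pd.read_csv("train.csv")
    test = pd.read_csv("test.csv")
    valid = pd.read_csv("valid.csv")

    print(train.columns.tolist())   # expected: ['filename', 'captions']
    print(train.head())             # first few filename/caption pairs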

    Getting Started

    To begin utilizing this dataset effectively, follow these steps:

    • Extract the zip file containing all relevant data files onto your local machine or cloud environment.
    • Familiarize yourself with each CSV file's structure: train.csv, test.csv, and valid.csv. Understand how information like filename(s) (filename) corresponds with its respective caption(s) (captions).
    • Depending on your specific use case or research goals, determine which portion(s) of the dataset you wish to work with (e.g., only train or train+validation).
    • Load the dataset into your preferred programming environment or machine learning framework, ensuring you have the necessary dependencies installed.
    • Preprocess the dataset as needed, such as resizing images to a specific dimension or encoding captions for model training purposes (see the sketch after this list).
    • Split the data into training, validation, and test sets according to your experimental design requirements.
    • Use appropriate algorithms and techniques to train your image captioning models on the provided data.
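
    For the preprocessing step in particular, a concrete example helps. The sketch below resizes one image with Pillow and builds a toy word-level caption encoding; a real pipeline would substitute a proper tokenizer, and the images/ directory name is an assumption.

    import pandas as pd
    from PIL import Image

    # Sketch only: resize an image and integer-encode its caption.
    # Assumes an 'images/' folder holding the files named in train.csv.
    df = pd.read_csv("train.csv")
    row = df.iloc[0]

    img = Image.open(f"images/{row['filename']}").resize((224, 224))

    # Toy word-level vocabulary built from all captions; swap in a real
    # tokenizer for serious work.
    words = sorted(set(" ".join(df["captions"].astype(str)).lower().split()))
    vocab = {w: i for i, w in enumerate(words)}
    encoded = [vocab[w] for w in str(row["captions"]).lower().split()]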

    Enhancing Model Performance

    To optimize model performance using this dataset, consider these tips:

    • Explore different architectures and pre-trained models specifically designed for image captioning tasks.
    • Experiment with various natural language

    Research Ideas

    • Image Captioning: This dataset can be used to train and evaluate image captioning models. The captions can be used as target labels for training, and the images can be paired with the captions to generate descriptive captions for test images.
    • Image Retrieval: The dataset can be used for image retrieval tasks where given a query caption, the model needs to retrieve the images that best match the description. This can be useful in applications such as content-based image search.
    • Natural Language Processing: The dataset can also be used for natural language processing tasks such as text generation or machine translation. The captions in this dataset are descriptive ...
  4. Structural-Damage Image Captioning Dataset

    • jstagedata.jst.go.jp
    txt
    Updated Dec 15, 2023
    Cite
    OSAMA ABBAS; Ji DANG (2023). Structural-Damage Image Captioning Dataset [Dataset]. http://doi.org/10.60336/data.jsceiiai.24736914.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    Japan Society of Civil Engineers
    Authors
    OSAMA ABBAS; Ji DANG
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset consists of 2,000 images drawn from PEER Hub Image-Net and a caption file in text format.

    The captions use a deliberately limited vocabulary, designed to improve results while still describing plenty of information, with a caption length of 18 words.
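
    Because the caption file is plain text, a short script suffices to confirm the restricted vocabulary. This sketch assumes one caption per line in a file named captions.txt, which is a guess at the actual file layout.

    from collections import Counter

    # Sketch: count the vocabulary of the caption file.
    # 'captions.txt' is a hypothetical name; use the file shipped with the data.
    with open("captions.txt", encoding="utf-8") as f:
        words = Counter(w for line in f for w in line.lower().split())

    print(len(words), "distinct words")  # expected to be small by design
    print(words.most_common(10))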

  5. English Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/english-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English Language Image Captioning Dataset! This collection of images with associated text captions facilitates the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    Sources: Images are sourced from public databases and proprietary collections.
    Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    Copyright: All selected images are free from copyright restrictions, allowing for unrestricted use in research and development.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    Caption Details:
    Human Generated: Each caption is written by native English speakers.
    Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
    Contextual Relevance: Captions are written to reflect the visual content of the images, including the objects, scenes, actions, and settings depicted.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    Image File Name
    Category
    Caption
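
    The three metadata fields map naturally onto a simple per-image record. The snippet below is purely illustrative, since the provider does not publish the exact file format here; the field names and values are assumptions.

    # Hypothetical metadata record for one image-caption pair; the actual
    # field names and file format are not specified in this listing.
    record = {
        "image_file_name": "img_00042.jpg",   # assumed naming scheme
        "category": "Food and Dining",
        "caption": "A bowl of soup with bread on a wooden table.",
    }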

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and

  6. kag100-image-captioning-dataset

    • huggingface.co
    Updated Aug 23, 2023
    Cite
    Pavan Kumar Jonnakuti (2023). kag100-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/jpawan33/kag100-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2023
    Authors
    Pavan Kumar Jonnakuti
    Description

    Dataset Card for "kag100-image-captioning-dataset"

    More Information needed

  7. my-image-captioning-dataset

    • huggingface.co
    Updated Apr 11, 2023
    + more versions
    Cite
    Park (2023). my-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/SKyu/my-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2023
    Authors
    Park
    Description

    Dataset Card for "my-image-captioning-dataset"

    More Information needed

  8. Flickr8k-Images-Captions

    • kaggle.com
    zip
    Updated Jul 12, 2020
    Cite
    Aladdin Persson (2020). Flickr8k-Images-Captions [Dataset]. https://www.kaggle.com/datasets/aladdinpersson/flickr8kimagescaptions
    Explore at:
    Available download formats: zip (1113116819 bytes)
    Dataset updated
    Jul 12, 2020
    Authors
    Aladdin Persson
    Description

    Dataset

    A small image captioning dataset that is perfect for getting started with image captioning. I have also made a video on building an image captioning model in PyTorch using this dataset, which you can check out: https://youtu.be/y2BaTt1fxJU
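
    For readers getting started, a minimal PyTorch Dataset over Flickr8k-style data might look like the sketch below. It assumes an images/ folder and a captions.txt file with image,caption lines, which may not match this upload's exact layout.

    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class Flickr8kCaptions(Dataset):
        """Sketch of a Flickr8k-style captioning dataset.

        Assumes 'captions.txt' with 'image,caption' lines and an 'images/'
        directory; adjust to the actual layout of this upload.
        """

        def __init__(self, root, transform=None):
            self.root = root
            self.transform = transform
            self.pairs = []
            with open(os.path.join(root, "captions.txt")) as f:
                for line in f:
                    name, caption = line.rstrip("\n").split(",", 1)
                    self.pairs.append((name, caption))

        def __len__(self):
            return len(self.pairs)

        def __getitem__(self, idx):
            name, caption = self.pairs[idx]
            img = Image.open(os.path.join(self.root, "images", name)).convert("RGB")
            if self.transform is not None:
                img = self.transform(img)
            return img, caption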

  9. fkr30k-image-captioning-dataset

    • huggingface.co
    Updated Dec 11, 2023
    Cite
    Pavan Kumar Jonnakuti (2023). fkr30k-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/jpawan33/fkr30k-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 11, 2023
    Authors
    Pavan Kumar Jonnakuti
    Description

    Dataset Card for "fkr30k-image-captioning-dataset"

    More Information needed

  10. Polish Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Polish Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/polish-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Polish Language Image Captioning Dataset! This collection of images with associated text captions facilitates the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    Sources: Images are sourced from public databases and proprietary collections.
    Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    Copyright: All selected images are free from copyright restrictions, allowing for unrestricted use in research and development.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    Caption Details:
    Human Generated: Each caption is written by native Polish speakers.
    Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
    Contextual Relevance: Captions are written to reflect the visual content of the images, including the objects, scenes, actions, and settings depicted.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    Image File Name
    Category
    Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and

  11. Dutch Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Dutch Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/dutch-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Dutch Language Image Captioning Dataset! This collection of images with associated text captions facilitates the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    Sources: Images are sourced from public databases and proprietary collections.
    Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    Copyright: All selected images are free from copyright restrictions, allowing for unrestricted use in research and development.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    Caption Details:
    Human Generated: Each caption is written by native Dutch speakers.
    Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
    Contextual Relevance: Captions are written to reflect the visual content of the images, including the objects, scenes, actions, and settings depicted.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    Image File Name
    Category
    Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate and

  12. Remote Sensing Image Captioning - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). Remote Sensing Image Captioning - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/remote-sensing-image-captioning
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    Remote Sensing Image Captioning Dataset (RSICD) and UCM-captions dataset for remote sensing image captioning

  13. Image Caption Models from HuggingFace Hub

    • kaggle.com
    zip
    Updated Feb 27, 2023
    Cite
    Tanrei(nama) (2023). Image Caption Models from HuggingFace Hub [Dataset]. https://www.kaggle.com/datasets/tanreinama/image-caption-models
    Explore at:
    Available download formats: zip (27655050862 bytes)
    Dataset updated
    Feb 27, 2023
    Authors
    Tanrei(nama)
    Description
  14. Swahili Image Captioning Dataset

    • zenodo.org
    csv
    Updated Jun 29, 2023
    + more versions
    Cite
    Grevians Onula; Shikali Casper; Omala Andrew (2023). Swahili Image Captioning Dataset [Dataset]. http://doi.org/10.5281/zenodo.8087852
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Grevians Onula; Shikali Casper; Omala Andrew
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SwaFlickr8k dataset is an extension of the well-known Flickr8k dataset, specifically designed for image captioning tasks. It includes a collection of images and corresponding captions written in Swahili. With 8,091 unique images and 40,455 captions, this dataset provides a valuable resource for research and development in image understanding and language processing, particularly in the context of the Swahili language.
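
    With 40,455 captions over 8,091 images, each image carries exactly five captions (40,455 / 8,091 = 5), mirroring the original Flickr8k design. A quick sanity check on the CSV might look like the sketch below; the file and column names are assumptions.

    import pandas as pd

    # Sketch: verify the five-captions-per-image structure of SwaFlickr8k.
    # 'swaflickr8k.csv', 'image', and 'caption' are assumed names.
    df = pd.read_csv("swaflickr8k.csv")
    per_image = df.groupby("image")["caption"].count()
    print(per_image.describe())  # expect a constant count of 5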

  15. my-image-captioning-dataset

    • huggingface.co
    Updated Aug 14, 2024
    + more versions
    Cite
    Tanvir Ahmed (2024). my-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/akibc123/my-image-captioning-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 14, 2024
    Authors
    Tanvir Ahmed
    Description

    The akibc123/my-image-captioning-dataset dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  16. Vietnamese Image Captioning Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Vietnamese Image Captioning Dataset [Dataset]. https://www.futurebeeai.com/dataset/multi-modal-dataset/vietnamese-image-caption-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Vietnamese Language Image Captioning Dataset! This collection of images with associated text captions facilitates the development of AI models capable of generating high-quality captions for images. The dataset is meticulously crafted to support research and innovation in computer vision and natural language processing.

    Image Data

    This dataset features over 5,000 high-resolution images sourced from diverse categories and scenes. Each image is meticulously selected to encompass a wide array of contexts, objects, and environments, ensuring comprehensive coverage for training robust image captioning models.

    Sources: Images are sourced from public databases and proprietary collections.
    Clarity and Relevance: Each image is vetted for visual clarity and relevance, ensuring it accurately represents real-world scenarios.
    Copyright: All selected images are free from copyright restrictions, allowing for unrestricted use in research and development.
    Format: Images in the dataset are available in various formats like JPEG, PNG, and HEIC.
    Image Categories: The dataset spans a wide range of image categories to ensure thorough training, fine-tuning, and testing of image captioning models. Categories include:
    Daily Life: Images about household objects, activities, and daily routines.
    Nature and Environment: Images related to natural scenes, plants, animals, and weather.
    Technology and Gadgets: Images about electronic devices, tools, and machinery.
    Human Activities: Images about people, their actions, professions, and interactions.
    Geography and Landmarks: Images related to specific locations, landmarks, and geographic features.
    Food and Dining: Images about different foods, meals, and dining settings.
    Education: Images related to educational settings, materials, and activities.
    Sports and Recreation: Images about various sports, games, and recreational activities.
    Transportation: Images about vehicles, travel methods, and transportation infrastructure.
    Cultural and Historical: Images about cultural artifacts, historical events, and traditions.

    Caption Data

    Each image in the dataset is paired with a high-quality descriptive caption. These captions are carefully crafted to provide detailed and contextually rich descriptions of the images, enhancing the dataset's utility for training sophisticated image captioning algorithms.

    Caption Details:
    Human Generated: Each caption is written by native Vietnamese speakers.
    Quality Assurance: Captions are meticulously reviewed for linguistic accuracy, coherence, and relevance to the corresponding images.
    Contextual Relevance: Captions are written to reflect the visual content of the images, including the objects, scenes, actions, and settings depicted.

    Metadata

    Each image-caption pair is accompanied by comprehensive metadata to facilitate informed decision-making in model development:

    Image File Name
    Category
    Caption

    Usage and Applications

    The Image Captioning Dataset serves various applications across different domains:

    Training Image Captioning Models: Provides high-quality data for training and fine-tuning Generative AI models to generate accurate

  17. vlm-image-captioning-dataset

    • huggingface.co
    Updated Oct 14, 2025
    Cite
    KAUSHAL KUMAR SINGH (2025). vlm-image-captioning-dataset [Dataset]. https://huggingface.co/datasets/kasvnmtp/vlm-image-captioning-dataset
    Explore at:
    Dataset updated
    Oct 14, 2025
    Authors
    KAUSHAL KUMAR SINGH
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    kasvnmtp/vlm-image-captioning-dataset

      Dataset Description
    

    This is a custom Vision Language Model (VLM) dataset for image captioning tasks. The dataset contains image-text pairs suitable for finetuning vision-language models.

      Dataset Statistics
    

    Total Samples: 149,997
    Train Samples: 74,998
    Test Samples: 74,999
    Features: image, text, sample_id

      Dataset Structure

      Data Fields

    image: PIL Image object
    text: Caption/description text for the… See the full description on the dataset page: https://huggingface.co/datasets/kasvnmtp/vlm-image-captioning-dataset.
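
    Given the repository id above and the stated splits and features, loading the dataset with the Hugging Face datasets library should be straightforward; this is a minimal sketch.

    from datasets import load_dataset

    # Sketch: load the dataset from the Hub using the repo id cited above.
    ds = load_dataset("kasvnmtp/vlm-image-captioning-dataset")

    print(ds)                      # expect 'train' and 'test' splits
    sample = ds["train"][0]
    print(sample["sample_id"], sample["text"])
    sample["image"].show()         # 'image' is a PIL Image, per the data fields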

  18. Dataset for Image Captioning System (in Bangla)

    • ieee-dataport.org
    Updated Aug 2, 2021
    Cite
    Toshiba Kamruzzaman (2021). Dataset for Image Captioning System (in Bangla) [Dataset]. https://ieee-dataport.org/documents/dataset-image-captioning-system-bangla
    Explore at:
    Dataset updated
    Aug 2, 2021
    Authors
    Toshiba Kamruzzaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


  19. Web-Harvested Image and Caption Dataset

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Web-Harvested Image and Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/web-harvested-image-and-caption-dataset
    Explore at:
    Available download formats: zip (233254845 bytes)
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Web-Harvested Image and Caption Dataset

    By conceptual_captions (From Huggingface) [source]

    About this dataset

    The Conceptual Captions dataset, hosted on Kaggle, is a comprehensive and expansive collection of web-harvested images and their corresponding captions. With a staggering total of approximately 3.3 million images, this dataset offers a rich resource for training and evaluating image captioning models.

    Unlike other image caption datasets, the unique feature of Conceptual Captions lies in the diverse range of styles represented in its captions. These captions are sourced from the web, specifically extracted from the Alt-text HTML attribute associated with web images. This approach ensures that the dataset encompasses a broad variety of textual descriptions that accurately reflect real-world usage scenarios.

    To guarantee the quality and reliability of these captions, an elaborate automatic pipeline has been developed for extracting, filtering, and transforming each image/caption pair. The goal behind this diligent curation process is to provide clean, informative, fluent, and learnable captions that effectively describe their corresponding images.

    The dataset itself consists of two primary components: train.csv and validation.csv files. The train.csv file comprises an extensive collection of over 3.3 million web-harvested images along with their respective carefully curated captions. Each image is accompanied by its unique URL to allow easy retrieval during model training.

    On the other hand, validation.csv contains approximately 100,000 image URLs paired with their corresponding informative captions. This subset serves as an invaluable resource for validating and evaluating model performance after training on the larger train.csv set.

    Researchers and data scientists can leverage the Conceptual Captions dataset to develop state-of-the-art computer vision models for tasks such as image understanding, natural language processing (NLP), and multimodal learning that combines visual features with textual context comprehension, among others.

    By providing an extensive array of high-quality images coupled with richly descriptive captions harvested from across the web and meticulously curated, Conceptual Captions empowers professionals in artificial intelligence (AI), machine learning, computer vision, and natural language processing to explore new frontiers in visual understanding and textual comprehension.

    How to use the dataset

    Title: How to Use the Conceptual Captions Dataset for Web-Harvested Image and Caption Analysis

    Introduction: The Conceptual Captions dataset is an extensive collection of web-harvested images, each accompanied by a caption. This guide aims to help you understand and effectively utilize this dataset for various applications, such as image captioning, natural language processing, computer vision tasks, and more. Let's dive into the details!

    Step 1: Acquiring the Dataset

    Step 2: Exploring the Dataset Files

    After downloading the dataset files ('train.csv' and 'validation.csv'), you'll find that each file consists of multiple columns containing valuable information:

    a) 'caption': This column holds captions associated with each image. It provides textual descriptions that can be used in various NLP tasks.
    b) 'image_url': This column contains URLs pointing to individual images in the dataset.

    Step 3: Understanding Dataset Structure

    The Conceptual Captions dataset follows a tabular format where each row represents an image/caption pair. Combining knowledge from both train.csv and validation.csv files will give you access to a diverse range of approximately 3.4 million paired examples.

    Step 4: Preprocessing Considerations

    Due to its web-harvested nature, it is recommended to perform certain preprocessing steps on this dataset before utilizing it for your specific task(s). Some considerations include:

    a) Text Cleaning: Perform basic text cleaning techniques such as removing special characters or applying sentence tokenization.
    b) Filtering: Depending on your application, you may need to apply specific filters to remove captions that are irrelevant, inaccurate, or noisy.
    c) Language Preprocessing: Consider using techniques like lemmatization or stemming if it suits your task.
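
    As an illustration of steps (a) and (b), a lightweight pass over the captions with pandas might look like this; the cleaning rules are examples, not the pipeline used to build the dataset.

    import pandas as pd

    # Sketch: basic caption cleaning and filtering for Conceptual Captions.
    df = pd.read_csv("train.csv")

    df["caption"] = (
        df["caption"]
        .str.lower()
        .str.replace(r"[^a-z0-9\s]", "", regex=True)  # strip special characters
        .str.strip()
    )

    # Filter out captions too short or too long to be informative.
    lengths = df["caption"].str.split().str.len()
    df = df[(lengths >= 3) & (lengths <= 30)]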

    Step 5: Training and Evaluation

    Once you have preprocessed the dataset as per your requirements, it's time to train your models! The Conceptual Captions dataset can be used for a range of tasks such as image captioni...

  20. Flickr8k Image Dataset

    • gts.ai
    json
    Updated Jan 25, 2025
    Cite
    GTS (2025). Flickr8k Image Dataset [Dataset]. https://gts.ai/dataset-download/page/83/
    Explore at:
    Available download formats: json
    Dataset updated
    Jan 25, 2025
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the Flickr8k Image Dataset, featuring 8,092 images with multiple descriptive captions. This dataset is widely used for image captioning, visual recognition, and AI research in computer vision and natural language processing.
