69 datasets found
  1. RSICD Image Caption Dataset

    • kaggle.com
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). RSICD Image Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/rsicd-image-caption-dataset
    Explore at:
Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 6, 2023
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

RSICD Image Caption Dataset

    By Arto (From Huggingface) [source]

    About this dataset

    The train.csv file contains a list of image filenames, captions, and the actual images used for training the image captioning models. Similarly, the test.csv file includes a separate set of image filenames, captions, and images specifically designated for testing the accuracy and performance of the trained models.

    Furthermore, the valid.csv file contains a unique collection of image filenames with their respective captions and images that serve as an independent validation set to evaluate the models' capabilities accurately.

Each entry in these CSV files includes a filename string that indicates the name or identifier of an image file stored in another location or directory. Additionally, each entry provides a list (or multiple rows) of strings representing written descriptions or captions for the respective image.

Considering these details about this dataset's structure, it can be immensely valuable to researchers, developers, and enthusiasts working on innovative computer vision algorithms such as automatic text generation based on visual content analysis, whether that means training machine learning models to automatically generate relevant captions for new, unseen images or evaluating existing systems' performance against diverse criteria.

Stay up to date with cutting-edge research trends by leveraging this comprehensive dataset, which contains not only captions but also the corresponding images across different sets designed to cater to varied computer vision tasks.

    How to use the dataset

    Overview of the Dataset

    The dataset consists of three primary files: train.csv, test.csv, and valid.csv. These files contain information about image filenames and their respective captions. Each file includes multiple captions for each image to support diverse training techniques.

    Understanding the Files

    • train.csv: This file contains filenames (filename column) and their corresponding captions (captions column) for training your image captioning model.
    • test.csv: The test set is included in this file, which contains a similar structure as that of train.csv. The purpose of this file is to evaluate your trained models on unseen data.
    • valid.csv: This validation set provides images with their respective filenames (filename) and captions (captions). It allows you to fine-tune your models based on performance during evaluation.

    Getting Started

    To begin utilizing this dataset effectively, follow these steps:

    • Extract the zip file containing all relevant data files onto your local machine or cloud environment.
    • Familiarize yourself with each CSV file's structure: train.csv, test.csv, and valid.csv. Understand how information like filename(s) (filename) corresponds with its respective caption(s) (captions).
    • Depending on your specific use case or research goals, determine which portion(s) of the dataset you wish to work with (e.g., only train or train+validation).
• Load the dataset into your preferred programming environment or machine learning framework, ensuring you have the necessary dependencies installed (see the loading sketch after this list).
    • Preprocess the dataset as needed, such as resizing images to a specific dimension or encoding captions for model training purposes.
    • Split the data into training, validation, and test sets according to your experimental design requirements.
    • Use appropriate algorithms and techniques to train your image captioning models on the provided data.
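
A minimal sketch of the loading and parsing steps above, assuming the three CSV files sit in the working directory and use the filename and captions columns described earlier; the captions parsing is defensive because the exact serialization of multiple captions per image is not specified here:

import ast
import pandas as pd

# Load the three splits described above (paths are assumptions).
train_df = pd.read_csv("train.csv")
valid_df = pd.read_csv("valid.csv")
test_df = pd.read_csv("test.csv")

# Each row pairs an image filename with its caption(s). If the captions
# column stores a list serialized as a string, parse it back into a list.
def parse_captions(value):
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return [value]

train_df["captions"] = train_df["captions"].apply(parse_captions)
print(train_df[["filename", "captions"]].head())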

    Enhancing Model Performance

    To optimize model performance using this dataset, consider these tips:

    • Explore different architectures and pre-trained models specifically designed for image captioning tasks.
• Experiment with various natural language processing techniques ...

    Research Ideas

    • Image Captioning: This dataset can be used to train and evaluate image captioning models. The captions can be used as target labels for training, and the images can be paired with the captions to generate descriptive captions for test images.
    • Image Retrieval: The dataset can be used for image retrieval tasks where given a query caption, the model needs to retrieve the images that best match the description. This can be useful in applications such as content-based image search.
    • Natural Language Processing: The dataset can also be used for natural language processing tasks such as text generation or machine translation. The captions in this dataset are descriptive ...
  2. Stylish Product Image Dataset

    • kaggle.com
    zip
    Updated May 21, 2022
    Cite
    Santosh Kumar (2022). Stylish Product Image Dataset [Dataset]. https://www.kaggle.com/datasets/kuchhbhi/stylish-product-image-dataset
    Explore at:
zip (9509715613 bytes). Available download formats
    Dataset updated
    May 21, 2022
    Authors
    Santosh Kumar
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context:

The idea to scrape this data came to me while I was working on an e-commerce project, Fashion Product Recommendation (an end-to-end project). In this project, you upload any fashion image and it shows the 10 closest recommendations.

https://user-images.githubusercontent.com/40932902/169657090-20d3342d-d472-48e3-bc34-8a9686b09961.png

https://user-images.githubusercontent.com/40932902/169657035-870bb803-f985-482a-ac16-789d0fcf2a2b.png

https://user-images.githubusercontent.com/40932902/169013855-099838d6-8612-45ce-8961-28ccf44f81f7.png

I completed my project on this image dataset. The problem I faced was while deploying to the Heroku server: due to the large project file size, I was unable to deploy, as Heroku offers limited memory space for a free account.

Currently I am only familiar with Heroku, and I am learning AWS for big projects. So, I decided to scrape my own image dataset with much more information that can help me take this project to the next level. I scraped this data from flipkart.com (an e-commerce website) in two formats: images and textual data in tabular format.

    About this Dataset:

This dataset contains 65k images (400x450 pixels) of fashion/style products and accessories, such as clothing, footwear, and many more. There is also a CSV file whose id column maps to the image names in the tabular data. Image names are in a unique numerical format such as 1.png or 62299.png; the image name and the id column are the same. So, if you want to find the details of any image, take the image name as the id, go to the id column in the CSV file, and that id's row holds the details of the image (a lookup sketch follows the column list below). You can find the notebook I used to scrape this data in the code section.

Columns of CSV Dataset:

1. id: Unique id, same as the image name
2. brand: Brand name of the product
3. title: Title of the product
4. sold_price: Selling price of the product
5. actual_price: Actual price of the product
6. url: Unique URL of every product
7. img: Image URL
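
A minimal sketch of the lookup described above: the CSV file name (product_data.csv) and the images/ folder are hypothetical, while the column names follow the list above.

from pathlib import Path
import pandas as pd

# Hypothetical file and folder names; adjust to the actual dataset layout.
df = pd.read_csv("product_data.csv")  # columns: id, brand, title, sold_price, actual_price, url, img
image_path = Path("images/62299.png")

# The image name and the id column are the same, so strip the extension
# to get the id and look up that row for the product details.
image_id = int(image_path.stem)
details = df.loc[df["id"] == image_id]
print(details[["brand", "title", "sold_price", "actual_price"]])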

How this dataset helped me:

1. I trained my CNN model using the image data; that is the only use of the image dataset.
2. On the project's front-end page, to display results, I used the image URL and showed each image after fetching it from the web. This meant I did not have to upload the image dataset with the project to the server, which saved a huge amount of memory.
3. Using the url column, I display the live price and ratings from the Flipkart website.
4. There is also a Buy button mapped to the url; you are redirected to the original product page and can buy the product from there. After using this dataset I changed my project name from Fashion Product Recommender to Flipkart Fashion Product Recommender. 😄😄😄

Still, the memory problem was not resolved, as the trained model file was above 500MB on the complete dataset. So I tried multiple subsets, and finally I deployed after training on only 1000 images. In the future, I will try another platform to deploy the complete project. I learned many new things while working on this dataset.

    Your Job:

1. You can use this dataset in your deep learning projects; go and try to create interesting projects.
2. You can use the CSV data in your machine learning projects; first you need to do feature construction from the title column, as there is much information hidden there, and some data cleaning is required.
3. Two complete records are missing from the CSV data; your job is to find the missing data with the help of the image dataset and fill it in as best you can.

This is a huge dataset in terms of records as well as memory size. To download this dataset you need a high-speed internet connection.

To download the same dataset in a smaller size (less than 500MB), you can find it here; everything is the same as this dataset, except the images are reduced from 400x450 px to 65x80 px.

Please rate this work.

Support with an upvote; that encourages me to research more.

Share your feedback, reviews, and suggestions, if any.

    Thanks!!

  3. Web-Harvested Image and Caption Dataset

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Web-Harvested Image and Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/web-harvested-image-and-caption-dataset
    Explore at:
zip (233254845 bytes). Available download formats
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    License

https://creativecommons.org/publicdomain/zero/1.0/

    Description

Web-Harvested Image and Caption Dataset

    By conceptual_captions (From Huggingface) [source]

    About this dataset

    The Conceptual Captions dataset, hosted on Kaggle, is a comprehensive and expansive collection of web-harvested images and their corresponding captions. With a staggering total of approximately 3.3 million images, this dataset offers a rich resource for training and evaluating image captioning models.

    Unlike other image caption datasets, the unique feature of Conceptual Captions lies in the diverse range of styles represented in its captions. These captions are sourced from the web, specifically extracted from the Alt-text HTML attribute associated with web images. This approach ensures that the dataset encompasses a broad variety of textual descriptions that accurately reflect real-world usage scenarios.

    To guarantee the quality and reliability of these captions, an elaborate automatic pipeline has been developed for extracting, filtering, and transforming each image/caption pair. The goal behind this diligent curation process is to provide clean, informative, fluent, and learnable captions that effectively describe their corresponding images.

    The dataset itself consists of two primary components: train.csv and validation.csv files. The train.csv file comprises an extensive collection of over 3.3 million web-harvested images along with their respective carefully curated captions. Each image is accompanied by its unique URL to allow easy retrieval during model training.

    On the other hand, validation.csv contains approximately 100,000 image URLs paired with their corresponding informative captions. This subset serves as an invaluable resource for validating and evaluating model performance after training on the larger train.csv set.

    Researchers and data scientists can leverage this remarkable Conceptual Captions dataset to develop state-of-the-art computer vision models focused on tasks such as image understanding, natural language processing (NLP), multimodal learning techniques combining visual features with textual context comprehension – among others.

By providing such an extensive array of high-quality images coupled with richly descriptive captions acquired from various sources across the internet through a meticulous curation process, Conceptual Captions empowers professionals working in fields like artificial intelligence (AI), machine learning, computer vision, and natural language processing to explore new frontiers in visual understanding and textual comprehension.

    How to use the dataset

    Title: How to Use the Conceptual Captions Dataset for Web-Harvested Image and Caption Analysis

    Introduction: The Conceptual Captions dataset is an extensive collection of web-harvested images, each accompanied by a caption. This guide aims to help you understand and effectively utilize this dataset for various applications, such as image captioning, natural language processing, computer vision tasks, and more. Let's dive into the details!

    Step 1: Acquiring the Dataset

    Step 2: Exploring the Dataset Files After downloading the dataset files ('train.csv' and 'validation.csv'), you'll find that each file consists of multiple columns containing valuable information:

a) 'caption': This column holds captions associated with each image. It provides textual descriptions that can be used in various NLP tasks.
b) 'image_url': This column contains URLs pointing to individual images in the dataset.

    Step 3: Understanding Dataset Structure The Conceptual Captions dataset follows a tabular format where each row represents an image/caption pair. Combining knowledge from both train.csv and validation.csv files will give you access to a diverse range of approximately 3.4 million paired examples.

    Step 4: Preprocessing Considerations Due to its web-harvested nature, it is recommended to perform certain preprocessing steps on this dataset before utilizing it for your specific task(s). Some considerations include:

a) Text Cleaning: Perform basic text cleaning techniques such as removing special characters or applying sentence tokenization.
b) Filtering: Depending on your application, you may need to apply specific filters to remove captions that are irrelevant, inaccurate, or noisy.
c) Language Preprocessing: Consider using techniques like lemmatization or stemming if it suits your task.
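
A minimal sketch of the loading, cleaning, and filtering considerations above, assuming train.csv is in the working directory and using the caption and image_url columns described in Step 2:

import re
import pandas as pd

df = pd.read_csv("train.csv")  # columns: caption, image_url (as described above)

# a) Basic text cleaning: lowercase, strip special characters, collapse whitespace.
def clean_caption(text):
    text = str(text).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

df["caption"] = df["caption"].apply(clean_caption)

# b) Simple filtering: drop empty captions and rows without a usable URL.
df = df[df["caption"].str.len() > 0]
df = df.dropna(subset=["image_url"])
print(df.head())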

    Step 5: Training and Evaluation Once you have preprocessed the dataset as per your requirements, it's time to train your models! The Conceptual Captions dataset can be used for a range of tasks such as image captioni...

4. SportsImageClassification

    • huggingface.co
    Updated May 1, 2024
    Cite
    HES-XPLAIN (2024). SportsImageClassification [Dataset]. https://huggingface.co/datasets/HES-XPLAIN/SportsImageClassification
    Explore at:
Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 1, 2024
    Dataset authored and provided by
    HES-XPLAIN
    License

https://choosealicense.com/licenses/cc0-1.0/

    Description

    Sports Image Classification dataset

From Kaggle: 100 Sports Image Classification. A collection of sports images covering 100 different sports. Images are 224x224x3 in jpg format. Data is separated into train, test and valid directories.

13,493 train images, 500 test images, 500 validation images

Additionally, a CSV file is included for those who wish to use it to create their own train, test and validation datasets.
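
Since the dataset is hosted on the Hugging Face Hub, it can presumably also be loaded with the datasets library; this is a minimal sketch, and the exposed split names may differ from the train/test/valid directories described above:

from datasets import load_dataset

# Load directly from the Hugging Face Hub; inspect which splits and
# features are actually exposed before relying on specific names.
ds = load_dataset("HES-XPLAIN/SportsImageClassification")
print(ds)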

      Clone
    

    git clone… See the full description on the dataset page: https://huggingface.co/datasets/HES-XPLAIN/SportsImageClassification.

5. Sound field image dataset

    • data.niaid.nih.gov
    Updated Jul 11, 2024
    Cite
    Kenji; Daiki; Noboru; Takehiro (2024). Sound field image dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8357752
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Moriya
    Harada
    Takeuchi
    Ishikawa
    Authors
    Kenji; Daiki; Noboru; Takehiro
Description

    This sound field image dataset contains clean-noisy pairs of complex-valued sound-field images generated by 2D acoustic simulations. The dataset was initially prepared for deep sound-field denoiser (https://github.com/nttcslab/deep-sound-field-denoiser), a DNN-based denoising method for optically measured sound fields. Since the data is a two-dimensional sound field based on the Helmholtz equation, one can use this dataset for any acoustic application. Please check our GitHub repository and paper for details.

    Directory structure

    The dataset contains three directories: training, validation, and evaluation. Each directory contains "soundsource#" sub-directories (# represents the number of sound sources used in the acoustic simulation). Each sub-directory has three h5 files for data (clean, white noise, and speckle noise) and three CSV files listing random parameter values used in the simulation.

    • /training

      • /soundsource#

        • constants.csv

        • random_variable_ranges.csv

        • random_variables.csv

        • sf_true.h5

        • sf_noise_white.h5

        • sf_noise_speckle.h5
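
A minimal sketch for inspecting one of the HDF5 files listed above with h5py; the soundsource1 sub-directory name is an assumption and the internal dataset keys are not documented here, so the code only lists what the file actually contains:

import h5py

# Open one clean sound-field file and print its internal structure.
with h5py.File("training/soundsource1/sf_true.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, obj))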

    Condition of use

    This dataset is available under the attached license file. Read the terms and conditions in NTTSoftwareLicenseAgreement.pdf carefully.

    Citation

    If you use this dataset, please cite the following paper.

K. Ishikawa, D. Takeuchi, N. Harada, and T. Moriya, "Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network," arXiv:2304.14923 (2023).

  6. Astronomical Image and CSV Dataset

    • kaggle.com
    zip
    Updated Apr 23, 2025
    Cite
    Mighty Glow (2025). Astronomical Image and CSV Dataset [Dataset]. https://www.kaggle.com/datasets/mightyglow/astronomical-image-and-csv-dataset
    Explore at:
zip (73225502097 bytes). Available download formats
    Dataset updated
    Apr 23, 2025
    Authors
    Mighty Glow
    License

MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🪐 Astronomical Objects Dataset: 50,000 Images + Metadata

Welcome to the Astronomical Objects Dataset, a comprehensive collection of 50,000 high-resolution images of celestial bodies, including asteroids, planets, moons, stars, galaxies, nebulae, black holes, exoplanets, dwarf planets, and constellations, as well as man-made space objects such as rockets, space debris, spacecraft (landers, orbiters, probes, rovers), satellites, and space stations. This dataset is curated for AI/ML research, astronomy education, and image classification tasks in the realm of space exploration.

    🌌 What's Inside?

    📁 Images

    • 50,000+ images organized by class (e.g., /asteroids, /planets, /moons, etc.)
    • Varying resolutions, optimized for computer vision models
    • Includes real astronomical captures from space missions and observatories
• Also includes filtered images for machine learning models; the filters are specific to astronomical images

    📄 CSV Metadata Files (Column Descriptions)

For each category (e.g., asteroids, planets, etc.), there's a corresponding CSV file containing:

• name: Common or scientific name of the object
• description: Short summary about the object
• distance_from_earth_km: Average distance from Earth
• discovery_year: Year the object was discovered (if applicable)
• diameter_km, mass_kg: Physical characteristics
• fun_facts: Fun facts about the specific object in that category
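
A minimal sketch of reading one category's CSV with pandas; the file name planets.csv is hypothetical, and the columns follow the descriptions above:

import pandas as pd

# Hypothetical file name; each category (asteroids, planets, ...) has its own CSV.
planets = pd.read_csv("planets.csv")

# Columns described above: name, description, distance_from_earth_km,
# discovery_year, diameter_km, mass_kg, fun_facts
print(planets[["name", "distance_from_earth_km", "diameter_km"]].head())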

    🤖 Python Bots

1. Image Rotator Bot - https://github.com/MightyGlow/Image-Rotator-Bot - This bot rotates an input image by 2 degrees at a time and saves each iteration in a single specified folder.
2. File Renaming Bot - https://github.com/MightyGlow/File-Renaming-Bot - This bot renames each file in a given folder in a numbered format, e.g. Earth_1, Earth_2.
3. Filter Imaging Bot - https://github.com/MightyGlow/Filter-Imaging-Bot - This bot applies different filters to each image in the input folder and saves the results in the destination folder. There are various filters you can choose from based on your preferences.

Star each of the above repositories after using them to increase their reach so others can also apply them.

    🔭 Use Cases

    • Training deep learning models for astronomical object classification
    • Building educational applications or interactive visualizations
    • Developing AI-powered space exploration tools
    • Supporting research in astrophysics and space sciences

    🧠 Ideal For

    • Data scientists and machine learning practitioners
    • Astronomy enthusiasts and educators
    • Students working on space-related AI projects
    • Researchers building AI models for satellite or telescope data

    📚 Upvote

    Upvote this dataset for others to see and use it for their specific projects and purposes.

    🚀 Let's Explore the Cosmos with AI!

    Dive into the dataset and start building intelligent systems that can see and understand the universe like never before.

Let me know in the comments if you have more images that can be added to upgrade this dataset. I had to create this dataset manually, as there are not many image datasets with specific astronomical images for each of these categories.

  7. Data from: Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB...

    • zenodo.org
    • observatorio-cientifico.ua.es
• + 2 more
    text/x-python, zip
    Updated Apr 24, 2025
    + more versions
    Cite
Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik (2025). Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning (License CC BY 4.0) [Dataset]. http://doi.org/10.5281/zenodo.6941662
    Explore at:
zip, text/x-python. Available download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).

    Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):

    • Land Cover Class ID: is the identification number of each LULC class
    • Land Cover Class Short Name: is the short name of each LULC class
    • Image ID: is the identification number of each image within its corresponding LULC class
    • Pixel purity Value: is the spatial purity of each pixel for its corresponding LULC class calculated as the spatial consensus across up to 15 land-cover products
    • GHM Value: is the spatial average of the Global Human Modification index (gHM) for each image
    • Latitude: is the latitude of the center point of each image
    • Longitude: is the longitude of the center point of each image
• Country Code: is the Alpha-2 country code of each image as described in the ISO 3166 international standard. To understand the country codes, we recommend visiting the following website, which lists the Alpha-2 code for each country as described in the ISO 3166 international standard: https://www.iban.com/country-codes
    • Administrative Department Level1: is the administrative level 1 name to which each image belongs
    • Administrative Department Level2: is the administrative level 2 name to which each image belongs
    • Locality: is the name of the locality to which each image belongs
• Number of S2 images: is the number of instances found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 when compositing and exporting the corresponding image tile

    For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:

    • A CSV file that contains all exported images for this class
    • A CSV file that contains all images available for this class at spatial purity of 100%, both the ones exported and the ones not exported, in case the user wants to export them. These CSV filenames end with "including_non_downloaded_images".

    To clearly state the geographical coverage of images available in this dataset, we included in the version v2.1, a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a csv file for each LULC class that provides the complete list of countries represented in that class. Each csv file has two columns, the first one gives the country code and the second one gives the number of images provided in that country for that LULC class. In addition to these 29 csv files, we provided another csv file that maps each ISO Alpha-2 country code to its original full country name.
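
A minimal sketch of reading one of the per-class CSV files from Geographic_Representativeness.zip; the file name is hypothetical and the exact header row is not specified above, so the code simply prints whatever columns are present:

import pandas as pd

# Hypothetical file name for one LULC class; each per-class CSV has two
# columns (country code, number of images provided for that class).
rep = pd.read_csv("Geographic_Representativeness/class_01_countries.csv")
print(rep.columns.tolist())
print(rep.head())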

    © Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)

8. Detection-and-Tracking of Dolphins of Aerial Videos and Images

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
• + 1 more
    Updated Jul 5, 2021
    Cite
    Bigal, Eyal (2021). Detection-and-Tracking of Dolphins of Aerial Videos and Images [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4775124
    Explore at:
    Dataset updated
    Jul 5, 2021
    Authors
    Bigal, Eyal
    Description

This project consists of two datasets, both of aerial images and videos of dolphins taken by drones. The data was captured from a few places (Italy and Israel coastlines).

The aim of the project is to examine automated dolphin detection and tracking from aerial surveys.

    The project description, details and results are presented in the paper (link to the paper).

    Each dataset was organized and set for a different phase of the project. Each dataset is located in a different zip file:

    1. Detection - Detection.zip

    2. Tracking - Tracking.zip

    Further information about the datasets' content and annotation format is below.

• To view each file's content, use the preview option; in addition, a description appears later in this section.

    Detection Dataset

This dataset contains 1125 aerial images, and an image can contain several dolphins.

The detection phase of the project is done using RetinaNet, a supervised deep-learning-based algorithm, with the Keras RetinaNet implementation. Therefore, the data was divided into three parts - Train, Validation and Test - in a 70%, 15%, 15% split, respectively.

The annotation format follows the format requested by that implementation (Keras RetinaNet). Each object, which is a dolphin, is annotated with bounding box coordinates and a class. For this project the dolphins were not distinguished by species; therefore, a dolphin object is annotated as a bounding box and classified as 'Dolphin'. The Detection zip file includes:

    A folder for each - Train, Validation and Test subsets, which includes the images

    An annotations CSV file for each subset

    A class mapping csv file (one for all the subsets).

*The annotation format is detailed in the Annotations format section.

    Detection zip file content:

Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————validation_set.csv
|——————test_set (images)
|——————test_set.csv
└——————class_mapping.csv

    Tracking

This dataset contains 5 short videos (10-30 seconds), which were trimmed from longer aerial videos captured from a drone.

The tracking phase of the project is done using two methods:

    VIAME application, using the tracking feature

Re3: Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects, by Daniel Gordon. For this project, the author's Tensorflow implementation was used.

Both methods require the videos' frame sequences as input. Therefore, the videos' frames were extracted. The first frame was annotated manually for initialization, and the algorithms track accordingly. As in the Detection dataset, each frame can include several objects (dolphins).

For annotation consistency, the videos' frame sequences were annotated similarly to the Detection dataset above (details can be found in the Annotations format section). Each video's frames are annotated separately. Therefore, the Tracking zip file contains a folder for each video (5 folders in total), named after the video's file name.

    Each video folder contains:

    Frames sequence directory, which includes the extracted frames of the video

    An annotations CSV file

    A class mapping CSV file

    The original video in MP4 format

The examined videos' descriptions and details are displayed in the 'Videos Description.xlsx' file. Use the preview option to display its content.

    Tracking zip file content:

Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
└—————frames (images)
└—————annotations_HighToLow_trim_0045_0070.csv
└—————class_mapping_HighToLow_trim_0045_0070.csv
└—————HighToLow_trim_0045_0070.MP4

    Annotations format

Both datasets share a similar annotation format, which is described below. The annotation format of both datasets follows the format requested by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase of the project.

Each object (dolphin) is annotated by bounding box left-top and right-bottom coordinates and a class. Each image or frame can include several objects. All data was annotated using the Labelbox application.

For each subset (Train, Validation and Test of the Detection dataset, and each video of the Tracking dataset) there are two corresponding CSV files:

    Annotations CSV file

    Class mapping CSV file

    Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame. The format of each line of the CSV annotation is:

    path/to/image.jpg - a path to the image/frame

    x1, y1 - image coordinates of the left upper corner of the bounding box

    x2, y2 - image coordinates of the right bottom corner of the bounding box

    class_name - class name of the annotated object

    path/to/image.jpg,x1,y1,x2,y2,class_name

    An example from train_set.csv:

.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin

    This defines a dataset with 2 images:

    1146_20170730101_ce1_sc_GOPR3047 103.jpg which contains 2 objects classified as 'Dolphin'

1147_20170730101_ce1_sc_GOPR3047 104.jpg which contains 3 objects classified as 'Dolphin'

    Each line in the Class Mapping CSV file contains a mapping:

    class_name,id

    An example:

    Dolphin,0
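
A minimal sketch of parsing the header-less annotations and class mapping CSVs described above into per-image bounding boxes, using train_set.csv and class_mapping.csv as examples:

from collections import defaultdict
import csv

# Read the annotations CSV (path/to/image.jpg,x1,y1,x2,y2,class_name)
# and group the bounding boxes by image, as in the train_set.csv example above.
boxes_per_image = defaultdict(list)
with open("train_set.csv", newline="") as f:
    for path, x1, y1, x2, y2, class_name in csv.reader(f):
        boxes_per_image[path].append((int(x1), int(y1), int(x2), int(y2), class_name))

# Read the class mapping (class_name,id), e.g. "Dolphin,0".
with open("class_mapping.csv", newline="") as f:
    class_to_id = {name: int(idx) for name, idx in csv.reader(f)}

print(len(boxes_per_image), "annotated images;", class_to_id)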

9. DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

    • data.niaid.nih.gov
    Updated May 16, 2023
    Cite
    Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D. (2023). DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7939059
    Explore at:
    Dataset updated
    May 16, 2023
    Authors
    Olsen, Alex; Konovalov, Dimitriv A.; Philippa, Bronson; Ridd, Peter; Wood, Jake C.; Johns, Jamie; Banks, Wesley; Girgenti, Benjamin; Kenny, Owen; Whinney, James; Calvert, Brendan; Rahimi Azghadi, Mostafa; White, Ronald D.
    License

http://www.apache.org/licenses/LICENSE-2.0

    Description

    DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning

    This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.

    The source code, images and annotations are licensed under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.

    Download the dataset images and our trained models

    images.zip (468 MB)

    models.zip (477 MB)

    Due to the size of the images and models they are hosted outside of the Github repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the python script (deepweeds.py), as instructed below, this step will be performed for you automatically.

    TensorFlow Datasets

    Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.

    Weeds and locations

    The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.

    Data organization

    Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.

    labels

The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:

Filename,Label,Species
...
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
...

    Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".
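
A minimal sketch of reading labels.csv and resolving each entry to a file in the images directory described above (pandas assumed to be installed):

from pathlib import Path
import pandas as pd

# labels.csv format shown above: Filename,Label,Species
labels = pd.read_csv("labels.csv")

# Build full paths into the "images" directory expected at the repository root.
labels["path"] = labels["Filename"].apply(lambda name: str(Path("images") / name))
print(labels[["path", "Label", "Species"]].head())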

    models

    We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.

resnet.hdf5
inception.hdf5
resnet.uff

    deepweeds.py

This python script trains and evaluates Keras' base implementations of ResNet50 and InceptionV3 on the DeepWeeds dataset, pre-trained with ImageNet weights. The performance of the networks is cross-validated over 5 folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script also provides the ability to measure the inference speeds within the TensorFlow environment.

    The script can be executed to carry out these computations using the following commands.

    To train and evaluate the ResNet50 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model resnet.

    To train and evaluate the InceptionV3 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model inception.

    To measure inference times for the ResNet50 model, use python3 deepweeds.py inference --model models/resnet.hdf5.

    To measure inference times for the InceptionV3 model, use python3 deepweeds.py inference --model models/inception.hdf5.

    Dependencies

    The required Python packages to execute deepweeds.py are listed in requirements.txt.

    tensorrt

    This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:

cd tensorrt/src
make -j4
cd ../bin
./resnet_inference

    Citations

    If you use the DeepWeeds dataset in your work, please cite it as:

IEEE style citation: A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning," Scientific Reports, vol. 9, no. 2058, 2 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3

    BibTeX

    @article{DeepWeeds2019, author = {Alex Olsen and Dmitry A. Konovalov and Bronson Philippa and Peter Ridd and Jake C. Wood and Jamie Johns and Wesley Banks and Benjamin Girgenti and Owen Kenny and James Whinney and Brendan Calvert and Mostafa {Rahimi Azghadi} and Ronald D. White}, title = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}}, journal = {Scientific Reports}, year = 2019, number = 2058, month = 2, volume = 9, issue = 1, day = 14, url = "https://doi.org/10.1038/s41598-018-38343-3", doi = "10.1038/s41598-018-38343-3" }

  10. Image diagram dataset

    • figshare.com
    zip
    Updated Sep 13, 2022
    Cite
    Sergio Andres Rodriguez Torres (2022). Image diagram dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20399283.v2
    Explore at:
zip. Available download formats
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    figshare
Figshare (http://figshare.com/)
    Authors
    Sergio Andres Rodriguez Torres
    License

https://www.apache.org/licenses/LICENSE-2.0.html

    Description

A collection of 5,981 images labeled into 6 categories related to software diagrams, with the following distribution:

Label  Name                Number
0      None                1010
1      Activity Diagram    595
2      Sequence Diagram    811
3      Class Diagram       986
4      Component Diagram   368
5      Use Case Diagram    854
6      Cloud Diagram       978

The dataset consists of a CSV file with the labeling and a zip file with the normalized images. The images are normalized to RGB format and 224x224 pixels, ready for Keras neural networks.
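
A minimal sketch of loading one normalized image as an array for a Keras-style model; the labeling CSV name, its columns, and the example image path are hypothetical, since they are not specified above:

import numpy as np
import pandas as pd
from PIL import Image

# Hypothetical names: the labeling CSV and its columns are not specified above.
labels = pd.read_csv("labels.csv")  # e.g. columns: filename, label

# Images are already normalized to 224x224 RGB, so they can be converted directly.
img = Image.open("images/example.png").convert("RGB")
x = np.asarray(img, dtype="float32") / 255.0  # shape (224, 224, 3)
print(labels.head(), x.shape)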

11. StreetSurfaceVis: a dataset of street-level imagery with annotations of road...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 20, 2025
    + more versions
    Cite
    Kapp, Alexandra; Hoffmann, Edith; Weigmann, Esther; Mihaljevic, Helena (2025). StreetSurfaceVis: a dataset of street-level imagery with annotations of road surface type and quality [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11449976
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Hochschule für Technik und Wirtschaft Berlin
    HTW Berlin - University of Applied Sciences
    Authors
    Kapp, Alexandra; Hoffmann, Edith; Weigmann, Esther; Mihaljevic, Helena
    License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    StreetSurfaceVis

StreetSurfaceVis is an image dataset containing 9,122 street-level images from Germany with labels on road surface type and quality. The CSV file streetSurfaceVis_v1_0.csv contains all image metadata and four folders contain the image files. All images are available in four different sizes, based on the image width: 256px, 1024px, 2048px and the original size. Folders containing the images are named according to the respective image size. Image files are named based on the mapillary_image_id.

    You can find the corresponding publication here: StreetSurfaceVis: a dataset of crowdsourced street-level imagery with semi-automated annotations of road surface type and quality

    Image metadata

    Each CSV record contains information about one street-level image with the following attributes:

    mapillary_image_id: ID provided by Mapillary (see information below on Mapillary)

    user_id: Mapillary user ID of contributor

    user_name: Mapillary user name of contributor

    captured_at: timestamp, capture time of image

    longitude, latitude: location the image was taken at

    train: Suggestion to split train and test data. True for train data and False for test data. Test data contains data from 5 cities which are excluded in the training data.

    surface_type: Surface type of the road in the focal area (the center of the lower image half) of the image. Possible values: asphalt, concrete, paving_stones, sett, unpaved

    surface_quality: Surface quality of the road in the focal area of the image. Possible values: (1) excellent, (2) good, (3) intermediate, (4) bad, (5) very bad (see the attached Labeling Guide document for details)

    Image source

Images are obtained from Mapillary, a crowd-sourcing platform for street-level imagery. More metadata about each image can be obtained via the Mapillary API. User-generated images are shared by Mapillary under the CC-BY-SA License.

For each image, the dataset contains the mapillary_image_id and user_name. You can access user information on the Mapillary website via https://www.mapillary.com/app/user/ followed by the user_name, and image information via https://www.mapillary.com/app/?focus=photo&pKey= followed by the mapillary_image_id.

    If you use the provided images, please adhere to the terms of use of Mapillary.

    Instances per class

    Total number of images: 9,122

                excellent   good   intermediate   bad   very bad
asphalt               971   1697            821   246          -
concrete              314    350            250    58          -
paving stones         385   1063            519    70          -
sett                    -    129            694   540          -
unpaved                 -      -            326   387        303

    For modeling, we recommend using a train-test split where the test data includes geospatially distinct areas, thereby ensuring the model's ability to generalize to unseen regions is tested. We propose five cities varying in population size and from different regions in Germany for testing - images are tagged accordingly.

    Number of test images (train-test split): 776

    Inter-rater-reliablility

Three annotators labeled the dataset, such that each image was annotated by one person. Annotators were encouraged to consult each other for a second opinion when uncertain. 1,800 images were annotated by all three annotators, resulting in a Krippendorff's alpha of 0.96 for surface type and 0.74 for surface quality.

    Recommended image preprocessing

As the labeled road is the focal road located in the bottom center of the street-level image, it is recommended to crop images to their lower middle half prior to using them for classification tasks.

    This is an exemplary code for recommended image preprocessing in Python:

from PIL import Image

img = Image.open(image_path)
width, height = img.size
img_cropped = img.crop((0.25 * width, 0.5 * height, 0.75 * width, height))
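
Building on the snippet above, a minimal sketch of applying the suggested train/test split from the metadata CSV; the choice of the 1024px folder and the .jpg extension are assumptions:

import pandas as pd

# Metadata file and suggested train/test split as described above.
meta = pd.read_csv("streetSurfaceVis_v1_0.csv")
train_mask = meta["train"] == True  # the train column holds True/False
train_meta = meta[train_mask]
test_meta = meta[~train_mask]       # five geographically distinct test cities

# Image files are named by mapillary_image_id inside the size-named folders.
train_paths = [f"1024/{image_id}.jpg" for image_id in train_meta["mapillary_image_id"]]
print(len(train_paths), "train images,", len(test_meta), "test images")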

    License

    CC-BY-SA

    Citation

    If you use this dataset, please cite as:

    Kapp, A., Hoffmann, E., Weigmann, E. et al. StreetSurfaceVis: a dataset of crowdsourced street-level imagery annotated by road surface type and quality. Sci Data 12, 92 (2025). https://doi.org/10.1038/s41597-024-04295-9

    @article{kapp_streetsurfacevis_2025, title = {{StreetSurfaceVis}: a dataset of crowdsourced street-level imagery annotated by road surface type and quality}, volume = {12}, issn = {2052-4463}, url = {https://doi.org/10.1038/s41597-024-04295-9}, doi = {10.1038/s41597-024-04295-9}, pages = {92}, number = {1}, journaltitle = {Scientific Data}, shortjournal = {Scientific Data}, author = {Kapp, Alexandra and Hoffmann, Edith and Weigmann, Esther and Mihaljević, Helena}, date = {2025-01-16},}

    This is part of the SurfaceAI project at the University of Applied Sciences, HTW Berlin.

• Prof. Dr. Helena Mihaljević
• Alexandra Kapp
• Edith Hoffmann
• Esther Weigmann

    Contact: surface-ai@htw-berlin.de

    https://surfaceai.github.io/surfaceai/

Funding: SurfaceAI is an mFUND project funded by the Federal Ministry for Digital and Transport, Germany.

  12. RailEnV-PASMVS: a dataset for multi-view stereopsis training and...

    • zenodo.org
    • resodate.org
• + 2 more
    bin, csv, png, txt +1
    Updated Jul 18, 2024
    + more versions
    Cite
André Broekman; Petrus Johannes Gräbe (2024). RailEnV-PASMVS: a dataset for multi-view stereopsis training and reconstruction applications [Dataset]. http://doi.org/10.5281/zenodo.5233840
    Explore at:
bin, csv, txt, zip, png. Available download formats
    Dataset updated
    Jul 18, 2024
    Dataset provided by
Zenodo (http://zenodo.org/)
    Authors
André Broekman; Petrus Johannes Gräbe
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views. Photorealistic lighting of each of the 40 scenes is achieved with the implementation of high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.

    File descriptions

    • RailEnV-PASMVS.blend (227 Mb) - Blender file (developed using Blender version 2.8.1) used to generate the dataset. The Blender file packs only one of the HDR environmental textures to use as an example, along with all the other asset textures.
    • RailEnV-PASMVS_sample.png (28 Mb) - A visual collage of 30 scenes, illustrating the variability introduced by using different models, illumination, material properties and camera focal lengths.
    • geometry.zip (2 Mb) - Geometry CSV files used for scenes 01 to 20. The Bezier curve defines the geometry of the rail profile (10 mm intervals).
    • PhysicalDataset.7z (2.0 Gb) - A smaller, secondary dataset of 320 manually annotated photographs of railway environments; only the railway profiles are annotated.
    • 01.7z-40.7z (2.0 Gb each) - Archive of every scene (01 through 40).
• all_list.txt, training_list.txt, validation_list.txt - Text files containing all the scene names, together with those used for validation (validation_list.txt) and training (training_list.txt), used by MVSNet.
    • index.csv - CSV file provides a convenient reference for all the sample files, linking the corresponding file and relative data path.

    Steps to reproduce

    The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API interface. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position, coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.

import numpy as np
from scipy.spatial.transform import Rotation as R

# The intrinsic matrix K is constructed using the following formulation:
focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm
K = np.array([[focalLengthPixel, 0, pixelDimensionX / 2],
              [0, focalLengthPixel, pixelDimensionY / 2],
              [0, 0, 1]])

# The rotation vector as provided by Blender is first transformed to a rotation matrix:
r = R.from_euler('xyz', vectR, degrees=True)
matR = r.as_matrix()

# Transpose the rotation matrix to obtain the matrix from the WORLD to the BLENDER coordinate system:
R_world2bcam = np.transpose(matR)

# The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:
R_bcam2cv = np.array([[1, 0, 0],
                      [0, -1, 0],
                      [0, 0, -1]])

# Thus the representation from WORLD to CV/STANDARD coordinates is:
R_world2cv = R_bcam2cv.dot(R_world2bcam)

# The camera coordinate vector requires a similar transformation moving from BLENDER to WORLD coordinates:
T_world2bcam = -1 * R_world2bcam.dot(vectC)
T_world2cv = R_bcam2cv.dot(T_world2bcam)

    The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.

    Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS information, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.

  13. The Object Detection for Olfactory References (ODOR) Dataset

    • data.europa.eu
    unknown
    Updated Oct 19, 2023
    + more versions
    Cite
    Zenodo (2023). The Object Detection for Olfactory References (ODOR) Dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-10027116?locale=de
    Explore at:
unknown (3926). Available download formats
    Dataset updated
    Oct 19, 2023
    Dataset authored and provided by
Zenodo (http://zenodo.org/)
    License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The Object Detection for Olfactory References (ODOR) Dataset

Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes. Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories. It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas. Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.

How to use

To download the dataset images, run the download_imgs.py script in the subfolder. The images will be downloaded to the imgs folder. The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year.

For the sake of license compliance, we do not publish the images directly (although most of the images are public domain). Instead, we provide links to their source collections in the metadata file (meta.csv) and a python script to download the artwork images (download_images.py). The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively.
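
A minimal sketch of the mapping described above between the images array of annotations.json and the image-level metadata CSV (referred to above as both meta.csv and metadata.csv); pandas is assumed:

import json
import pandas as pd

# COCO-style annotations with an "images" array (each entry has a file_name).
with open("annotations.json") as f:
    coco = json.load(f)
images = pd.DataFrame(coco["images"])

# Image-level metadata with the unique "File Name" column described above.
meta = pd.read_csv("metadata.csv")

# Join object-level image records with their image-level metadata.
merged = images.merge(meta, left_on="file_name", right_on="File Name", how="left")
print(merged.head())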

  14. Dolphins-Detection-and-Tracking from Aerial Videos and Images

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated May 20, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bigel, Eyal (2021). Dolphins-Detection-and-Tracking from Aerial Videos and Images [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4726661
    Explore at:
    Dataset updated
    May 20, 2021
    Authors
    Bigel, Eyal
    Description

    The dataset includes aerial images and videos of dolphins captured by drones. The data was collected at several locations along the Italian and Israeli coastlines.

    The dataset was collected to enable automated dolphin detection in aerial images and dolphin tracking in aerial videos.

    The project description and results are available via the GitHub link, which describes and visualizes the paper (link to the paper).

    The dataset includes two zip files:

    Detection.zip

    Tracking.zip

    For both files, the annotation format is identical and is described below.

    To inspect each file's content, use the preview option; a description also appears later in this section.

    Annotations format

    The annotation format follows the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase.

    Each object is annotated with a bounding box. All data was annotated using the Labelbox application.

    For each subset there are two corresponding CSV files:

    Annotation file

    Class mapping file

    Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame. The format of each line of the CSV annotation is:

    path/to/image.jpg,x1,y1,x2,y2,class_name

    path/to/image.jpg - a path to the image/frame

    x1, y1 - image coordinates of the upper-left corner of the bounding box

    x2, y2 - image coordinates of the lower-right corner of the bounding box

    class_name - class name of the annotated object

    An example from train_set.csv:

    .\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
    .\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
    .\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
    .\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
    .\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin

    This defines a dataset with 2 images:

    1146_20170730101_ce1_sc_GOPR3047 103.jpg contains 2 bounding boxes containing dolphins.

    1147_20170730101_ce1_sc_GOPR3047 104.jpg contains 3 bounding boxes containing dolphins.

    Each line in the Class Mapping CSV file contains a mapping:

    class_name,id

    An example:

    Dolphin,0
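    A minimal loading sketch for these two headerless CSV files (using pandas; shown for the training split):

    import pandas as pd

    cols = ["path", "x1", "y1", "x2", "y2", "class_name"]
    annotations = pd.read_csv("train_set.csv", header=None, names=cols)
    class_map = pd.read_csv("class_mapping.csv", header=None, names=["class_name", "id"])

    boxes_per_image = annotations.groupby("path").size()          # bounding boxes per image
    annotations = annotations.merge(class_map, on="class_name")   # attach numeric class ids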

    Detection

    The data for dolphin detection is split into three subdirectories: train, validation, and test sets.

    Since all files contain only one class (Dolphin), a single class_mapping.csv can be used for all three subsets.

    The Detection dataset folder includes:

    A folder for each of the train, validation, and test sets, containing the images

    An annotation CSV file for each of the train, validation, and test sets

    A class mapping CSV file (shared by all sets)

    There is one annotation CSV file per subset.

    Tracking

    For the tracking phase, trackers were examined and evaluated on 5 videos. Each video has its own annotation and class mapping CSV files. In addition, each video's extracted frames are available in the frames directory.

    The Tracking dataset folder includes a folder for each of the 5 videos, each containing:

    A frames directory with the extracted frames of the video

    An annotation CSV file

    A class mapping CSV file

    The original video

    The descriptions and details of the examined videos:

    Detection and Tracking dataset structure:

    Detection
    |——————train_set (images)
    |——————train_set.csv
    |——————validation_set (images)
    |——————validation_set.csv
    |——————test_set (images)
    |——————test_set.csv
    └——————class_mapping.csv

    Tracking
    |——————DJI_0195_trim_0015_0045
    |      └——————frames (images)
    |      └——————annotations_DJI_0195_trim_0015_0045.csv
    |      └——————class_mapping_DJI_0195_trim_0015_0045.csv
    |      └——————DJI_0195_trim_0015_0045.MP4
    |——————DJI_0395_trim_0010_0025
    |      └——————frames (images)
    |      └——————annotations_DJI_0395_trim_0010_0025.csv
    |      └——————class_mapping_DJI_0395_trim_0010_0025.csv
    |      └——————DJI_0395_trim_0010_0025.MP4
    |——————DJI_0395_trim_00140_00150
    |      └——————frames (images)
    |      └——————annotations_DJI_0395_trim_00140_00150.csv
    |      └——————class_mapping_DJI_0395_trim_00140_00150.csv
    |      └——————DJI_0395_trim_00140_00150.MP4
    |——————DJI_0395_trim_0055_0085
    |      └——————frames (images)
    |      └——————annotations_DJI_0395_trim_0055_0085.csv
    |      └——————class_mapping_DJI_0395_trim_0055_0085.csv
    |      └——————DJI_0395_trim_0055_0085.MP4
    └——————HighToLow_trim_0045_0070
           └—————frames (images)
           └—————annotations_HighToLow_trim_0045_0070.csv
           └—————class_mapping_HighToLow_trim_0045_0070.csv
           └—————HighToLow_trim_0045_0070.MP4

  15. MNIST dataset for Outliers Detection - [ MNIST4OD ]

    • figshare.com
    application/gzip
    Updated May 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Giovanni Stilo; Bardh Prenkaj (2024). MNIST dataset for Outliers Detection - [ MNIST4OD ] [Dataset]. http://doi.org/10.6084/m9.figshare.9954986.v2
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    May 17, 2024
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Giovanni Stilo; Bardh Prenkaj
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here we present MNIST4OD, a dataset of large size (in both number of dimensions and number of instances) suitable for outlier detection tasks. The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).

    We build MNIST4OD in the following way: to distinguish between outliers and inliers, we choose the images belonging to one digit as inliers (e.g. digit 1) and sample, with uniform probability, from the remaining images as outliers, such that their number equals 10% of the inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 x 28) into vectors.

    Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x. The data contains one instance (vector) per line, where the last column represents the outlier label (yes/no) of the data point. The data also contains a column which indicates the original image class (0-9).

    The statistics of each dataset (Name | Instances | Dimensions | Number of Outliers in %):

    MNIST_0 | 7594 | 784 | 10
    MNIST_1 | 8665 | 784 | 10
    MNIST_2 | 7689 | 784 | 10
    MNIST_3 | 7856 | 784 | 10
    MNIST_4 | 7507 | 784 | 10
    MNIST_5 | 6945 | 784 | 10
    MNIST_6 | 7564 | 784 | 10
    MNIST_7 | 8023 | 784 | 10
    MNIST_8 | 7508 | 784 | 10
    MNIST_9 | 7654 | 784 | 10
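    A minimal loading sketch for one of these files (using pandas; the exact header and column order are assumptions, so adjust the indexing to the actual layout):

    import pandas as pd

    df = pd.read_csv("MNIST_0.csv.gz")      # compression is inferred from the extension
    labels = df.iloc[:, -1]                 # outlier label (yes/no), last column
    digits = df.iloc[:, -2]                 # original image class (0-9), assumed second to last
    features = df.iloc[:, :-2]              # 784 flattened pixel values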

  16. Walmart Products Dataset – Free Product Data CSV

    • crawlfeeds.com
    csv, zip
    Updated Dec 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Walmart Products Dataset – Free Product Data CSV [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Description

    Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.

    Key Features

    Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.

    CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.

    Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.

    Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.

    Who Benefits?

    • Data analysts & researchers exploring e-commerce trends or product catalog data.
    • Developers & data scientists building price-comparison tools, recommendation engines or ML models.
    • E-commerce strategists/marketers needing product metadata for competitive analysis or market research.
    • Students/hobbyists needing a free dataset for learning or demo projects.

    Why Use This Dataset Instead of Manual Scraping?

    • Time-saving: No need to write scrapers or deal with rate limits.
    • Clean, structured data: All records are verified and already formatted in CSV, saving hours of cleaning.
    • Risk-free: Avoid Terms-of-Service issues or IP blocks that come with manual scraping.
    • Instant access: Free and immediately downloadable.
  17. Fishpond Visual Condition Dataset v2.0

    • data.mendeley.com
    Updated Nov 21, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dany Eka Saputra (2023). Fishpond Visual Condition Dataset v2.0 [Dataset]. http://doi.org/10.17632/rtsrk8792k.2
    Explore at:
    Dataset updated
    Nov 21, 2023
    Authors
    Dany Eka Saputra
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of fundamental research to produce an IoT monitoring device for fishponds. The hypothesis of this research is that the health of a fishpond can be inferred from visual data.

    To build the dataset, several condition parameters were gathered. The temperature, pH level, and total dissolved solids (TDS) were collected at several locations at different times. At each time and location, an aerial photo of the pond was also collected using a drone at several heights. The images are provided in two forms: the raw original images of the ponds, and the cropped image at each data point. The condition data was collected using an appropriate digital sensor for each parameter.

    The dataset consists of 975 data rows. Each row represents the condition and visual image (as a 100 x 100 pixel image) of a fishpond at a certain time and location. To use the dataset, please access the pond_dataset.csv file. The file contains the tabular data of 13 ponds (each pond represents a different location and collection time). For each row, the visual image file name is given. To access the image file, search the images folder for the corresponding file name listed in the CSV file.

    The dataset can be used to study the correlation of each parameter. For example, the original research studies the correlation between the visual data and the condition data. To do this, the images need to be preprocessed; the image data can be converted into histogram data or processed with any other visual preprocessing method.
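    A minimal sketch for pairing one tabular row with its image and computing a simple histogram feature (using pandas and OpenCV; the image-file column name and folder layout are assumptions based on the description above):

    import cv2
    import pandas as pd

    rows = pd.read_csv("pond_dataset.csv")
    sample = rows.iloc[0]
    img = cv2.imread("images/" + sample["image_file"])       # hypothetical column name
    hist = cv2.calcHist([img], [0], None, [32], [0, 256])    # 32-bin histogram of channel 0
    print(sample["image_file"], hist.ravel())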

  18. PULP-Dronet v3 dataset

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Aug 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lamberti, Lorenzo; Lorenzo, Bellone; Michal, Barcis (2024). PULP-Dronet v3 dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_13348430
    Explore at:
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Technology Innovation Institutehttps://www.tii.ae/
    University of Bologna
    Authors
    Lamberti, Lorenzo; Lorenzo, Bellone; Michal, Barcis
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The PULP-Dronet v3 dataset

    The Himax dataset was collected at the University of Bologna and the Technology Innovation Institute using a Himax ultra-low-power, gray-scale, QVGA camera mounted on a Bitcraze Crazyflie nano-drone. The dataset has been used for training and testing the PULP-Dronet v3 CNN, a neural network for autonomous visual-based navigation of nano-drones. This release includes the training and testing sets described in the paper.

    Resources

    Paper: Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs

    Code available: pulp-platform/pulp-dronet

    Video: https://youtu.be/ehNlDyhsVSc

    Dataset Description

    We collected a dataset of 77k images for nano-drones' autonomous navigation, for a total of 600MB of data.

    We used the Bitcraze Crazyflie 2.1, collecting images from the AI-Deck's Himax HM01B0 monochrome camera.

    The images in the PULP-Dronet v3 dataset have the following characteristics:

    Resolution: each image has a QVGA resolution of 324x244 pixels.

    Color: all images are grayscale, so they have 1 single channel.

    Format: the images are stored in .jpeg format.

    A human pilot manually flew the drone, collecting i) images from the grayscale QVGA Himax camera sensor of the AI-deck, ii) the gamepad's yaw-rate command from the human pilot, normalized to the [-1;+1] range, iii) the drone's estimated state, and iv) the distance between obstacles and the drone measured by the front-looking ToF sensor.

    After the data collection, we labeled all the images with a binary collision label whenever an obstacle was in the line of sight and closer than 2 m. We recorded 301 sequences in 20 different environments. Each sequence of data is labeled with high-level characteristics, listed in characteristics.json (described below).

    For training our CNNs, we augmented the training images by applying random cropping, flipping, brightness augmentation, vignetting, and blur. The resulting dataset has 157k images, split as follows: 110k, 7k, 15k images for training, validation, and testing, respectively.

    To address the labels' bias towards the center of the [-1;+1] yaw-rate range in our testing dataset, we balanced the dataset by selectively removing a portion of images that had a yaw-rate of 0. Specifically, we removed (only from the test set) some images having yaw_rate==0 and collision==1.

    Dataset | Train Images | Validation Images | Test Images | Total
    PULP-Dronet v3 | 53,830 | 7,798 | 15,790 | 77,418
    PULP-Dronet v3 testing | 53,830 | 7,798 | 3,071 | 64,699
    PULP-Dronet v3 training | 110,138 | 15,812 | 31,744 | 157,694

    We use the PULP-Dronet v3 training set for training and the PULP-Dronet v3 testing set for validation/testing; this is the final split:

    Dataset | Train Images | Validation Images | Test Images | Total
    Final | 110,138 | 7,798 | 3,071 | 121,007

    Notes:

    PULP-Dronet v3 and PULP-Dronet v3 testing datasets: Images are in full QVGA resolution (324x244px), uncropped.

    PULP-Dronet v3 training dataset: Images are cropped to 200x200px, matching the PULP-Dronet input resolution. Cropping was done randomly on the full-resolution images to create variations.

    Dataset Structure

    .
    └── Dataset_PULP_Dronet_v3_*/
        ├── ETH finetuning/
        │   ├── acquisition1/
        │   │   ├── characteristics.json         # metadata
        │   │   ├── images/                      # images folder
        │   │   ├── labels_partitioned.csv       # labels for PULP-Dronet
        │   │   └── state_labels_DroneState.csv  # raw data from the crazyflie
        │   ├── ...
        │   └── acquisition39/
        ├── Lorenzo Bellone/
        │   ├── acquisition1/
        │   ├── ...
        │   └── acquisition19/
        ├── Lorenzo Lamberti/
        │   ├── dataset-session1/
        │   │   ├── acquisition1/
        │   │   ├── ...
        │   │   └── acquisition29/
        │   ├── dataset-session2/
        │   │   ├── acquisition1/
        │   │   ├── ...
        │   │   └── acquisition55/
        │   ├── dataset-session3/
        │   │   ├── acquisition1/
        │   │   ├── ...
        │   │   └── acquisition65/
        │   └── dataset-session4/
        │       ├── acquisition1/
        │       ├── ...
        │       └── acquisition51/
        ├── Michal Barcis/
        │   ├── acquisition1/
        │   ├── ...
        │   └── acquisition18/
        └── TII finetuning/
            ├── dataset-session1/
            │   ├── acquisition1/
            │   ├── ...
            │   └── acquisition18/
            └── dataset-session2/
                ├── acquisition1/
                ├── ...
                └── acquisition39/

    This structure applies for all the three sets mentioned above: PULP_Dronet_v3, PULP_Dronet_v3_training, PULP_Dronet_v3_testing.

    Dataset Labels

    1. labels_partitioned.csv

    The file contains metadata for the PULP-Dronet v3 image dataset.

    The file includes the following columns:

    filename: The name of the image file (e.g., 25153.jpeg).

    label_yaw_rate: The yaw rate label, representing the rotational velocity. Values are in the [-1, +1] range, where YawRate > 0 means a counter-clockwise turn (turn left) and YawRate < 0 means a clockwise turn (turn right).

    label_collision: The collision label, in range [0,1]. 0 denotes no collision and 1 indicates a collision.

    partition: The dataset partition, i.e., train, test, or valid.
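    A minimal sketch for loading this file and splitting it by partition (using pandas; the path points at a single acquisition folder):

    import pandas as pd

    labels = pd.read_csv("acquisition1/labels_partitioned.csv")
    train = labels[labels["partition"] == "train"]
    valid = labels[labels["partition"] == "valid"]
    test = labels[labels["partition"] == "test"]
    print(len(train), len(valid), len(test))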

    2. characteristics.json

    Contains metadata. This may be useful for filtering the dataset on specific characteristics, or for partitioning the image types evenly:

    scenario (i.e., indoor or outdoor);

    path type (i.e., presence or absence of turns);

    obstacle types (e.g., pedestrians, chairs);

    flight height (i.e., 0.5, 1.0, 1.5 m);

    behaviour in presence of obstacles (i.e., overpassing, stand still, n/a);

    light conditions (dark, normal, bright, mixed);

    a location name identifier;

    acquisition date.

    3. labeled_images.csv

    The same as labels_partitioned.csv, but without the partition column. You can use this file to redo the partitioning into train, valid, and test sets.

    4. state_labels_DroneState.csv

    This is the raw data logged from the crazyflie at ~100 samples/s.

    The file includes the following columns:

    timeTicks: The timestamp.

    range.front: The distance measurement from the front VL53L1x ToF sensor [mm].

    mRange.rangeStatusFront: The status code of the front range sensor (check the VL53L1x datasheet for more info)

    controller.yawRate: The yaw rate command given by the human pilot (in radians per second).

    ctrltarget.yaw: The target yaw rate set by the control system (in radians per second).

    stateEstimateZ.rateYaw: The estimated yaw rate from the drone's state estimator (in radians per second).

    Data Processing Workflow

    You can find the scripts at pulp-platform/pulp-dronet

    dataset_processing.py:

    Input: state_labels_DroneState.csv

    Output: labeled_images.csv

    Function: matches the drone state labels' timestamps (~100 Hz) to the images' timestamps (~10 Hz), discarding extra drone states.

    dataset_partitioning.py:

    Input: labeled_images.csv

    Output: labels_partitioned.csv

    Function: Partitions the labeled images into training, validation, and test sets.
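    A minimal sketch of the kind of timestamp alignment described above (not the repository's actual dataset_processing.py; the image-side table and its timeTicks values are illustrative assumptions):

    import pandas as pd

    # Drone state is logged at ~100 Hz, images at ~10 Hz: for each image, keep only
    # the nearest state sample and discard the rest.
    states = pd.read_csv("state_labels_DroneState.csv").sort_values("timeTicks")
    images = pd.DataFrame({"filename": ["25153.jpeg"], "timeTicks": [251530]})  # illustrative
    labeled = pd.merge_asof(images.sort_values("timeTicks"), states,
                            on="timeTicks", direction="nearest")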

    License

    We release this dataset as open source under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; see LICENSE.CC.md.

  19. Landmarks Dataset for sign recognition numbers

    • kaggle.com
    zip
    Updated Nov 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Akshat Mittu (2022). Landmarks Dataset for sign recognition numbers [Dataset]. https://www.kaggle.com/datasets/akshatmittu/landmarks-dataset-for-sign-recognition-numbers
    Explore at:
    zip(50385 bytes)Available download formats
    Dataset updated
    Nov 4, 2022
    Authors
    Akshat Mittu
    Description

    This dataset was created from images of hand signs; the landmarks detected in each image were turned into the attributes of the dataset. It contains all 21 hand landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).

    You can also add more classes to your dataset by running the following code; make sure to create an empty DataFrame (or append to the dataset provided here) and to set the file path correctly.

    import os

    import cv2
    import mediapipe as mp
    import pandas as pd

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    hands = mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                           min_detection_confidence=0.8)

    rows = []
    for t in range(1, 6):                                # class labels 1..5
        path = 'data/' + str(t) + '/'
        for name in os.listdir(path):
            image = cv2.cvtColor(cv2.imread(path + name), cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            if not results.multi_hand_landmarks:
                continue
            hand_landmarks = results.multi_hand_landmarks[0]   # one hand per image
            mp_draw.draw_landmarks(image=image, landmark_list=hand_landmarks,
                                   connections=mp_hands.HAND_CONNECTIONS)
            row = {'label': t}
            for idx in range(21):
                lm = hand_landmarks.landmark[idx]
                for axis, value in zip(('x', 'y', 'z'), (lm.x, lm.y, lm.z)):
                    row[mp_hands.HandLandmark(idx).name + '_' + axis] = value
            rows.append(row)

    df = pd.DataFrame(rows)   # append these rows to the existing dataset CSV as needed
    
  20. bioimage.io upload: hpa/hpa-kaggle-2021-dataset

    • zenodo.org
    • data-staging.niaid.nih.gov
    bin, png
    Updated Aug 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Human Protein Atlas; The Human Protein Atlas (2024). bioimage.io upload: hpa/hpa-kaggle-2021-dataset [Dataset]. http://doi.org/10.5281/zenodo.13219996
    Explore at:
    bin, pngAvailable download formats
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    The Human Protein Atlas; The Human Protein Atlas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HPA Single Cell Classification Dataset 2021

    Training dataset for the Human Protein Atlas - Single Cell Classification competition 2021

    More information: https://www.kaggle.com/competitions/hpa-single-cell-image-classification/data

    What files do I need?

    On the data page below, you will find a set of full size original images (a mix of 1728x1728, 2048x2048 and 3072x3072 PNG files) in train.zip and test.zip. (Please note that since this is a code competition, part of test data will be hidden)

    You will also need the image-level labels from train.csv and the filenames for the test set from sample_submission.csv. As many Kagglers made use of all public images in HPA for the previous classification challenge, we have made the public HPA images available to download as instructed in this notebook. Note also that there are TFRecords available if competitors would like to use TPUs.

    The 16-bit version of the training images are available here. Additional training images are available here.

    What should I expect the data format to be?

    The training image-level labels are provided for each sample in train.csv. The bulk of the image data is in train.zip. Each sample consists of four files, each representing a different filter on the subcellular protein patterns represented by the sample. The format is [filename]_[filter color].png for the PNG files. Colors are red for the microtubule channel, blue for the nucleus channel, yellow for the Endoplasmic Reticulum (ER) channel, and green for the protein of interest.
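    A minimal sketch for assembling one sample's four filters into a single array (using Pillow and NumPy; the sample ID is a placeholder):

    import numpy as np
    from PIL import Image

    sample_id = "SAMPLE_ID"   # placeholder for a real base filename from train.csv
    channels = [np.array(Image.open(f"train/{sample_id}_{color}.png"))
                for color in ("red", "green", "blue", "yellow")]
    image = np.stack(channels, axis=-1)   # (H, W, 4); green is the protein of interest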

    What am I predicting?

    You are predicting protein organelle localization labels for each cell in the image. Border cells are included when there is enough information to decide on the labels.

    There are in total 19 different labels present in the dataset (18 labels for specific locations, and label 18 for negative and unspecific signal). The dataset is acquired in a highly standardized way using one imaging modality (confocal microscopy). However, the dataset comprises 17 different cell types of highly different morphology, which affect the protein patterns of the different organelles. All image samples are represented by four filters (stored as individual files), the protein of interest (green) plus three cellular landmarks: nucleus (blue), microtubules (red), endoplasmic reticulum (yellow). The green filter should hence be used to predict the label, and the other filters are used as references. The labels are represented as integers that map to the following:

    0. Nucleoplasm
    1. Nuclear membrane
    2. Nucleoli
    3. Nucleoli fibrillar center
    4. Nuclear speckles
    5. Nuclear bodies
    6. Endoplasmic reticulum
    7. Golgi apparatus
    8. Intermediate filaments
    9. Actin filaments
    10. Microtubules
    11. Mitotic spindle
    12. Centrosome
    13. Plasma membrane
    14. Mitochondria
    15. Aggresome
    16. Cytosol
    17. Vesicles and punctate cytosolic patterns
    18. Negative

    What is meant by weak image-level labels?

    The labels you will get for training are image level labels while the task is to predict cell level labels. That is to say, each training image contains a number of cells that have collectively been labeled as described above and the prediction task is to look at images of the same type and predict the labels of each individual cell within those images.

    As the training labels are a collective label for all the cells in an image, it means that each labeled pattern can be seen in the image but not necessarily that each cell within the image expresses the pattern. This imprecise labeling is what we refer to as weak.

    During the challenge you will both need to segment the cells in the images and predict the labels of those segmented cells.

    Files:
    - train - training images (in .tif)
    - test - test images (in .png) - the task of the competition is to segment and label the images in this folder
    - train.csv - filenames and image level labels for the training set
    - sample_submission.csv - filenames for the test set, and a guide to constructing a working submission.

    Columns:
    - ID - The base filename of the sample. As noted above, all samples consist of four files - blue, green, red, and yellow.
    - Label - in the training data, this represents the labels assigned to each sample; in the submission, this represents the labels assigned to each cell.
