License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
By Arto (From Huggingface) [source]
The train.csv file contains a list of image filenames, captions, and the actual images used for training the image captioning models. Similarly, the test.csv file includes a separate set of image filenames, captions, and images specifically designated for testing the accuracy and performance of the trained models.
Furthermore, the valid.csv file contains a unique collection of image filenames with their respective captions and images that serve as an independent validation set to evaluate the models' capabilities accurately.
Each entry in these CSV files includes a filename string that indicates the name or identifier of an image file stored in another location or directory. Additionally, each entry provides a list (or multiple rows) of strings representing written descriptions or captions for the image given its filename.
Considering this structure, the dataset can be immensely valuable to researchers, developers, and enthusiasts working on innovative computer vision algorithms such as automatic text generation based on visual content analysis, whether that means training machine learning models to generate relevant captions for new, unseen images or evaluating existing systems' performance against diverse criteria.
Stay updated with cutting-edge research trends by leveraging this comprehensive dataset, which contains not only captions but also corresponding images across sets specifically designed to cater to varied purposes within computer vision tasks.
Overview of the Dataset
The dataset consists of three primary files: train.csv, test.csv, and valid.csv. These files contain information about image filenames and their respective captions. Each file includes multiple captions for each image to support diverse training techniques.
Understanding the Files
- train.csv: This file contains filenames (filename column) and their corresponding captions (captions column) for training your image captioning model.
- test.csv: The test set, with the same structure as train.csv. The purpose of this file is to evaluate your trained models on unseen data.
- valid.csv: This validation set provides images with their respective filenames (filename) and captions (captions). It allows you to fine-tune your models based on performance during evaluation.
Getting Started
To begin utilizing this dataset effectively, follow these steps:
- Extract the zip file containing all relevant data files onto your local machine or cloud environment.
- Familiarize yourself with each CSV file's structure: train.csv, test.csv, and valid.csv. Understand how each filename (filename) corresponds with its respective caption(s) (captions).
- Depending on your specific use case or research goals, determine which portion(s) of the dataset you wish to work with (e.g., only train or train+validation).
- Load the dataset into your preferred programming environment or machine learning framework, ensuring you have the necessary dependencies installed (a minimal loading sketch follows this list).
- Preprocess the dataset as needed, such as resizing images to a specific dimension or encoding captions for model training purposes.
- Split the data into training, validation, and test sets according to your experimental design requirements.
- Use appropriate algorithms and techniques to train your image captioning models on the provided data.
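As a concrete starting point for the loading step above, here is a minimal sketch assuming pandas and the filename/captions columns described earlier; if the captions column stores list literals rather than plain strings, parse it with ast.literal_eval first:

import pandas as pd

# Load the three splits; column names follow the description above.
train_df = pd.read_csv("train.csv")
valid_df = pd.read_csv("valid.csv")
test_df = pd.read_csv("test.csv")

# Group every caption belonging to the same image filename.
captions_per_image = train_df.groupby("filename")["captions"].apply(list)
print(captions_per_image.head())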
Enhancing Model Performance
To optimize model performance using this dataset, consider these tips:
- Explore different architectures and pre-trained models specifically designed for image captioning tasks.
- Experiment with various natural language processing techniques for generating and evaluating captions.
- Image Captioning: This dataset can be used to train and evaluate image captioning models. The captions can be used as target labels for training, and the images can be paired with the captions to generate descriptive captions for test images.
- Image Retrieval: The dataset can be used for image retrieval tasks where given a query caption, the model needs to retrieve the images that best match the description. This can be useful in applications such as content-based image search.
- Natural Language Processing: The dataset can also be used for natural language processing tasks such as text generation or machine translation. The captions in this dataset are descriptive ...
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
The idea to scrape this data came to me while I was working on an e-commerce project, Fashion Product Recommendation (an end-to-end project). In this project, you upload any fashion image and it shows the 10 closest recommendations.
https://user-images.githubusercontent.com/40932902/169657090-20d3342d-d472-48e3-bc34-8a9686b09961.png
https://user-images.githubusercontent.com/40932902/169657035-870bb803-f985-482a-ac16-789d0fcf2a2b.png
https://user-images.githubusercontent.com/40932902/169013855-099838d6-8612-45ce-8961-28ccf44f81f7.png
I completed my project on this image dataset. The problem came while deploying to the Heroku server: due to the large project file size, I was unable to deploy, as Heroku offers limited memory space for a free account.
Currently I am only familiar with Heroku, and I am learning AWS for bigger projects. So I decided to scrape my own image dataset with much more information, to help take this project to the next level. I scraped this data from flipkart.com (an e-commerce website) in two formats: image data and textual data in tabular format.
This dataset contains 65k images (400x450 pixels) of fashion/style products and accessories like clothing, footwear, accessories, and many more. There is also a CSV file mapped to the images via the image name and the id column in the tabular data. The image names are in a unique numerical format like 1.png or 62299.png; the image name and the id column are the same. So, if you want to find the details of any image, take its name as the id, go to the id column in the CSV file, and that id's row holds the details of the image (see the lookup sketch after the column list below). You can find the notebook I used to scrape this data in the code section.
Columns of the CSV dataset:
1. id: Unique id, same as the image name
2. brand: Brand name of the product
3. title: Title of the product
4. sold_price: Selling price of the product
5. actual_price: Actual price of the product
6. url: Unique URL of every product
7. img: Image URL
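For example, looking up the tabular details for an image is a single id match. A small sketch, assuming pandas and a hypothetical CSV filename flipkart_fashion.csv (use the actual file shipped with the dataset):

import pandas as pd

df = pd.read_csv("flipkart_fashion.csv")  # hypothetical filename

# Image "62299.png" corresponds to id 62299 in the table.
details = df[df["id"] == 62299]
print(details[["brand", "title", "sold_price", "actual_price", "url"]])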
How this dataset helped me:
1. I trained my CNN model using the image data; that is the only use of the image dataset.
2. On the front-end page of the project, to display results I used the image URL and showed the image after fetching it from the web. This meant I did not have to upload the image dataset with the project to the server, which saved huge memory space.
3. Using the url column, I display the live price and ratings from the Flipkart website.
4. There is also a Buy button mapped to the url; you are redirected to the original product page and can buy it from there. After using this dataset I changed my project name from Fashion Product Recommender to Flipkart Fashion Product Recommender. 😄
Still, the memory problem was not fully resolved, as the trained model file was above 500MB on the complete dataset. So I tried multiple subsets and finally deployed after training on 1000 images only. In the future, I will try another platform to deploy the complete project. I learned many new things while working on this dataset.
To download the same dataset in a smaller size (less than 500MB), you can find it here; everything is the same as this dataset, except that I reduced the image resolution from 400x450px to 65x80px.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
By conceptual_captions (From Huggingface) [source]
The Conceptual Captions dataset, hosted on Kaggle, is a comprehensive and expansive collection of web-harvested images and their corresponding captions. With a staggering total of approximately 3.3 million images, this dataset offers a rich resource for training and evaluating image captioning models.
Unlike other image caption datasets, the unique feature of Conceptual Captions lies in the diverse range of styles represented in its captions. These captions are sourced from the web, specifically extracted from the Alt-text HTML attribute associated with web images. This approach ensures that the dataset encompasses a broad variety of textual descriptions that accurately reflect real-world usage scenarios.
To guarantee the quality and reliability of these captions, an elaborate automatic pipeline has been developed for extracting, filtering, and transforming each image/caption pair. The goal behind this diligent curation process is to provide clean, informative, fluent, and learnable captions that effectively describe their corresponding images.
The dataset itself consists of two primary components: train.csv and validation.csv files. The train.csv file comprises an extensive collection of over 3.3 million web-harvested images along with their respective carefully curated captions. Each image is accompanied by its unique URL to allow easy retrieval during model training.
On the other hand, validation.csv contains approximately 100,000 image URLs paired with their corresponding informative captions. This subset serves as an invaluable resource for validating and evaluating model performance after training on the larger train.csv set.
Researchers and data scientists can leverage this remarkable Conceptual Captions dataset to develop state-of-the-art computer vision models focused on tasks such as image understanding, natural language processing (NLP), multimodal learning techniques combining visual features with textual context comprehension – among others.
By providing such an extensive array of high-quality images coupled with richly descriptive captions acquired from various sources across the internet through a meticulous curation process, Conceptual Captions empowers professionals working in fields like artificial intelligence (AI), machine learning, computer vision, and natural language processing to explore new frontiers in visual understanding and textual comprehension.
Title: How to Use the Conceptual Captions Dataset for Web-Harvested Image and Caption Analysis
Introduction: The Conceptual Captions dataset is an extensive collection of web-harvested images, each accompanied by a caption. This guide aims to help you understand and effectively utilize this dataset for various applications, such as image captioning, natural language processing, computer vision tasks, and more. Let's dive into the details!
Step 1: Acquiring the Dataset
Step 2: Exploring the Dataset Files After downloading the dataset files ('train.csv' and 'validation.csv'), you'll find that each file consists of multiple columns containing valuable information:
a) 'caption': This column holds captions associated with each image. It provides textual descriptions that can be used in various NLP tasks. b) 'image_url': This column contains URLs pointing to individual images in the dataset.
Step 3: Understanding Dataset Structure The Conceptual Captions dataset follows a tabular format where each row represents an image/caption pair. Combining knowledge from both train.csv and validation.csv files will give you access to a diverse range of approximately 3.4 million paired examples.
Step 4: Preprocessing Considerations Due to its web-harvested nature, it is recommended to perform certain preprocessing steps on this dataset before utilizing it for your specific task(s). Some considerations include:
a) Text Cleaning: Perform basic text cleaning techniques such as removing special characters or applying sentence tokenization. b) Filtering: Depending on your application, you may need to apply specific filters to remove captions that are irrelevant, inaccurate, or noisy. c) Language Preprocessing: Consider using techniques like lemmatization or stemming if it suits your task.
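A minimal sketch of these preprocessing considerations, assuming pandas and simple regex-based cleaning; swap in NLTK or spaCy tokenization, lemmatization, or stemming as your task requires:

import re
import pandas as pd

df = pd.read_csv("validation.csv")  # columns: caption, image_url

def clean_caption(text: str) -> str:
    # Lowercase, strip non-alphanumeric characters, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", str(text).lower())
    return re.sub(r"\s+", " ", text).strip()

df["caption_clean"] = df["caption"].apply(clean_caption)
df = df[df["caption_clean"].str.split().str.len() >= 3]  # drop very short captions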
Step 5: Training and Evaluation Once you have preprocessed the dataset as per your requirements, it's time to train your models! The Conceptual Captions dataset can be used for a range of tasks such as image captioni...
License: CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
Sports Image Classification dataset
From Kaggle: 100 Sports Image Classification Collection of sports images covering 100 different sports. Images are 224x224x3 jpg format. Data is separated into train, test and valid directories.
13493 train images 500 test images 500 validate images
Additionally, a CSV file is included for those who wish to use it to create their own train, test, and validation datasets.
Clone
git clone… See the full description on the dataset page: https://huggingface.co/datasets/HES-XPLAIN/SportsImageClassification.
Description
This sound field image dataset contains clean-noisy pairs of complex-valued sound-field images generated by 2D acoustic simulations. The dataset was initially prepared for deep sound-field denoiser (https://github.com/nttcslab/deep-sound-field-denoiser), a DNN-based denoising method for optically measured sound fields. Since the data is a two-dimensional sound field based on the Helmholtz equation, one can use this dataset for any acoustic application. Please check our GitHub repository and paper for details.
Directory structure
The dataset contains three directories: training, validation, and evaluation. Each directory contains "soundsource#" sub-directories (# represents the number of sound sources used in the acoustic simulation). Each sub-directory has three h5 files for data (clean, white noise, and speckle noise) and three CSV files listing random parameter values used in the simulation.
/training
/soundsource#
constants.csv
random_variable_ranges.csv
random_variables.csv
sf_true.h5
sf_noise_white.h5
sf_noise_speckle.h5
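A small inspection sketch for the HDF5 files listed above, assuming h5py; the internal dataset names are not documented here, so the snippet just walks the file and prints the datasets it actually contains (the path is illustrative, with "#" replaced by a source count):

import h5py

path = "training/soundsource1/sf_true.h5"  # assumed path
with h5py.File(path, "r") as f:
    def show(name, obj):
        # Print every HDF5 dataset found, with its shape and dtype.
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)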
Condition of use
This dataset is available under the attached license file. Read the terms and conditions in NTTSoftwareLicenseAgreement.pdf carefully.
Citation
If you use this dataset, please cite the following paper.
K. Ishikawa, D. Takeuchi, N. Harada, and T. Moriya ``Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network,'' arXiv:2304.14923 (2023).
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
Welcome to the Astronomical Objects Dataset, a comprehensive collection of 50,000 high-resolution images of celestial bodies, including asteroids, planets, moons, stars, galaxies, nebulae, black holes, exoplanets, dwarf planets, and constellations, as well as man-made space objects like rockets, space debris, spacecraft (landers, orbiters, probes, rovers), satellites, and space stations. This dataset is curated for AI/ML research, astronomy education, and image classification tasks in the realm of space exploration.
Images are organized into one folder per category (/asteroids, /planets, /moons, etc.). For each category (e.g., asteroids, planets), there is a corresponding CSV file containing the following columns (a minimal reading sketch follows the list):
- name: Common or scientific name of the object
- description: Short summary about the object
- distance_from_earth_km: Average distance from Earth
- discovery_year: Year the object was discovered (if applicable)
- diameter_km, mass_kg: Physical characteristics
- fun_facts : Fun Facts about the specific object in that category
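A minimal reading sketch, assuming pandas and illustrative names (planets.csv and a planets image folder); substitute the actual per-category CSV and folder names from the download:

import os
import pandas as pd

meta = pd.read_csv("planets.csv")  # hypothetical per-category CSV name
print(meta[["name", "distance_from_earth_km", "diameter_km"]].head())
print(len(os.listdir("planets")), "images in the matching category folder")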
Upvote this dataset for others to see and use it for their specific projects and purposes.
Dive into the dataset and start building intelligent systems that can see and understand the universe like never before.
Let me know in the comments if you have more images that can be added to upgrade this dataset. I had to create this dataset manually, as there are not many image datasets with specific astronomical images for each category.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).
Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):
For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:
To clearly state the geographical coverage of images available in this dataset, we included in the version v2.1, a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a csv file for each LULC class that provides the complete list of countries represented in that class. Each csv file has two columns, the first one gives the country code and the second one gives the number of images provided in that country for that LULC class. In addition to these 29 csv files, we provided another csv file that maps each ISO Alpha-2 country code to its original full country name.
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)
This project consists of two datasets, both of aerial images and videos of dolphins taken by drones. The data was captured from a few places (Italy and Israel coast lines).
The aim of the project is to examine automated dolphin detection and tracking from aerial surveys.
The project description, details and results are presented in the paper (link to the paper).
Each dataset was organized and set for a different phase of the project. Each dataset is located in a different zip file:
Detection - Detection.zip
Tracking - Tracking.zip
Further information about the datasets' content and annotation format is below.
Detection Dataset
This dataset contains 1125 aerial images, while an image can contain several dolphins.
The detection phase of the project was done using RetinaNet, a supervised deep-learning-based algorithm, with the Keras RetinaNet implementation. Therefore, the data was divided into three parts - Train, Validation and Test - with a split of 70%, 15% and 15%, respectively.
The annotation format follows the format required by that implementation (Keras RetinaNet). Each object (a dolphin) is annotated with bounding box coordinates and a class. For this project, dolphins were not distinguished by species; therefore, each dolphin object is annotated as a bounding box and classified as 'Dolphin'. The Detection zip file includes:
A folder for each - Train, Validation and Test subsets, which includes the images
An annotations CSV file for each subset
A class mapping csv file (one for all the subsets).
*The annotation format is detailed in Annotation section.
Detection zip file content:
Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————train_set.csv
|——————test_set (images)
|——————train_set.csv
└——————class_mapping.csv
Tracking
This dataset contains 5 short videos (10-30 seconds), which were trimmed from longer aerial videos captured by a drone.
The tracking phase of the project was done using two methods:
VIAME application, using the tracking feature
Re3: Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects, by Daniel Gordon. For this project, the author's TensorFlow implementation is used.
Both methods require the video's frame sequence as input. Therefore, the videos' frames were extracted. The first frame was annotated manually for initialization, and the algorithms track accordingly. As in the Detection dataset, each frame can include several objects (dolphins).
For annotation consistency, the videos' frame sequences were annotated similarly to the Detection dataset above (details can be found in the Annotations format section), with each video's frames annotated separately. Therefore, the Tracking zip file contains a folder for each video (5 folders in total), named after the video's file name.
Each video folder contains:
Frames sequence directory, which includes the extracted frames of the video
An annotations CSV file
A class mapping CSV file
The original video in MP4 format
The examined videos description and details are displayed in 'Videos Description.xlsx' file. Use the preview option for displaying its content.
Tracking zip file content:
Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
  └—————frames (images)
  └—————annotations_HighToLow_trim_0045_0070.csv
  └—————class_mapping_HighToLow_trim_0045_0070.csv
  └—————HighToLow_trim_0045_0070.MP4
Annotations format
Both datasets share a similar annotation format, described below. It follows the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase of the project.
Each object (dolphin) is annotated by a bounding box, given by its top-left and bottom-right corner coordinates, and a class. Each image or frame can include several objects. All data was annotated using the Labelbox application.
For each subset (Train, Validation and Test of Detection dataset, and each video of Tracking Dataset) there are two corresponded CSV files:
Annotations CSV file
Class mapping CSV file
Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame. The format of each line of the CSV annotation is:
path/to/image.jpg - a path to the image/frame
x1, y1 - image coordinates of the left upper corner of the bounding box
x2, y2 - image coordinates of the right bottom corner of the bounding box
class_name - class name of the annotated object
path/to/image.jpg,x1,y1,x2,y2,class_name
An example from train_set.csv:
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
This defines a dataset with 2 images:
1146_20170730101_ce1_sc_GOPR3047 103.jpg which contains 2 objects classified as 'Dolphin'
1147_20170730101_ce1_sc_GOPR3047 104.jpg which contains 3 objects classified as 'Dolphin'
Each line in the Class Mapping CSV file contains a mapping:
class_name,id
An example:
Dolphin,0
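A minimal parsing sketch for the two CSV formats above, assuming pandas and that the files have no header row (as in the Keras RetinaNet convention shown):

import pandas as pd

cols = ["path", "x1", "y1", "x2", "y2", "class_name"]
annotations = pd.read_csv("train_set.csv", header=None, names=cols)
class_mapping = pd.read_csv("class_mapping.csv", header=None, names=["class_name", "id"])

# Number of annotated dolphins per image.
print(annotations.groupby("path").size().head())
print(class_mapping)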
License: Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning
This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.
The source code, images and annotations are licensed under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.
Download the dataset images and our trained models
images.zip (468 MB)
models.zip (477 MB)
Due to the size of the images and models, they are hosted outside of the GitHub repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the python script (deepweeds.py) as instructed below, this step will be performed for you automatically.
TensorFlow Datasets
Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.
Weeds and locations
The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.
Data organization
Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.
labels
The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:
Filename,Label,Species
...
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
...
Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".
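A short sketch for working with labels.csv, assuming pandas; the date, time, and instrument ID are recovered from the YYYYMMDD-HHMMSS-ID filename pattern described above:

import pandas as pd

labels = pd.read_csv("labels.csv")  # columns: Filename, Label, Species
print(labels["Species"].value_counts())

# Split the YYYYMMDD-HHMMSS-ID filename stem into its parts.
stems = labels["Filename"].str.replace(".jpg", "", regex=False)
labels[["date", "time", "instrument"]] = stems.str.split("-", expand=True)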
models
We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.
resnet.hdf5
inception.hdf5
resnet.uff
deepweeds.py
This python script trains and evaluates Keras' base implementation of ResNet50 and InceptionV3 on the DeepWeeds dataset, pre-trained with ImageNet weights. The performance of the networks is cross-validated over 5 folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script also provides the ability to measure the inference speeds within the TensorFlow environment.
The script can be executed to carry out these computations using the following commands.
To train and evaluate the ResNet50 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model resnet.
To train and evaluate the InceptionV3 model with five-fold cross validation, use python3 deepweeds.py cross_validate --model inception.
To measure inference times for the ResNet50 model, use python3 deepweeds.py inference --model models/resnet.hdf5.
To measure inference times for the InceptionV3 model, use python3 deepweeds.py inference --model models/inception.hdf5.
Dependencies
The required Python packages to execute deepweeds.py are listed in requirements.txt.
tensorrt
This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:
cd tensorrt/src
make -j4
cd ../bin
./resnet_inference
Citations
If you use the DeepWeeds dataset in your work, please cite it as:
IEEE style citation: “A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, “DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning,” Scientific Reports, vol. 9, no. 2058, 2 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3 ”
BibTeX
@article{DeepWeeds2019,
  author = {Alex Olsen and Dmitry A. Konovalov and Bronson Philippa and Peter Ridd and Jake C. Wood and Jamie Johns and Wesley Banks and Benjamin Girgenti and Owen Kenny and James Whinney and Brendan Calvert and Mostafa {Rahimi Azghadi} and Ronald D. White},
  title = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}},
  journal = {Scientific Reports},
  year = 2019,
  number = 2058,
  month = 2,
  volume = 9,
  issue = 1,
  day = 14,
  url = "https://doi.org/10.1038/s41598-018-38343-3",
  doi = "10.1038/s41598-018-38343-3"
}
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0.html)
A collection of 5,981 images labeled into 6 categories related to software diagrams, with the following distribution:
Label | Name | Number
0 | None | 1010
1 | Activity Diagram | 595
2 | Sequence Diagram | 811
3 | Class Diagram | 986
4 | Component Diagram | 368
5 | Use Case Diagram | 854
6 | Cloud Diagram | 978
The dataset consists of a CSV file with the labeling and a zip file with the normalized images. The images are normalized to RGB format and a size of 224x224 pixels, ready for Keras neural networks.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
StreetSurfaceVis
StreetSurfaceVis is an image dataset containing 9,122 street-level images from Germany with labels on road surface type and quality. The CSV file streetSurfaceVis_v1_0.csv contains all image metadata, and four folders contain the image files. All images are available in four different sizes based on image width: 256px, 1024px, 2048px, and the original size. The folders containing the images are named according to the respective image size. Image files are named based on the mapillary_image_id.
You can find the corresponding publication here: StreetSurfaceVis: a dataset of crowdsourced street-level imagery with semi-automated annotations of road surface type and quality
Image metadata
Each CSV record contains information about one street-level image with the following attributes:
mapillary_image_id: ID provided by Mapillary (see information below on Mapillary)
user_id: Mapillary user ID of contributor
user_name: Mapillary user name of contributor
captured_at: timestamp, capture time of image
longitude, latitude: location the image was taken at
train: Suggestion to split train and test data. True for train data and False for test data. Test data contains data from 5 cities which are excluded in the training data.
surface_type: Surface type of the road in the focal area (the center of the lower image half) of the image. Possible values: asphalt, concrete, paving_stones, sett, unpaved
surface_quality: Surface quality of the road in the focal area of the image. Possible values: (1) excellent, (2) good, (3) intermediate, (4) bad, (5) very bad (see the attached Labeling Guide document for details)
Image source
Images are obtained from Mapillary, a crowd-sourcing platform for street-level imagery. More metadata about each image can be obtained via the Mapillary API. User-generated images are shared by Mapillary under the CC-BY-SA License.
For each image, the dataset contains the mapillary_image_id and user_name. You can access user information on the Mapillary website at https://www.mapillary.com/app/user/ followed by the user name, and image information at https://www.mapillary.com/app/?focus=photo&pKey= followed by the mapillary_image_id.
If you use the provided images, please adhere to the terms of use of Mapillary.
Instances per class
Total number of images: 9,122
(Table: image counts per surface type - asphalt, concrete, paving stones, sett, unpaved - broken down by surface quality: excellent, good, intermediate, bad, very bad.)
For modeling, we recommend using a train-test split where the test data includes geospatially distinct areas, thereby ensuring the model's ability to generalize to unseen regions is tested. We propose five cities varying in population size and from different regions in Germany for testing - images are tagged accordingly.
Number of test images (train-test split): 776
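A minimal sketch of this suggested split, assuming pandas and the column names listed in the metadata section above:

import pandas as pd

meta = pd.read_csv("streetSurfaceVis_v1_0.csv")
# The 'train' column marks the suggested split; if it is read as text rather
# than booleans, compare against the string "True" instead.
train_df = meta[meta["train"] == True]
test_df = meta[meta["train"] == False]
print(len(train_df), "train images /", len(test_df), "test images from the held-out cities")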
Inter-rater reliability
Three annotators labeled the dataset, such that each image was annotated by one person. Annotators were encouraged to consult each other for a second opinion when uncertain. 1,800 images were annotated by all three annotators, resulting in a Krippendorff's alpha of 0.96 for surface type and 0.74 for surface quality.
Recommended image preprocessing
As the labeled focal road is located in the bottom center of the street-level image, it is recommended to crop images to their lower and middle half prior to using them for classification tasks.
This is an exemplary code for recommended image preprocessing in Python:
from PIL import Image

img = Image.open(image_path)
width, height = img.size
img_cropped = img.crop((0.25 * width, 0.5 * height, 0.75 * width, height))
License
CC-BY-SA
Citation
If you use this dataset, please cite as:
Kapp, A., Hoffmann, E., Weigmann, E. et al. StreetSurfaceVis: a dataset of crowdsourced street-level imagery annotated by road surface type and quality. Sci Data 12, 92 (2025). https://doi.org/10.1038/s41597-024-04295-9
@article{kapp_streetsurfacevis_2025,
  title = {{StreetSurfaceVis}: a dataset of crowdsourced street-level imagery annotated by road surface type and quality},
  volume = {12},
  issn = {2052-4463},
  url = {https://doi.org/10.1038/s41597-024-04295-9},
  doi = {10.1038/s41597-024-04295-9},
  pages = {92},
  number = {1},
  journaltitle = {Scientific Data},
  shortjournal = {Scientific Data},
  author = {Kapp, Alexandra and Hoffmann, Edith and Weigmann, Esther and Mihaljević, Helena},
  date = {2025-01-16},
}
This is part of the SurfaceAI project at the University of Applied Sciences, HTW Berlin.
Contact: surface-ai@htw-berlin.de
https://surfaceai.github.io/surfaceai/
Funding: SurfaceAI is a mFund project funded by the Federal Ministry for Digital and Transportation Germany.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
A Perfectly Accurate, Synthetic dataset featuring a virtual railway EnVironment for Multi-View Stereopsis (RailEnV-PASMVS) is presented, consisting of 40 scenes and 79,800 renderings together with ground truth depth maps, extrinsic and intrinsic camera parameters and binary segmentation masks of all the track components and surrounding environment. Every scene is rendered from a set of 3 cameras, each positioned relative to the track for optimal 3D reconstruction of the rail profile. The set of cameras is translated across the 100-meter length of tangent (straight) track to yield a total of 1,995 camera views. Photorealistic lighting of each of the 40 scenes is achieved with the implementation of high-definition, high dynamic range (HDR) environmental textures. Additional variation is introduced in the form of camera focal lengths, random noise for the camera location and rotation parameters and shader modifications of the rail profile. Representative track geometry data is used to generate random and unique vertical alignment data for the rail profile for every scene. This primary, synthetic dataset is augmented by a smaller image collection consisting of 320 manually annotated photographs for improved segmentation performance. The specular rail profile represents the most challenging component for MVS reconstruction algorithms, pipelines and neural network architectures, increasing the ambiguity and complexity of the data distribution. RailEnV-PASMVS represents an application specific dataset for railway engineering, against the backdrop of existing datasets available in the field of computer vision, providing the precision required for novel research applications in the field of transportation engineering.
File descriptions
Steps to reproduce
The open source Blender software suite (https://www.blender.org/) was used to generate the dataset, with the entire pipeline developed using the exposed Python API interface. The camera trajectory is kept fixed for all 40 scenes, except for small perturbations introduced in the form of random noise to increase the camera variation. The camera intrinsic information was initially exported as a single CSV file (scene.csv) for every scene, from which the camera information files were generated; this includes the focal length (focalLengthmm), image sensor dimensions (pixelDimensionX, pixelDimensionY), position, coordinate vector (vectC) and rotation vector (vectR). The STL model files, as provided in this data repository, were exported directly from Blender, such that the geometry/scenes can be reproduced. The data processing below is written for a Python implementation, transforming the information from Blender's coordinate system into universal rotation (R_world2cv) and translation (T_world2cv) matrices.
import numpy as np
from scipy.spatial.transform import Rotation as R
#The intrinsic matrix K is constructed using the following formulation:
focalLengthPixel = focalLengthmm * pixelDimensionX / sensorWidthmm

K = [[focalLengthPixel, 0, pixelDimensionX / 2],
     [0, focalLengthPixel, pixelDimensionY / 2],
     [0, 0, 1]]
#The rotation vector as provided by Blender was first transformed to a rotation matrix:
r = R.from_euler('xyz', vectR, degrees=True)
matR = r.as_matrix()
#Transpose the rotation matrix, to find matrix from the WORLD to BLENDER coordinate system:
R_world2bcam = np.transpose(matR)
#The matrix describing the transformation from BLENDER to CV/STANDARD coordinates is:
R_bcam2cv = np.array([[1, 0, 0],
[0, -1, 0],
[0, 0, -1]])
#Thus the representation from WORLD to CV/STANDARD coordinates is:
R_world2cv = R_bcam2cv.dot(R_world2bcam)
#The camera coordinate vector requires a similar transformation moving from BLENDER to WORLD coordinates:
T_world2bcam = -1 * R_world2bcam.dot(vectC)
T_world2cv = R_bcam2cv.dot(T_world2bcam)
The resulting R_world2cv and T_world2cv matrices are written to the camera information file using exactly the same format as that of BlendedMVS developed by Dr. Yao. The original rotation and translation information can be found by following the process in reverse. Note that additional steps were required to convert from Blender's unique coordinate system to that of OpenCV; this ensures universal compatibility in the way that the camera intrinsic and extrinsic information is provided.
Equivalent GPS information is provided (gps.csv), whereby the local coordinate frame is transformed into equivalent GPS information, centered around the Engineering 4.0 campus, University of Pretoria, South Africa. This information is embedded within the JPG files as EXIF data.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The Object Detection for Olfactory References (ODOR) Dataset

Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes. Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories. It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas. Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.

How to use

To download the dataset images, run the download_imgs.py script in the subfolder. The images will be downloaded to the imgs folder. The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year. For the sake of license compliance, we do not publish the images directly (although most of the images are public domain). Instead, we provide links to their source collections in the metadata file (meta.csv) and a python script to download the artwork images (download_images.py). The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively.
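A small sketch of the file_name mapping described above, assuming pandas; the metadata file is referred to as both meta.csv and metadata.csv in the description, so adjust the filename to whichever ships with the dataset:

import json
import pandas as pd

with open("annotations.json") as f:
    coco = json.load(f)
images = pd.DataFrame(coco["images"])  # COCO 'images' array

meta = pd.read_csv("meta.csv")  # image-level metadata file (name may be metadata.csv)

# Join on the image file name; the metadata column is called "File Name" above.
merged = images.merge(meta, left_on="file_name", right_on="File Name")
print(len(merged), "images matched to metadata rows")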
This dataset includes aerial images and videos of dolphins taken by drones. The data was captured from a few places (Italy and Israel coast lines).
The dataset was collected with the aim of performing automated dolphin detection on aerial images and dolphin tracking from aerial videos.
The project description and results are in the GitHub link, which describes and visualizes the paper (link to the paper).
The dataset includes two zip files:
Detection.zip
Tracking.zip
For both files, the data annotation format is identical, and described below.
To view each file's contents, use the preview option; a description also appears later in this section.
Annotations format
The data annotation format is inspired by the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase.
Each object is annotated by a bounding box. All data was annotated using Labelbox application.
For each subset there are two corresponded CSV files:
Annotation file
Class mapping file
Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame. The format of each line of the CSV annotation is:
path/to/image.jpg,x1,y1,x2,y2,class_name
path/to/image.jpg - a path to the image/frame
x1, y1 - image coordinates of the left upper corner of the bounding box
x2, y2 - image coordinates of the right bottom corner of the bounding box
class_name - class name of the annotated object
An example from train_set.csv:
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
This defines a dataset with 2 images:
1146_20170730101_ce1_sc_GOPR3047 103.jpg contains 2 bounding boxes containing dolphins.
1147_20170730101_ce1_sc_GOPR3047 104.jpg contains 3 bounding boxes containing dolphins.
Each line in the Class Mapping CSV file contains a mapping:
class_name,id
An example:
Dolphin,0
Detection
The data for dolphin detection is separated into three sub-directories: train, validation and test sets.
Since all files contain only one class - Dolphin - there is one class_mapping.csv which can be used for all three subsets.
Detection dataset folder includes:
A folder for each - train, validation and test sets, which includes the images
An annotations CSV file for each - train, validation and test sets
A class mapping csv file (for all the sets)
There is an annotations CSV file for each of the subsets.
Tracking
For the tracking phase, trackers were examined and evaluated on 5 videos. Each video has its own annotation and class mapping CSV files. In addition, each video's extracted frames are available in the frames directory.
Tracking dataset folder includes a folder for each video (5 videos), which contain:
frames directory, which includes extracted frames of the video
An annotations CSV
A class mapping csv file
The original video
The examined videos description and details:
Detection and Tracking dataset structure:
Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————train_set.csv
|——————test_set (images)
|——————train_set.csv
└——————class_mapping.csv

Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
  └—————frames (images)
  └—————annotations_HighToLow_trim_0045_0070.csv
  └—————class_mapping_HighToLow_trim_0045_0070.csv
  └—————HighToLow_trim_0045_0070.MP4
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Here we present a dataset, MNIST4OD, of large size (in number of dimensions and number of instances) suitable for the Outlier Detection task. The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).
We build MNIST4OD in the following way: to distinguish between outliers and inliers, we choose the images belonging to a digit as inliers (e.g. digit 1) and we sample with uniform probability from the remaining images as outliers, such that their number is equal to 10% of that of the inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 x 28) into vectors.
Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x. The data contains one instance (vector) in each line, where the last column represents the outlier label (yes/no) of the data point. The data also contains a column which indicates the original image class (0-9).
Statistics of each dataset (Name | Instances | Dimensions | Number of Outliers in %):
MNIST_0 | 7594 | 784 | 10
MNIST_1 | 8665 | 784 | 10
MNIST_2 | 7689 | 784 | 10
MNIST_3 | 7856 | 784 | 10
MNIST_4 | 7507 | 784 | 10
MNIST_5 | 6945 | 784 | 10
MNIST_6 | 7564 | 784 | 10
MNIST_7 | 8023 | 784 | 10
MNIST_8 | 7508 | 784 | 10
MNIST_9 | 7654 | 784 | 10
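A minimal loading sketch, assuming pandas; whether the files include a header row is an assumption to verify against the actual files (pandas handles the gzip compression automatically):

import pandas as pd

df = pd.read_csv("MNIST_1.csv.gz")  # if there is no header row, add header=None
outlier_label = df.iloc[:, -1]       # last column: outlier label (yes/no)
print(df.shape)
print(outlier_label.value_counts())  # roughly 10% outliers expected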
Terms: https://crawlfeeds.com/privacy_policy
Looking for a free Walmart product dataset? The Walmart Products Free Dataset delivers a ready-to-use ecommerce product data CSV containing ~2,100 verified product records from Walmart.com. It includes vital details like product titles, prices, categories, brand info, availability, and descriptions — perfect for data analysis, price comparison, market research, or building machine-learning models.
Complete Product Metadata: Each entry includes URL, title, brand, SKU, price, currency, description, availability, delivery method, average rating, total ratings, image links, unique ID, and timestamp.
CSV Format, Ready to Use: Download instantly - no need for scraping, cleaning or formatting.
Good for E-commerce Research & ML: Ideal for product cataloging, price tracking, demand forecasting, recommendation systems, or data-driven projects.
Free & Easy Access: Priced at USD $0.0, making it a great starting point for developers, data analysts or students.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset is part of fundamental research to produce an IoT monitoring device for fishponds. The hypothesis of this research is that the health of a fishpond can be inferred from visual data. To build the dataset, several condition parameters were gathered: the temperature, pH level, and total dissolved solids (TDS) were collected at several locations at different times. At each time and location, an aerial photo of the pond was also collected using a drone at several heights. The images are presented in two forms: the raw original images of the ponds, and the cropped image at each data point. The condition data was collected using an appropriate digital sensor for each parameter.
The dataset consists of 975 data rows. Each row represents the condition and visual image (a 100 x 100 pixel image) of a fishpond at a certain time and location. To use the dataset, please access the pond_dataset.csv file. The file contains the tabular data of 13 ponds (each pond represents a different location and collection time). For each row, the visual image file name is given. To access the image file, search the images folder for the corresponding file name listed in the CSV file.
The dataset can be used to study the correlation of each parameter. For example, the research originally studied the correlation between the visual data and the condition data. To do this, the images need to be preprocessed; the image data can be converted into a histogram, or processed with any other visual preprocessing method.
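A small sketch of the histogram preprocessing mentioned above, assuming pandas and Pillow; the metadata column that holds the image file name is not named in this description, so image_file below is a placeholder to adapt to the actual header:

import pandas as pd
from PIL import Image

rows = pd.read_csv("pond_dataset.csv")
print(rows.head())

# "image_file" is a placeholder column name for the image file name.
img = Image.open("images/" + str(rows.iloc[0]["image_file"]))
histogram = img.histogram()  # one possible visual feature, as suggested above
print(len(histogram))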
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
The PULP-Dronet v3 dataset
The Himax dataset has been collected at the University of Bologna and the Technology Innovation Institute using a Himax ultra-low power, gray-scale, QVGA camera mounted on a Bitcraze Crazyflie nano-drone. The dataset has been used for training and testing the PULP-Dronet v3 CNN, a neural network for autonomous visual-based navigation of nano-drones. This release includes the training and testing sets described in the paper.
Resources
Paper: Distilling Tiny and Ultra-fast Deep Neural Networks for Autonomous Navigation on Nano-UAVs
Code available: pulp-platform/pulp-dronet
Video: https://youtu.be/ehNlDyhsVSc
Dataset Description
We collected a dataset of 77k images for nano-drones' autonomous navigation, for a total of 600MB of data.
We used the Bitcraze Crazyflie 2.1, collecting images from the AI-deck's Himax HM01B0 monochrome camera.
The images in the PULP-Dronet v3 dataset have the following characteristics:
Resolution: each image has a QVGA resolution of 324x244 pixels.
Color: all images are grayscale, so they have 1 single channel.
Format: the images are stored in .jpeg format.
A human pilot manually flew the drone, collecting i) images from the grayscale QVGA Himax camera sensor of the AI-deck, ii) the gamepad's yaw-rate, normalized in the [-1;+1] range, inputted from the human pilot, iii) the drone's estimated state, and iv) the distance between obstacles and the drone measured by the front-looking ToF sensor.
After the data collection, we labeled all the images with a binary collision label whenever an obstacle was in the line of sight and closer than 2m. We recorded 301 sequences in 20 different environments. Each sequence of data is labeled with high-level characteristics, listed in characteristics.json:
For training our CNNs, we augmented the training images by applying random cropping, flipping, brightness augmentation, vignetting, and blur. The resulting dataset has 157k images, split as follows: 110k, 7k, 15k images for training, validation, and testing, respectively.
To address the labels' bias towards the center of the [-1;+1] yaw-rate range in our testing dataset, we balanced the dataset by selectively removing a portion of images that had a yaw-rate of 0. Specifically, we removed (only from the test set) some images having yaw_rate==0 and collision==1.
Dataset | Train Images | Validation Images | Test Images | Total
PULP-Dronet v3 | 53,830 | 7,798 | 15,790 | 77,418
PULP-Dronet v3 testing | 53,830 | 7,798 | 3,071 | 64,699
PULP-Dronet v3 training | 110,138 | 15,812 | 31,744 | 157,694

We use the PULP-Dronet v3 training set for training and the PULP-Dronet v3 testing set for validation/testing; this is the final split:

Dataset | Train Images | Validation Images | Test Images | Total
Final | 110,138 | 7,798 | 3,071 | 121,007
Notes:
PULP-Dronet v3 and PULP-Dronet v3 testing datasets: Images are in full QVGA resolution (324x244px), uncropped.
PULP-Dronet v3 training dataset: Images are cropped to 200x200px, matching the PULP-Dronet input resolution. Cropping was done randomly on the full-resolution images to create variations.
Dataset Structure
.
└── Dataset_PULP_Dronet_v3_*/
    ├── ETH finetuning/
    │   ├── acquisition1/
    │   │   ├── characteristics.json          # metadata
    │   │   ├── images/                       # images folder
    │   │   ├── labels_partitioned.csv        # Labels for PULP-Dronet
    │   │   └── state_labels_DroneState.csv   # raw data from the crazyflie
    │   ...
    │   └── acquisition39/
    ├── Lorenzo Bellone/
    │   ├── acquisition1/
    │   ...
    │   └── acquisition19/
    ├── Lorenzo Lamberti/
    │   ├── dataset-session1/
    │   │   ├── acquisition1/
    │   │   ...
    │   │   └── acquisition29/
    │   ├── dataset-session2/
    │   │   ├── acquisition1/
    │   │   ...
    │   │   └── acquisition55/
    │   ├── dataset-session3/
    │   │   ├── acquisition1/
    │   │   ...
    │   │   └── acquisition65/
    │   └── dataset-session4/
    │       ├── acquisition1/
    │       ...
    │       └── acquisition51/
    ├── Michal Barcis/
    │   ├── acquisition1/
    │   ...
    │   └── acquisition18/
    └── TII finetuning/
        ├── dataset-session1/
        │   ├── acquisition1/
        │   ...
        │   └── acquisition18/
        └── dataset-session2/
            ├── acquisition1/
            ...
            └── acquisition39/
This structure applies for all the three sets mentioned above: PULP_Dronet_v3, PULP_Dronet_v3_training, PULP_Dronet_v3_testing.
Dataset Labels
The labels_partitioned.csv file contains the label metadata for the PULP-Dronet v3 image dataset.
The file includes the following columns:
filename: The name of the image file (e.g., 25153.jpeg).
label_yaw_rate: The yaw rate label, representing the rotational velocity. values are in the [-1, +1] range, where YawRate > 0 means counter-clockwise turn --> turn left, and YawRate < 0 means clockwise turn --> turn right.
label_collision: The collision label, in range [0,1]. 0 denotes no collision and 1 indicates a collision.
partition: The dataset partition, i.e., train, test, or valid.
The characteristics.json file contains sequence-level metadata. This might be useful for filtering the dataset on some specific characteristics, or for partitioning the image types equally:
scenario (i.e., indoor or outdoor);
path type (i.e., presence or absence of turns);
obstacle types (e.g., pedestrians, chairs);
flight height (i.e., 0.5, 1.0, 1.5 m/s);
behaviour in presence of obstacles (i.e., overpassing, stand still, n/a);
light conditions (dark, normal, bright, mixed);
a location name identifier;
acquisition date.
labeled_images.csv is the same as labels_partitioned.csv, but without the partition column. You can use this file to repeat the partitioning into train, valid, and test sets.
state_labels_DroneState.csv contains the raw data logged from the Crazyflie at ~100 samples/s.
The file includes the following columns:
timeTicks: The timestamp.
range.front: The distance measurement from the front VL53L1x ToF sensor [mm].
mRange.rangeStatusFront: The status code of the front range sensor (check the VL53L1x datasheet for more info)
controller.yawRate: The yaw rate command given by the human pilot (in radians per second).
ctrltarget.yaw: The target yaw rate set by the control system (in radians per second).
stateEstimateZ.rateYaw: The estimated yaw rate from the drone's state estimator (in radians per second).
Data Processing Workflow
You can find the scripts in the pulp-platform/pulp-dronet repository.
dataset_processing.py:
Input: state_labels_DroneState.csv
Output: labeled_images.csv
Function: matches the drone-state labels' timestamps (~100 Hz) to each image's timestamp (~10 Hz), discarding the extra drone states (a minimal sketch of this matching step follows the list below).
dataset_partitioning.py:
Input: labeled_images.csv
Output: labels_partitioned.csv
Function: Partitions the labeled images into training, validation, and test sets.
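A minimal sketch of what the matching step might look like using pandas.merge_asof; the image-timestamp table and the derived-label lines are assumptions for illustration, not the actual implementation of dataset_processing.py.

```
import pandas as pd

# raw drone-state log (~100 Hz) and image timestamps (~10 Hz)
states = pd.read_csv("state_labels_DroneState.csv").sort_values("timeTicks")
images = pd.DataFrame({"filename": ["25153.jpeg"], "timeTicks": [25153]})  # hypothetical image table

# for each image, keep only the nearest drone state in time; the others are discarded
labeled = pd.merge_asof(images.sort_values("timeTicks"), states,
                        on="timeTicks", direction="nearest")

# illustrative label derivation (not the script's exact logic)
labeled["label_collision"] = (labeled["range.front"] < 2000).astype(int)  # obstacle closer than 2 m
labeled["label_yaw_rate"] = labeled["controller.yawRate"]                 # later mapped to [-1, +1]
labeled.to_csv("labeled_images.csv", index=False)
```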
License
We release this dataset as open source under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; see LICENSE.CC.md.
This dataset was created from images of hand signs: the hand landmarks detected in each image were turned into the dataset's attributes. It contains all 21 hand landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).
You can also add more classes to your dataset by running the following code. Make sure to create an empty dataset (or append to the existing one) and set the file path correctly.
```
import os

import cv2
import mediapipe as mp
import pandas as pd

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

rows = []
# static_image_mode=True because these are independent still images, not a video stream
with mp_hands.Hands(static_image_mode=True,
                    max_num_hands=1,
                    min_detection_confidence=0.8) as hands:
    for t in range(1, 6):                           # one folder per class label (1..5)
        path = os.path.join('data', str(t))
        for name in os.listdir(path):
            image = cv2.imread(os.path.join(path, name))
            if image is None:                       # skip unreadable files
                continue
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            if not results.multi_hand_landmarks:
                continue
            for hand_landmarks in results.multi_hand_landmarks:
                # optional: draw the detected landmarks for visual inspection
                mp_draw.draw_landmarks(image=image,
                                       landmark_list=hand_landmarks,
                                       connections=mp_hands.HAND_CONNECTIONS)
                row = {'label': t}
                for i in range(21):
                    lm = hand_landmarks.landmark[i]
                    for axis, value in zip(('x', 'y', 'z'), (lm.x, lm.y, lm.z)):
                        row[f'{mp_hands.HandLandmark(i).name}_{axis}'] = value
                rows.append(row)

# DataFrame.append() is deprecated; build the frame from a list of dicts instead,
# then append it to (or merge it with) the existing dataset CSV as needed.
df = pd.DataFrame(rows)
```
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HPA Single Cell Classification Dataset 2021 (view on bioimage.io)
More information: https://www.kaggle.com/competitions/hpa-single-cell-image-classification/data
On the data page below, you will find a set of full-size original images (a mix of 1728x1728, 2048x2048, and 3072x3072 PNG files) in train.zip and test.zip. (Please note that since this is a code competition, part of the test data will be hidden.)
You will also need the image-level labels from train.csv and the filenames for the test set from sample_submission.csv. As many Kagglers made use of all public HPA images for the previous classification challenge, we made the public HPA images available to download as instructed in this notebook. Note also that there are TFRecords available if competitors would like to use TPUs.
The 16-bit version of the training images is available here. Additional training images are available here.
The training image-level labels are provided for each sample in train.csv. The bulk of the image data is in train.zip. Each sample consists of four files; each file represents a different filter on the subcellular protein patterns represented by the sample. The files are named [filename]_[filter color].png. Colors are red for the microtubule channel, blue for the nuclei channel, yellow for the Endoplasmic Reticulum (ER) channel, and green for the protein of interest.
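As an illustration of that naming scheme, here is a minimal sketch (the sample ID and local train/ folder are placeholders) that reads the four filters of one sample and stacks them into a single array:

```
import cv2
import numpy as np

sample_id = "some-sample-id"          # hypothetical base filename taken from train.csv
channels = []
for color in ("red", "green", "blue", "yellow"):
    path = f"train/{sample_id}_{color}.png"
    # IMREAD_UNCHANGED preserves the bit depth (useful for the 16-bit variants)
    channels.append(cv2.imread(path, cv2.IMREAD_UNCHANGED))

# H x W x 4 array: microtubules, protein of interest, nuclei, ER
image = np.stack(channels, axis=-1)
print(image.shape, image.dtype)
```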
You are predicting protein organelle localization labels for each cell in the image. Border cells are included when there is enough information to decide on the labels.
There are in total 19 different labels present in the dataset (18 labels for specific locations, and label 18 for negative and unspecific signal). The dataset is acquired in a highly standardized way using one imaging modality (confocal microscopy). However, the dataset comprises 17 different cell types of highly different morphology, which affect the protein patterns of the different organelles. All image samples are represented by four filters (stored as individual files), the protein of interest (green) plus three cellular landmarks: nucleus (blue), microtubules (red), endoplasmic reticulum (yellow). The green filter should hence be used to predict the label, and the other filters are used as references. The labels are represented as integers that map to the following:
0. Nucleoplasm
1. Nuclear membrane
2. Nucleoli
3. Nucleoli fibrillar center
4. Nuclear speckles
5. Nuclear bodies
6. Endoplasmic reticulum
7. Golgi apparatus
8. Intermediate filaments
9. Actin filaments
10. Microtubules
11. Mitotic spindle
12. Centrosome
13. Plasma membrane
14. Mitochondria
15. Aggresome
16. Cytosol
17. Vesicles and punctate cytosolic patterns
18. Negative
The labels you will get for training are image level labels while the task is to predict cell level labels. That is to say, each training image contains a number of cells that have collectively been labeled as described above and the prediction task is to look at images of the same type and predict the labels of each individual cell within those images.
As the training labels are a collective label for all the cells in an image, it means that each labeled pattern can be seen in the image but not necessarily that each cell within the image expresses the pattern. This imprecise labeling is what we refer to as weak.
During the challenge you will both need to segment the cells in the images and predict the labels of those segmented cells.
Files:
- train - training images (in .tif)
- test - test images (in .png); the task of the competition is to segment and label the images in this folder
- train.csv - filenames and image-level labels for the training set
- sample_submission.csv - filenames for the test set, and a guide to constructing a working submission
Columns:
- ID - the base filename of the sample. As noted above, all samples consist of four files: blue, green, red, and yellow.
- Label - in the training data, this represents the labels assigned to each sample; in the submission, this represents the labels assigned to each cell.
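To turn the image-level Label column into per-image multi-hot targets, here is a minimal sketch, assuming the labels are pipe-separated integers (e.g., 0|2|16); adjust the split delimiter if the file uses a different separator:

```
import numpy as np
import pandas as pd

NUM_CLASSES = 19                      # labels 0-18 listed above

train = pd.read_csv("train.csv")      # columns: ID, Label

def to_multi_hot(label_str) -> np.ndarray:
    """Convert an image-level label string such as '0|2|16' into a 19-dim multi-hot vector."""
    target = np.zeros(NUM_CLASSES, dtype=np.float32)
    for lbl in str(label_str).split("|"):
        target[int(lbl)] = 1.0
    return target

targets = np.stack([to_multi_hot(s) for s in train["Label"]])
print(targets.shape)                  # (num_images, 19)
```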