License: MIT (https://opensource.org/licenses/MIT)
This dataset contains metadata related to three categories of AI and computer vision applications:
Handwritten Math Solutions: Metadata on images of handwritten math problems with step-by-step solutions.
Multi-lingual Street Signs: Road sign images in various languages, with translations.
Security Camera Anomalies: Surveillance footage metadata distinguishing between normal and suspicious activities.
The dataset is useful for machine learning, image recognition, OCR (Optical Character Recognition), anomaly detection, and AI model training.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
VQA is a multimodal task in which, given an image and a natural-language question about that image, the objective is to produce a correct natural-language answer.
It involves understanding the content of the image and correlating it with the context of the question asked. Because we need to compare the semantics of information present in both modalities, the image and the natural-language question about it, VQA entails a wide range of sub-problems in both CV and NLP (such as object detection and recognition, scene classification, counting, and so on). Thus, it is considered an AI-complete task.
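As a concrete illustration, here is a minimal Python sketch of VQA inference using the Hugging Face transformers pipeline; the model name and image path are illustrative assumptions, not part of this dataset.

# A minimal VQA sketch (assumed model and image path, for illustration).
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

# Given an image and a natural-language question, the pipeline returns
# candidate answers ranked by confidence.
result = vqa(image="street_scene.jpg", question="How many cars are in the image?")
print(result[0]["answer"], result[0]["score"])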
This dataset contains 20 classes: 'person', 'car', 'chair', 'bottle', 'pottedplant', 'bird', 'dog', 'sofa', 'bicycle', 'horse', 'boat', 'motorbike', 'cat', 'tvmonitor', 'cow', 'sheep', 'aeroplane', 'train', 'diningtable', and 'bus'. It also includes an image-data table with the fields 'Filename', 'Width', 'Height', 'Name', 'xmin', 'xmax', 'ymin', and 'ymax'.
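If the image-data table is distributed as a CSV file (an assumption; adjust to the actual file layout), it could be explored along these lines:

# Hypothetical file name; column names follow the fields listed above.
import pandas as pd

df = pd.read_csv("image_data.csv")

# Derive per-annotation box width/height in pixels.
df["box_w"] = df["xmax"] - df["xmin"]
df["box_h"] = df["ymax"] - df["ymin"]

# Count annotations per class across the 20 categories.
print(df["Name"].value_counts())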
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This folder contains the AI models required to run the Paper Grading project.
In total, more than 100,000 images were used to train these models; most of the data was created by me. r50.pth is a Detectron2 Faster R-CNN model, while the others are ResNet models trained with the Fast.ai API.
Model explanations:
- abcde.pkl: An 18-layer ResNet image-classification model. It predicts which letter the student marked, with five output classes (A, B, C, D, E) on 224×224 input images.
- config.yaml: Configuration file for the Faster R-CNN model. For more details, please check the Detectron2 documentation.
- dogru_yanlis.pkl: An 18-layer ResNet image-classification model that classifies whether a tick (✓) or a cross (X) appears on 48×48 input images.
- eslestirme.pkl: A 34-layer ResNet image-classification model that predicts the letter (A-H) written by the student on 48×48 images.
- gecerli_gecersiz.pkl: An 18-layer ResNet image-classification model that predicts whether a marked multiple-choice answer is valid or not (empty, more than one letter marked, etc.).
- r50.pth: Faster R-CNN model with 101 layers. Its aim is to detect questions on photos or scans of real exam papers. It can detect:
1. True-False
2. Matching
3. Gap Filling
4. Multiple Choice
question types, as well as the name-surname box. Please check the image below:
https://tasarimciogretmen-storage.s3.eu-south-1.amazonaws.com/Resim1.png
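As a rough sketch of how the .pkl classifiers above might be used (load_learner is the standard Fast.ai API for exported learners; the image path is an illustrative assumption):

from fastai.vision.all import load_learner, PILImage

learn = load_learner("abcde.pkl")           # the A-E multiple-choice classifier
img = PILImage.create("marked_answer.png")  # a 224x224 crop of an answer box
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, probs[pred_idx])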
Ship or vessel detection has a wide range of applications in areas such as maritime safety, fisheries management, marine pollution monitoring, defence and maritime security, protection from piracy, and illegal migration. With this in mind, a governmental maritime and coastguard agency is planning to deploy a computer vision-based automated system to identify ship type solely from images taken by survey boats. You have been hired as a consultant to build an efficient model for this project.
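One plausible starting point is transfer learning from an ImageNet backbone; this is a sketch only, with an assumed folder-per-class layout:

import torch
from torchvision import datasets, models, transforms

tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed layout: ships/train/<ship_type>/*.jpg
train_ds = datasets.ImageFolder("ships/train", transform=tfms)
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

# Replace the classifier head to match the number of ship types.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, len(train_ds.classes))
# ...then fine-tune with cross-entropy over the ship-type labels.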
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
The Cat and Dog Classification dataset is a standard computer vision dataset that involves classifying photos as containing either a dog or a cat. It is provided as a subset of a much larger dataset of approximately 25,000 photos.
The dataset contains 24,998 images, split into 12,499 Cat images and 12,499 Dog images. The training images are divided equally between cat and dog images, while the test images are not labeled. This allows users to evaluate their models on unseen data.
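Assuming the usual cat/ and dog/ subfolder layout under a train directory (an assumption; check the actual archive structure), the labeled images can be loaded along these lines:

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "train",                # expected to contain cat/ and dog/ subdirectories
    validation_split=0.2,   # hold out 20% of the labeled images for validation
    subset="training",
    seed=42,
    image_size=(180, 180),
    batch_size=32,
)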
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F7367057%2F498b0fc0a7a8cf40ac4337da82a4ebc5%2Fhow-to-introduce-a-dog-to-a-cat-blog-cover.webp?generation=1696702214010539&alt=media
This dataset was created by Ismail ELBOUKNIFY
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains images of wind turbines against dynamic, changing backgrounds. I designed it with drone photographers in mind. Many commercially available drones (such as DJI models) come with a software development kit (SDK) that lets the user program the drone in languages like Python. A commercial drone with a quality camera can therefore be paired with its SDK to create incredible computer vision projects! These projects are limitless, so I will continue to contribute to this dataset with that in mind. STAY TUNED!
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Welcome to the Vehicle Detection Image Dataset! This dataset is meticulously curated for object detection and tracking tasks, with a specific focus on vehicle detection. It serves as a valuable resource for researchers, developers, and enthusiasts seeking to advance the capabilities of computer vision systems.
The primary aim of this dataset is to facilitate precise object detection tasks, particularly in identifying and tracking vehicles within images. Whether you are engaged in academic research, developing commercial applications, or exploring the frontiers of computer vision, this dataset provides a solid foundation for your projects.
Both versions of the dataset undergo essential preprocessing steps, including resizing and orientation adjustments. Additionally, the Apply_Grayscale version undergoes augmentation to introduce grayscale variations, thereby enriching the dataset and improving model robustness.
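For readers who want to reproduce a similar grayscale augmentation themselves, here is a hedged torchvision sketch; the probability value is an arbitrary illustration, not the dataset's actual pipeline:

from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((640, 640)),
    transforms.RandomGrayscale(p=0.25),  # converts a random fraction of images to 3-channel grayscale
    transforms.ToTensor(),
])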
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14850461%2F4f23bd8094c892d1b6986c767b42baf4%2Fv2.png?generation=1712264632232641&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14850461%2Fbfb10eb2a4db31a62eb4615da824c387%2Fdetails_v1.png?generation=1712264660626280&alt=media
To ensure compatibility with a wide range of object detection frameworks and tools, each version of the dataset is available in multiple formats:
These formats facilitate seamless integration into various machine learning frameworks and libraries, empowering users to leverage their preferred development environments.
In addition to image datasets, we also provide a video for real-time object detection evaluation. This video allows users to test the performance of their models in real-world scenarios, providing invaluable insights into the effectiveness of their detection algorithms.
To begin exploring the Vehicle Detection Image Dataset, simply download the version and format that best suits your project requirements. Whether you are an experienced practitioner or just embarking on your journey in computer vision, this dataset offers a valuable resource for advancing your understanding and capabilities in object detection and tracking tasks.
If you utilize this dataset in your work, we kindly request that you cite the following:
Parisa Karimi Darabi. (2024). Vehicle Detection Image Dataset: Suitable for Object Detection and tracking Tasks. Retrieved from https://www.kaggle.com/datasets/pkdarabi/vehicle-detection-image-dataset/
I welcome feedback and contributions from the Kaggle community to continually enhance the quality and usability of this dataset. Please feel free to reach out if you have suggestions, questions, or additional data and annotations to contribute. Together, we can drive innovation and progress in computer vision.
License: MIT (https://opensource.org/licenses/MIT)
This dataset was created by Balázs Szabó
Released under MIT
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
Images of small objects for small-instance detection. Currently, four object types are available.
We collected four datasets of small objects from images/videos on the Internet (e.g., YouTube or Google).
Fly Dataset: contains 600 video frames with an average of 86 ± 39 flies per frame (648×72 @ 30 fps). 32 images are used for training (1:6:187) and 50 images for testing (301:6:600).
Honeybee Dataset: contains 118 images with an average of 28 ± 6 honeybees per image (640×480). The dataset is divided evenly into training and test sets; only the first 32 images are used for training.
Fish Dataset: contains 387 video frames with an average of 56 ± 9 fish per frame (300×410 @ 30 fps). 32 images are used for training (1:3:94) and 65 for testing (193:3:387).
Seagull Dataset: contains three high-resolution images (624×964) with an average of 866 ± 107 seagulls per image. The first image is used for training and the rest for testing.
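The train/test splits above use MATLAB-style start:step:stop index ranges. In Python, the equivalent frame indices can be generated with range (which excludes its stop value, hence the offset by one):

fly_train = list(range(1, 188, 6))    # 1:6:187   -> 32 training frames
fly_test = list(range(301, 601, 6))   # 301:6:600 -> 50 test frames
fish_train = list(range(1, 95, 3))    # 1:3:94    -> 32 training frames
fish_test = list(range(193, 388, 3))  # 193:3:387 -> 65 test frames
assert len(fly_train) == 32 and len(fly_test) == 50
assert len(fish_train) == 32 and len(fish_test) == 65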
Citation: Zheng Ma, Lei Yu, and Antoni B. Chan, "Small Instance Detection by Integer Programming on Object Density Maps," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015.
The original form of the dataset is available here.
Developing object detection algorithms that are more accurate at detecting small objects or small instances of objects.
License: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)
The dataset comprises annotated video frames from a camera positioned in a public space. Each individual in the camera's view has been tracked using the rectangle tool in the Computer Vision Annotation Tool (CVAT).
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2Fc5a8dc4f63fe85c64a5fead10fad3031%2Fpersons_gif.gif?generation=1690705558283123&alt=media
- images directory: houses the original video frames, serving as the primary source of raw data.
- annotations.xml: provides the detailed annotation data for the images.
- boxes directory: contains frames that visually render the bounding-box annotations, showing the locations of the tracked individuals within each frame. These images can be used to understand how the tracking was implemented and to visualize the marked areas for each individual.
The annotations are rectangular bounding boxes placed around each individual. Each bounding-box annotation contains the position (xtl, ytl, xbr, ybr coordinates) of the respective box within the frame.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F4f274551e10db2754c4d8a16dff97b33%2Fcarbon%20(10).png?generation=1687776281548084&alt=media
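A hedged sketch of reading annotations.xml: the element layout (track/box elements with xtl, ytl, xbr, ybr attributes) follows the usual CVAT-for-video export and should be verified against this particular file.

import xml.etree.ElementTree as ET

root = ET.parse("annotations.xml").getroot()
for track in root.iter("track"):      # one track per tracked individual (assumed layout)
    person_id = track.get("id")
    for box in track.iter("box"):     # one box per annotated frame
        frame = int(box.get("frame"))
        xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
        xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
        print(person_id, frame, (xtl, ytl, xbr, ybr))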
You can learn more about our high-quality, unique datasets here
keywords: multiple people tracking, human detection dataset, object detection dataset, people tracking dataset, tracking human object interactions, human Identification tracking dataset, people detection annotations, detecting human in a crowd, human trafficking dataset, deep learning object tracking, multi-object tracking dataset, labeled web tracking dataset, large-scale object tracking dataset
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
This dataset has been collected by Edge Impulse and used extensively to design the FOMO (Faster Objects, More Objects) object detection architecture. See FOMO documentation or the announcement blog post.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1642573%2F79883abbfc2db2889457586f367002d9%2FScreenshot%202024-06-04%20at%2015.22.46.png?generation=1717508155176192&alt=media
The dataset is composed of 70 images including:
- 32 blue cubes
- 32 green cubes
- 30 red cubes
- 28 yellow cubes
Download link: cubes on a conveyor belt dataset in Edge Impulse Object Detection format.
You can also retrieve this dataset from this Edge Impulse public project.
Data exported from an object detection project in the Edge Impulse Studio uses this format; see below to understand its structure.
To import this data into a new Edge Impulse project, use the Edge Impulse CLI uploader:
edge-impulse-uploader --clean --info-file info.labels
The Edge Impulse object detection acquisition format provides a simple and intuitive way to store images and associated bounding box labels. Folders containing data in this format will take the following structure:
.
├── testing
│   ├── bounding_boxes.labels
│   ├── cubes.23im33f2.jpg
│   ├── cubes.23j3rclu.jpg
│   ├── cubes.23j4jeee.jpg
│   ...
│   └── cubes.23j4k0rk.jpg
└── training
    ├── bounding_boxes.labels
    ├── blue.23ijdngd.jpg
    ├── combo.23ijkgsd.jpg
    ├── cubes.23il4pon.jpg
    ├── cubes.23im28tb.jpg
    ...
    └── yellow.23ijdp4o.jpg

2 directories, 73 files
The subdirectories contain image files in JPEG or PNG format. Each image file represents a sample and is associated with its respective bounding box labels in the bounding_boxes.labels file.
The bounding_boxes.labels file in each subdirectory provides detailed information about the labeled objects and their corresponding bounding boxes. The file follows a JSON format, with the following structure:
- version: Indicates the version of the label format.
- files: A list of objects, where each object represents an image and its associated labels:
  - path: The path or file name of the image.
  - category: Indicates whether the image belongs to the training or testing set.
  - label: Provides information about the labeled objects.
    - type: Specifies the type of label (e.g., a single label).
    - label: The actual label or class name of the object.
  - metadata: Additional metadata associated with the image, such as the site where it was collected, the timestamp, or any useful information.
  - boundingBoxes: A list of objects, where each object represents a bounding box for an object within the image:
    - label: The label or class name of the object within the bounding box.
    - x, y: The coordinates of the top-left corner of the bounding box.
    - width, height: The width and height of the bounding box.
bounding_boxes.labels example:
{
  "version": 1,
  "files": [
    {
      "path": "cubes.23im33f2.jpg",
      "category": "testing",
      "label": {
        "type": "label",
        "label": "cubes"
      },
      "metadata": {
        "version": "2023-1234-LAB"
      },
      "boundingBoxes": [
        {
          "label": "green",
          "x": 105,
          "y": 201,
          "width": 91,
          "height": 90
        },
        {
          "label": "blue",
          "x": 283,
          "y": 233,
          "width": 86,
          "height": 87
        }
      ]
    },
    {
      "path": "cubes.23j3rclu.jpg",
      "category": "testing",
      "label": {
        "type": "label",
        "label": "cubes"
      },
      "metadata": {
        "version": "2023-4567-PROD"
      },
      "boundingBoxes": [
        {
          "label": "red",
          ...
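Based on the structure documented above, a minimal Python sketch for pairing each image with its boxes might look like this (paths are illustrative):

import json

with open("training/bounding_boxes.labels") as f:
    labels = json.load(f)

for entry in labels["files"]:
    boxes = [
        (b["label"], b["x"], b["y"], b["width"], b["height"])
        for b in entry["boundingBoxes"]
    ]
    print(entry["path"], entry["category"], boxes)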
This dataset was created by Ryan Holbrook
Released under Data files © Original Authors
It contains the following files:
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The Brain Tumor MRI dataset, curated by Roboflow Universe, is a comprehensive dataset designed for the detection and classification of brain tumors using advanced computer vision techniques. It comprises 3,903 MRI images categorized into four distinct classes:
Each image in the dataset is annotated with bounding boxes indicating tumor locations, facilitating precise object detection. The dataset is structured into training (70%), validation (20%), and test (10%) sets, ensuring a robust framework for model development and evaluation.
The primary goal of this dataset is to aid in the early detection and diagnosis of brain tumors, contributing to improved treatment planning and patient outcomes. By offering a diverse range of annotated MRI images, this dataset enables researchers and practitioners to develop and fine-tune computer vision models with high accuracy in identifying and localizing brain tumors.
This dataset supports multiple annotation formats, including YOLOv8, YOLOv9, and YOLOv11, making it versatile and compatible with various machine-learning frameworks. Its integration with these formats ensures real-time and efficient object detection, ideal for applications requiring timely and precise results.
By leveraging this dataset, researchers and healthcare professionals can make significant strides in developing cutting-edge AI solutions for medical imaging, ultimately supporting more effective and accurate diagnoses in clinical settings.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F14850461%2Fe03fba81bb62e32c0b73d6535a25cb8d%2F3.jpg?generation=1734173601629363&alt=media
License: MIT (https://opensource.org/licenses/MIT)
The dataset includes 1,747 images. Cows and buffalo are annotated in YOLOv8 format:
class_id center_x center_y width height
The following pre-processing was applied to each image:
* Auto-orientation of pixel data (with EXIF-orientation stripping)
* Resize to 640×640 (stretch)
No image augmentation techniques were applied.
Examples:
0 0.3234375 0.421875 0.1015625 0.346875
0 0.5859375 0.53203125 0.2828125 0.575
0 0.1359375 0.584375 0.271875 0.525
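Since YOLO coordinates are normalized to [0, 1], converting the first example row back to pixels for a 640×640 image (the dataset's resize target) is straightforward; a small sketch:

def yolo_to_pixels(cx, cy, w, h, img_w=640, img_h=640):
    """Convert a normalized YOLO box to (xmin, ymin, xmax, ymax) pixels."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return xmin, ymin, xmax, ymax

# First example row above (leading class_id 0 omitted):
print(yolo_to_pixels(0.3234375, 0.421875, 0.1015625, 0.346875))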
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
The Fire Detection Dataset is a curated collection of images designed for training, validating, and testing machine learning models that detect the presence of fire. Each image in the dataset is labeled for the class "fire," indicating whether fire is present.
License: MIT (https://opensource.org/licenses/MIT)
Cans Detection and Counting Dataset
This dataset is designed for detecting and counting aluminum cans on a production conveyor belt using YOLOv8. It contains labeled images for training deep learning models to detect, track, and count cans in real-time manufacturing environments.
Dataset Structure
images/train/ – Training images
images/val/ – Validation images
labels/train/ – YOLO-format annotation files for training
labels/val/ – YOLO-format annotation files for validation
Each annotation file corresponds to an image and contains bounding boxes in the YOLO format (class x_center y_center width height).
How to Use
This dataset can be used for:
Training YOLO models (e.g., YOLOv8, YOLOv5), as sketched after this list
Object detection and tracking in industrial automation
Real-time can counting on conveyor belts
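For the first use case, here is a minimal YOLOv8 training sketch with the ultralytics package; the data.yaml pointing at the images/ and labels/ folders above is an assumed configuration file, and the frame path is illustrative:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # start from pretrained nano weights
model.train(data="data.yaml", epochs=50, imgsz=640)

# Detection doubles as counting on a conveyor-belt frame:
results = model("conveyor_frame.jpg")
print(len(results[0].boxes), "cans detected")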
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
The dataset includes 302 images. Faces are annotated in YOLOv5 PyTorch format.
The following pre-processing was applied to each image:
* Auto-orientation of pixel data (with EXIF-orientation stripping)
* Resize to 640×640 (stretch)
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Player Detection and Tracking in Sports Videos
Dataset Overview
This dataset is meticulously crafted for the purpose of training and evaluating object detection and tracking models, particularly focusing on player detection in sports videos. The dataset is formatted for use with the YOLOv8 model, encompassing both images and corresponding labels.
Contents
The dataset is organized into two main directories:
Images:
This directory contains a diverse set of images extracted from various sports videos. The images capture different scenarios, including close-ups, wide-angle shots, different lighting conditions, and varying levels of player density on the field. The images are saved in .jpg format and are named sequentially for easy reference.
Labels:
Corresponding to each image, this directory includes annotation files in YOLOv8 format. These annotations provide the bounding box coordinates for each detected player along with their class labels. The label files are in .txt format, named identically to their corresponding image files to ensure seamless pairing.
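Given the identical base names described above, pairing images with their label files is mechanical; a small sketch with assumed directory names ("Images" and "Labels"):

from pathlib import Path

for img_path in sorted(Path("Images").glob("*.jpg")):
    lbl_path = Path("Labels") / (img_path.stem + ".txt")
    with open(lbl_path) as f:
        boxes = [line.split() for line in f if line.strip()]
    print(img_path.name, len(boxes), "players")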