Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Object recognition still relies predominantly on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset
This version comprises several zip files:
- train, validation, test: benchmark dataset, organised by collector, with raw videos split into individual frames in jpg format at 30 FPS
- other: data not in the benchmark set, organised by collector, with raw videos split into individual frames in jpg format at 30 FPS (note that the train, validation, test, and other files together make up the unfiltered dataset)
- *_224: as for the benchmark, but frames are scaled down to 224 pixels
- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format
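For orientation, here is a minimal sketch for walking the extracted frame folders, assuming the layout described above (a split directory containing per-collector folders of per-video jpg frames); the directory names are illustrative, and the official loaders in the linked GitHub repository should be preferred:

from pathlib import Path

# Hypothetical root of one extracted split, e.g. the "train" zip.
root = Path("train")

# Walk collector -> video folder -> jpg frames (layout assumed, not the official loader).
for collector_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for video_dir in sorted(p for p in collector_dir.rglob("*") if p.is_dir()):
        frames = sorted(video_dir.glob("*.jpg"))
        if frames:
            print(f"{collector_dir.name}/{video_dir.name}: {len(frames)} frames")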
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NeSy4VRD
NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.
Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.
The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.
NeSy4VRD on Zenodo: the NeSy4VRD dataset package
This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.
The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
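For illustration, one annotated visual relationship could be represented in Python roughly as follows; the field names and bounding-box convention here are assumptions for exposition, not the exact NeSy4VRD annotation schema:

from dataclasses import dataclass
from typing import Tuple

@dataclass
class AnnotatedObject:
    category: str                      # e.g. 'person', 'horse'
    bbox: Tuple[int, int, int, int]    # assumed (ymin, ymax, xmin, xmax) pixel box

@dataclass
class VisualRelationship:
    subject: AnnotatedObject
    predicate: str                     # e.g. 'ride', 'on', 'under'
    obj: AnnotatedObject

example = VisualRelationship(
    subject=AnnotatedObject('person', (50, 300, 120, 260)),
    predicate='ride',
    obj=AnnotatedObject('horse', (180, 420, 90, 310)),
)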
Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.
The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.
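As a rough illustration of how object classes and predicates map to OWL constructs, here is a sketch using rdflib with a made-up namespace; it is not the actual VRD-World ontology, whose IRIs and axioms are richer than this:

from rdflib import Graph, Namespace, RDF, RDFS, OWL

VRD = Namespace("http://example.org/vrd-world#")   # placeholder IRI, not the real ontology namespace
g = Graph()
g.bind("vrd", VRD)

# Object classes from the annotations appear in an OWL class hierarchy.
g.add((VRD.Animal, RDF.type, OWL.Class))
g.add((VRD.Horse, RDF.type, OWL.Class))
g.add((VRD.Horse, RDFS.subClassOf, VRD.Animal))
g.add((VRD.Person, RDF.type, OWL.Class))

# Predicates from the annotations appear as OWL object properties.
g.add((VRD.ride, RDF.type, OWL.ObjectProperty))
g.add((VRD.ride, RDFS.domain, VRD.Person))
g.add((VRD.ride, RDFS.range, VRD.Animal))

print(g.serialize(format="turtle"))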
Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.
All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.
NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code
The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:
The NeSy4VRD infrastructure supporting extensibility consists of:
The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.
The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.
To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:
Facemask Usage Detection Demo (animated GIF): https://james.iamcebu.com/images/demo-face-mask.gif
As COVID-19 continues to spread across the world, leaders and individuals are finding ways to halt the spread of the virus. In March 2020, the World Health Organization (WHO) recommended wearing a face covering to prevent people from breathing out tiny droplets that may carry the virus. Wearing a mask properly means virus transmission can be lowered.
If you use this dataset, please cite this study. The GitHub code is also provided. Research Paper: https://www.jardcs.org/abstract.php?id=5699 Python Code: https://github.com/jamesnogra/ImproperMaskDetector
The data consists of four folders: fully_covered, not_covered, not_face, and partially_covered. The fully_covered folder contains faces of people wearing a face mask properly/correctly according to WHO standards. The partially_covered folder contains face images in which the face mask covers the mouth but not the nose. The not_covered folder contains face images of people not wearing a face mask at all. The not_face folder, which is optional when training your model, contains images returned by the OpenCV face detection library that are obviously not faces of people.
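Because the classes are encoded as folder names, a standard folder-per-class image loader works out of the box. A minimal sketch with torchvision; the root path, image size, and batch size are assumptions:

import torch
from torchvision import datasets, transforms

# Assumed root directory containing fully_covered/, partially_covered/, not_covered/, not_face/
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("face-mask-data", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

print(dataset.classes)  # folder names become class labels, e.g. ['fully_covered', 'not_covered', ...]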
If you want the full paper, just email me at jamesnogra@gmail.com or you can visit the information website for this study at https://james.iamcebu.com/#face-mask-detection.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset was collected for an assessment of a crowd counting algorithm.
The dataset is a vision dataset taken from a QUT Campus and contains three challenging viewpoints, which are referred to as Camera A, Camera B and Camera C. The sequences contain reflections, shadows and difficult lighting fluctuations, which makes crowd counting difficult. Furthermore, Camera C is positioned at a particularly low camera angle, leading to stronger occlusion than is present in other datasets.
The QUT datasets are annotated at sparse intervals: every 100 frames for cameras B and C, and every 200 frames for camera A as this is a longer sequence. Testing is then performed by comparing the crowd size estimate to the ground truth at these sparse intervals, rather than at every frame. This closely resembles the intended real-world application of this technology, where an operator may periodically ‘query’ the system for a crowd count.
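Evaluation then reduces to comparing the estimated count to the ground truth at the annotated frames only; a minimal sketch, with made-up frame indices and counts for illustration:

import numpy as np

# Hypothetical ground-truth counts at annotated frames (every 100 or 200 frames)
annotated_frames = [0, 100, 200, 300]
true_counts = np.array([12, 15, 14, 18])
predicted_counts = np.array([11, 16, 13, 20])   # model estimates at the same frames

mae = np.mean(np.abs(predicted_counts - true_counts))
print(f"Mean absolute error over annotated frames: {mae:.2f}")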
Due to the difficulty of the environmental conditions in these scenes, the first 400-500 frames of each sequence are set aside for learning the background model.
This face image dataset covers 10,109 people collected from many countries. Multiple photos of each person's daily life are collected, and each subject's gender, race, age, etc. are annotated. The dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and has proven beneficial for achieving strong performance in real-world applications. Throughout data collection, storage, and usage, we have consistently adhered to data protection and privacy regulations to ensure the preservation of user privacy and legal rights. All data comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Accident Detection Model is built using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, image, or video. The model is trained on a dataset of 3,200+ images annotated on Roboflow.
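A minimal inference sketch with the Ultralytics YOLOv8 API, assuming trained weights are available; the weights filename and source image below are placeholders:

from ultralytics import YOLO

# Load the trained accident-detection weights (placeholder filename).
model = YOLO("accident_detector.pt")

# Run inference on an image, a video file, or a camera stream (source=0).
results = model.predict(source="crash_frame.jpg", conf=0.5)

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf), box.xyxy.tolist())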
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
https://www.usa.gov/government-works/
Our goal in this project is to address the problem of traffic and driver safety at intersections. For this purpose, we have developed an AI-based system that analyzes traffic flow by analyzing traffic patterns and predicting traffic volume at a simple intersection. With its benefits, such as reducing waiting times for city residents and drivers, lowering fuel consumption, reducing air pollution and noise, and helping to control and prevent accidents, the system can make a significant contribution to society. In this project, we use machine learning algorithms and traffic cameras installed at a crossroads to control traffic in developing countries. We can evaluate the effectiveness of the project in a simulation or study it in the real world, with the expected outcome of reduced traffic, increased safety for people, and lower fuel consumption, which in turn reduces air pollution and improves quality of life. In general, we seek to use artificial intelligence and machine learning algorithms to control traffic at a crossroads and improve urban transportation, helping people improve their quality of life by reducing congestion. By cutting traffic and waiting times at intersections, this system can make commuting less stressful and more efficient, allowing people to spend more time on other activities they enjoy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is designed to simulate real-world challenges faced by autonomous vehicles, particularly when encountering degraded images through sensors in various street environments. It includes a diverse set of images from streetview and pedestrian walkways, subjected to various types of degradation such as noise, blur, and flare, mimicking the environmental conditions that can affect sensor performance in automotive applications. The dataset captures a variety of scenes, including vehicles, people, and street scenes, providing a comprehensive representation of potential challenges in visual processing.
Ideal for deep learning and computer vision tasks, this dataset offers a robust resource for training and evaluating models focused on enhancing the resilience and accuracy of autonomous vehicle systems in degraded visual environments. Tasks such as image restoration, denoising, deblurring, and flare correction are well-suited to this dataset, making it an essential tool for advancing computer vision solutions within the automotive and urban infrastructure sectors.
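For intuition, degradations of this kind can be simulated on clean frames. A rough sketch with OpenCV and NumPy; the parameters are arbitrary examples, not the settings used to build this dataset:

import cv2
import numpy as np

img = cv2.imread("street_scene.jpg").astype(np.float32)  # placeholder input image

# Additive Gaussian noise
noisy = img + np.random.normal(0, 15, img.shape)

# Blur via Gaussian smoothing
blurred = cv2.GaussianBlur(img, (15, 15), 0)

# Simple flare: a bright Gaussian blob added near the top of the frame
h, w = img.shape[:2]
yy, xx = np.mgrid[0:h, 0:w]
flare = 200 * np.exp(-(((xx - w * 0.7) ** 2 + (yy - h * 0.2) ** 2) / (2 * (w * 0.1) ** 2)))
flared = img + flare[..., None]

for name, out in [("noisy", noisy), ("blurred", blurred), ("flared", flared)]:
    cv2.imwrite(f"{name}.jpg", np.clip(out, 0, 255).astype(np.uint8))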
For more specific details on how this dataset was created, please feel free to reach out via email.
Github: https://github.com/IsmailQayyum/NBF_StreetView
License:
This work is licensed under CC BY 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/
The Dark Face dataset provides 6,000 real-world low-light images captured at night at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes for human faces, as the main training and/or validation sets. We also provide 9,000 unlabeled low-light images collected in the same settings. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured in controlled real lighting conditions (not necessarily containing faces), which can be used as part of the training data at the participants' discretion. There will be a hold-out testing set of 4,000 low-light images, with human-face bounding boxes annotated.
Credits: Spatial and Temporal Restoration, Understanding and Compression Team, Wangxuan Institute of Computer Technology, Peking University.
@ARTICLE{poor_visibility_benchmark,
author={Yang, Wenhan and Yuan, Ye and Ren, Wenqi and Liu, Jiaying and Scheirer, Walter J. and Wang, Zhangyang and Zhang and others},
journal={IEEE Transactions on Image Processing},
title={Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study},
year={2020},
volume={29},
number={},
pages={5737-5752},
doi={10.1109/TIP.2020.2981922}
}
@inproceedings{Chen2018Retinex,
title={Deep Retinex Decomposition for Low-Light Enhancement},
author={Chen Wei and Wenjing Wang and Wenhan Yang and Jiaying Liu},
booktitle={British Machine Vision Conference},
year={2018},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
The ITOP dataset (Invariant Top View) contains 100K depth images from side and top views of a person in a scene. For each image, the locations of 15 human body parts are labeled with 3-dimensional (x, y, z) coordinates, relative to the sensor's position. Read the full paper for more context [pdf].
Getting Started
Download then decompress the h5.gz file.
gunzip ITOP_side_test_depth_map.h5.gz
Using Python and h5py (pip install h5py or conda install h5py), we can load the contents:
import h5py
import numpy as np
f = h5py.File('ITOP_side_test_depth_map.h5', 'r')
data, ids = f.get('data'), f.get('id')
data, ids = np.asarray(data), np.asarray(ids)
print(data.shape, ids.shape)
# (10501, 240, 320) (10501,)
Note: For any of the *_images.h5.gz files, the underlying file is a tar file and not an h5 file. Please rename the file extension from h5.gz to tar.gz before opening. The following commands will work:
mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz
tar xf ITOP_side_test_images.tar.gz
Metadata
File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.
+-------+--------+---------+---------+----------+------------+--------------+---------+
| View | Split | Frames | People | Images | Depth Map | Point Cloud | Labels |
+-------+--------+---------+---------+----------+------------+--------------+---------+
| Side | Train | 39,795 | 16 | 1.1 GiB | 5.7 GiB | 18 GiB | 2.9 GiB |
| Side | Test | 10,501 | 4 | 276 MiB | 1.6 GiB | 4.6 GiB | 771 MiB |
| Top | Train | 39,795 | 16 | 974 MiB | 5.7 GiB | 18 GiB | 2.9 GiB |
| Top | Test | 10,501 | 4 | 261 MiB | 1.6 GiB | 4.6 GiB | 771 MiB |
+-------+--------+---------+---------+----------+------------+--------------+---------+
Data Schema
Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let \(n\) denote the number of images.
Transformation
To convert from point clouds to a \(240 \times 320\) image, the following transformations were used. Let \(x_{\textrm{img}}\) and \(y_{\textrm{img}}\) denote the \((x,y)\) coordinate in the image plane. Using the raw point cloud \((x,y,z)\) real world coordinates, we compute the depth map as follows: \(x_{\textrm{img}} = \frac{x}{Cz} + 160\) and \(y_{\textrm{img}} = -\frac{y}{Cz} + 120\) where \(C\approx 3.50×10^{−3} = 0.0035\) is the intrinsic camera calibration parameter. This results in the depth map: \((x_{\textrm{img}}, y_{\textrm{img}}, z)\).
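A small NumPy sketch of this projection; C is taken from the text, while the array handling details are illustrative:

import numpy as np

C = 0.0035  # intrinsic camera calibration parameter from the text

def point_cloud_to_depth_map(points, height=240, width=320):
    """Project real-world (x, y, z) points into a 240x320 depth map."""
    depth = np.zeros((height, width), dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    valid = z > 0
    x_img = np.round(x[valid] / (C * z[valid]) + 160).astype(int)
    y_img = np.round(-y[valid] / (C * z[valid]) + 120).astype(int)
    inside = (x_img >= 0) & (x_img < width) & (y_img >= 0) & (y_img < height)
    depth[y_img[inside], x_img[inside]] = z[valid][inside]
    return depth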
Joint ID (Index) Mapping
joint_id_to_name = {
0: 'Head', 8: 'Torso',
1: 'Neck', 9: 'R Hip',
2: 'R Shoulder', 10: 'L Hip',
3: 'L Shoulder', 11: 'R Knee',
4: 'R Elbow', 12: 'L Knee',
5: 'L Elbow', 13: 'R Foot',
6: 'R Hand', 14: 'L Foot',
7: 'L Hand',
}
Depth Maps
Point Clouds
Labels
Citation
If you would like to cite our work, please use the following.
Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.
@inproceedings{haque2016viewpoint, title={Towards Viewpoint Invariant 3D Human Pose Estimation}, author={Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li}, booktitle = {European Conference on Computer Vision}, month = {October}, year = {2016} }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.
It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.
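A rough sketch of how such a synthetic sample could be composed with Pillow; the font file, colours, and placement below are arbitrary stand-ins, not Roboflow's generation code:

import random
from PIL import Image, ImageDraw, ImageFont

canvas = Image.new("RGB", (512, 512), "white")   # stand-in for a random Open Images photo
draw = ImageDraw.Draw(canvas)

lines = ["Shall I compare thee", "to a summer's day?"]   # stand-in for GPT-2 poetry text
font = ImageFont.truetype("DejaVuSans.ttf", 28)          # assumed font file on the system

for text in lines:
    x, y = random.randint(0, 300), random.randint(0, 480)
    draw.text((x, y), text, fill=(random.randint(0, 255), 0, 0), font=font)

canvas.save("synthetic_ocr_sample.png")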
Example Image:
Example image: https://i.imgur.com/sZT516a.png
A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.
Alternatively, you could try your hand at using this as a neural font identification dataset. Nvidia, amongst others, has had success with this task.
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
*Note: Please download all files, place them into a single folder, and then use 7-Zip to recombine the split files back into the complete dataset.
The Synthetic Operating Room Table (SORT) dataset is a large-scale computer vision dataset focused on instance counting, segmentation, and localisation of surgical instrument depictions placed on a table. The depictions are rendered using the Unreal game engine and annotated by leveraging the UnrealCV plugin (Qiu, 2017). SORT contains one container class, one material class (gauze), and six instrument classes, namely forceps, scalpels, pincettes (tweezers), syringes, periotomes, and scissors. Each class has two different 3D representations, equally likely to be present for a given instance, with the exception of the container class, which leverages three different 3D models. In total, we generated 89,838 images, split into 60% training (53,906), 20% validation (17,965), and 20% test (17,967), containing 365,469, 121,951 and 122,142 separate object instances, respectively.
The aim behind this dataset is to develop methods able to count surgical instruments and materials via computer vision, to aid medical staff in ensuring no instrument is retained by a patient, which can lead to complications such as chronic pain and sepsis. Currently this is done manually, with the World Health Organisation (WHO) proposing that manual counts be completed by two members of staff (Biswas, 2012), typically counting instruments laid out on a surface either before or after their use. This standard practice of logging the type and number of a given instrument or material to be used during an operation is not managerial overhead but crucial for the prevention of retained instruments, consumables, or materials during surgery, as these would negatively impact a patient's recovery time or even lead to the patient's death.
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T.S. and Wang, Y., 2017, October. UnrealCV: Virtual worlds for computer vision. In Proceedings of the 25th ACM International Conference on Multimedia (pp. 1221-1224).
R. Biswas, S. Ganguly, M. Saha, S. Saha, S. Mukherjee, and A. Ayaz. Gossypiboma and Surgeon - Current Medicolegal Aspect - A Review. Indian Journal of Surgery, 74(4):318-322, 2012.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The ChokePoint dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (ie. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal.
The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. The recordings of portal 1 and portal 2 are one month apart. The dataset has a frame rate of 30 fps and an image resolution of 800x600 pixels. In total, the dataset consists of 48 video sequences and 64,204 face images. In all sequences, only one subject is present in the image at a time. The first 100 frames of each sequence are for background modelling, where no foreground objects were present.
Each sequence was named according to the recording conditions (eg. P2E_S1_C3) where P, S, and C stand for portal, sequence and camera, respectively. E and L indicate subjects either entering or leaving the portal. The numbers indicate the respective portal, sequence and camera label. For example, P2L_S1_C3 indicates that the recording was done in Portal 2, with people leaving the portal, and captured by camera 3 in the first recorded sequence.
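A small helper illustrating how these sequence names can be parsed; it is purely illustrative and not part of the dataset distribution:

import re

def parse_chokepoint_name(name):
    """Parse a ChokePoint sequence name such as 'P2L_S1_C3'."""
    m = re.fullmatch(r"P(\d)([EL])_S(\d)_C(\d)", name)
    if m is None:
        raise ValueError(f"Unexpected sequence name: {name}")
    portal, direction, sequence, camera = m.groups()
    return {
        "portal": int(portal),
        "direction": "entering" if direction == "E" else "leaving",
        "sequence": int(sequence),
        "camera": int(camera),
    }

print(parse_chokepoint_name("P2L_S1_C3"))
# {'portal': 2, 'direction': 'leaving', 'sequence': 1, 'camera': 3}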
To pose more challenging real-world surveillance problems, two sequences (P2E_S5 and P2L_S5) were recorded with a crowded scenario. In addition to the aforementioned variations, these sequences contain continuous occlusion, which presents challenges for identity tracking and face verification.
This dataset can be applied, but not limited, to the following research areas:
Please cite the following paper if you use the ChokePoint dataset in your work (papers, articles, reports, books, software, etc):
https://choosealicense.com/licenses/cc0-1.0/
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training. We also introduce a new large-scale first of its kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work.
One critical task in video surveillance is detecting anomalous events such as traffic accidents, crimes, or illegal activities. Generally, anomalous events rarely occur compared to normal activities. Therefore, to alleviate the waste of labor and time, developing intelligent computer vision algorithms for automatic video anomaly detection is a pressing need. The goal of a practical anomaly detection system is to timely signal an activity that deviates from normal patterns and identify the time window of the occurring anomaly. Therefore, anomaly detection can be considered coarse-level video understanding, which filters out anomalies from normal patterns. Once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques. In this work, we propose an anomaly detection algorithm using weakly labeled training videos. That is, we only know the video-level labels, i.e. a video is normal or contains an anomaly somewhere, but we do not know where. This is intriguing because we can easily annotate a large number of videos by only assigning video-level labels. To formulate a weakly-supervised learning approach, we resort to multiple instance learning. Specifically, we propose to learn anomaly through a deep MIL framework by treating normal and anomalous surveillance videos as bags and short segments/clips of each video as instances in a bag. Based on training videos, we automatically learn an anomaly ranking model that predicts high anomaly scores for anomalous segments in a video. During testing, a long untrimmed video is divided into segments and fed into our deep network, which assigns an anomaly score to each video segment such that an anomaly can be detected.
Our proposed approach (summarized in Figure 1) begins with dividing surveillance videos into a fixed number of segments during training. These segments form the instances in a bag. Using both positive (anomalous) and negative (normal) bags, we train the anomaly detection model using the proposed deep MIL ranking loss. Method overview: https://www.crcv.ucf.edu/projects/real-world/method.png
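A minimal sketch of an MIL ranking objective of this kind in PyTorch; the hinge margin and the sparsity/smoothness weights are illustrative choices, and this is not the authors' released training code:

import torch

def mil_ranking_loss(pos_scores, neg_scores, lambda_sparsity=8e-5, lambda_smooth=8e-5):
    """pos_scores, neg_scores: per-segment anomaly scores for one anomalous and one normal video."""
    # Hinge ranking: the highest-scoring segment of the anomalous bag should
    # score higher than the highest-scoring segment of the normal bag.
    ranking = torch.clamp(1.0 - pos_scores.max() + neg_scores.max(), min=0.0)
    # Sparsity: only a few segments of an anomalous video should score high.
    sparsity = pos_scores.sum()
    # Temporal smoothness: scores of adjacent segments should change gradually.
    smoothness = ((pos_scores[1:] - pos_scores[:-1]) ** 2).sum()
    return ranking + lambda_sparsity * sparsity + lambda_smooth * smoothness

# Example with 32 segments per video and random scores in [0, 1]
pos = torch.rand(32)
neg = torch.rand(32)
print(mil_ranking_loss(pos, neg))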
We construct a new large-scale dataset, called UCF-Crime, to evaluate our method. It consists of long untrimmed surveillance videos which cover 13 real-world anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety. We compare our dataset with previous anomaly detection datasets in Table 1. For more details about the UCF-Crime dataset, please refer to our paper. A short description of each anomalous event is given below.
Abuse: This event contains videos which show bad, cruel or violent behavior against children, old people, animals, and women.
Burglary: This event contains videos that show people (thieves) entering a building or house with the intention to commit theft. It does not include use of force against people.
Robbery: This event contains videos showing thieves taking money unlawfully by force or threat of force. These videos do not include shootings.
Stealing: This event contains videos showing people taking property or money without permission. They do not include shoplifting.
Shooting: This event contains videos showing the act of shooting someone with a gun.
Shoplifting: This event contains videos showing people stealing goods from a shop while posing as shoppers.
Assault: This event contains videos showing a sudden or violent physical attack on someone. Note that in these videos the person who is assaulted does not fight back.
Fighting: This event contains videos displaying two or more people attacking one another.
Arson: This event contains videos showing people deliberately setting fire to property.
Explosion: This event contains videos showing the destructive event of something blowing apart. It does not include videos where a person intentionally sets a fire or sets off an explosion.
Arrest: This event contains videos showing police arresting individuals.
Road Accident: This event contains videos showing traffic accidents involving vehicles, pedestrians or cyclists.
Vandalism: This event contains videos showing deliberate destruction of or damage to public or private property. The term includes property damage, such as graffiti and defacement directed towards any property without permission of the owner.
Normal Event: This event contains videos where no crime occurred. These videos include both indoor (such as a shopping mall) and outdoor scenes, as well as day and night-time scenes.
Dataset comparison table: https://www.crcv.ucf.edu/projects/real-world/dataset_table.png
In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for the bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering growth in any market. It is a testament to the importance of the development of AI around the world.
What is Artificial Intelligence (AI)?
Artificial intelligence, once the subject of people's imaginations and the main plot of science fiction movies for decades, is no longer a piece of fiction, but rather commonplace in people's daily lives whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries' standard business processes.
AI investment and startups
The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow, driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interface with machines.
https://creativecommons.org/publicdomain/zero/1.0/
Okay, so one random day I felt like making a web app with an image classifier and putting it up in my Instagram bio for people to play with. It classified hair, and it helped me learn a lot about how training CNNs for real-world applications works.
Below are about a thousand images that represent the three most common hair types in the world. Each hair type has 300+ images to it.
I scraped all these images from Google Images using a Chrome extension and sorted them out, image by image. I feel bad because I cannot give credit to the owners, and data ethics is something I have to improve on as a person.
Fellow data practitioner, the question I put in front of you today is: In what creative ways can you play with this beginner's boring data?
According to the WHO, World report on vision 2019, the number of visually impaired people worldwide is estimated to be 2.2 billion, of whom at least 1 billion have a vision impairment that could have been prevented or is yet to be addressed. The world faces considerable challenges in terms of eye care, including inequalities in the coverage and quality of prevention, treatment, and rehabilitation services. Early detection and diagnosis of ocular pathologies would enable forestall of visual impairment.
For this purpose, we have created a new Retinal Fundus Multi-disease Image Dataset (RFMiD) consisting of a total of 3200 fundus images captured using three different fundus cameras with 46 conditions annotated through adjudicated consensus of two senior retinal experts.
- Create a multi-disease classification model
- Create a model to classify between Healthy and Unhealthy retinas
- Your kernel can be featured here!
- More datasets
If you use this dataset in your research, please credit the authors
Citation
Samiksha Pachade, Prasanna Porwal, Dhanshree Thulkar, Manesh Kokare, Girish Deshmukh, Vivek Sahasrabuddhe, Luca Giancardo, Gwenolé Quellec, and Fabrice Mériaudeau, 2021. Retinal Fundus Multi-Disease Image Dataset (RFMiD): A Dataset for Multi-Disease Detection Research. Data, 6(2), p.14. Available (Open Access): https://www.mdpi.com/2306-5729/6/2/14
License
License was not specified, yet a citation was requested whenever the data is used.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The actual dataset is located on Kaggle and contains data scraped by MyAutoScrapper (written in Go).
Since this Kaggle dataset contains real car deals, placed by real humans, with pictures, it can be used for real-world machine learning (ML) or machine vision: price prediction, image processing, machine vision, etc.
This dataset contains a data.csv file, which has 100,000 car deal details, each row representing one deal. data.csv has 18 columns:
- ID: A unique identifier for each entry; for each ID there is a corresponding sub-folder in images, which contains the images for the given deal. ID is an integer starting from 0.
- Manufacturer: A string identifying the car manufacturer.
- Model: A string identifying the car model.
- Year: An integer for the car production year.
- Category: The type of the vehicle (Sedan, Cabriolet, etc.).
- Mileage: An integer representing the car mileage in kilometers.
- FuelType: The fuel type the car uses.
- EngineVolume: A floating point number representing the engine volume in litres.
- DriveWheels: A string representing the car's drive wheels (i.e. Front, Rear, 4x4, etc.).
- GearBox: A string identifying the transmission gearbox (Manual, Automatic, etc.).
- Doors: A string representing the car doors (4, 4/5, etc.).
- Wheel: Steering wheel position (Left Wheel, Right Wheel).
- Color: Color of the car body.
- InteriorColor: Interior color.
- VIN: VIN number of the vehicle, represented as a string.
- LeatherInterior: A boolean value, true if the car has a leather interior.
- Price: Price of the car in USD. If omitted, it means the price was set as negotiable.
- Clearance: A boolean value identifying whether customs has been cleared or not.
None of the fields (except ID) are guaranteed to be filled, or filled with correct information, since people sometimes don't enter correct information or hide some information for various reasons. But for most of the entries, most of the fields should be filled with correct information.
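A minimal sketch for loading and inspecting the table with pandas; the file path is an assumption, and as noted above, missing or incorrect values should be expected:

import pandas as pd

df = pd.read_csv("data.csv")   # assumed to sit next to the images/ folder

print(df.shape)                # expected roughly (100000, 18)
print(df["Manufacturer"].value_counts().head())

# Price is omitted for negotiable deals, so keep NaNs explicit before modelling.
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
print(df["Price"].describe())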
The proposed multi-modal dataset for car door assembly activities, denoted CarDA [1], comprises a set of time-synchronized multi-camera RGB-D videos and motion capture data acquired during car door assembly activities performed by real line workers in a real manufacturing environment.
[1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024: Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.
CarDA subset A contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, .bvh files for 3D human pose data (ground truth), and annotation data (to be added in v2 of the dataset).
CarDA subset B contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, and annotation data.
ws10 - svo - mp4: Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS10 of the assembly line. MP4 videos are also available, extracted using the left camera of each stereo pair. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.
ws20 - svo - mp4: Six pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. MP4 videos are also available, extracted using the left camera of each stereo pair. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS20 of the assembly line. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.
ws30 - svo - mp4: Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS30 of the assembly line. MP4 videos are also available, extracted using the left camera of each stereo pair. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.