Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Object recognition still relies predominantly on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset
This version comprises several zip files:
- train, validation, test: benchmark dataset, organised by collector, with raw videos split into individual frames in jpg format at 30 FPS
- other: data not in the benchmark set, organised by collector, with raw videos split into individual frames in jpg format at 30 FPS (note that the train, validation, test, and other files together make up the unfiltered dataset)
- *_224: as for the benchmark, but frames are scaled down to 224 pixels
- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format
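For orientation, here is a minimal sketch for walking the extracted frame folders, assuming the layout described above (a split directory containing per-collector folders of per-video jpg frames); the directory names are illustrative, and the official loaders in the linked GitHub repository should be preferred:

from pathlib import Path

# Hypothetical root of one extracted split, e.g. the "train" zip.
root = Path("train")

# Walk collector -> video folder -> jpg frames (layout assumed, not the official loader).
for collector_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    for video_dir in sorted(p for p in collector_dir.rglob("*") if p.is_dir()):
        frames = sorted(video_dir.glob("*.jpg"))
        if frames:
            print(f"{collector_dir.name}/{video_dir.name}: {len(frames)} frames")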
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NeSy4VRD
NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.
Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.
The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.
NeSy4VRD on Zenodo: the NeSy4VRD dataset package
This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.
The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
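For illustration, one annotated visual relationship could be represented in Python roughly as follows; the field names and bounding-box convention here are assumptions for exposition, not the exact NeSy4VRD annotation schema:

from dataclasses import dataclass
from typing import Tuple

@dataclass
class AnnotatedObject:
    category: str                      # e.g. 'person', 'horse'
    bbox: Tuple[int, int, int, int]    # assumed (ymin, ymax, xmin, xmax) pixel box

@dataclass
class VisualRelationship:
    subject: AnnotatedObject
    predicate: str                     # e.g. 'ride', 'on', 'under'
    obj: AnnotatedObject

example = VisualRelationship(
    subject=AnnotatedObject('person', (50, 300, 120, 260)),
    predicate='ride',
    obj=AnnotatedObject('horse', (180, 420, 90, 310)),
)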
Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.
The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.
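As a rough illustration of how object classes and predicates map to OWL constructs, here is a sketch using rdflib with a made-up namespace; it is not the actual VRD-World ontology, whose IRIs and axioms are richer than this:

from rdflib import Graph, Namespace, RDF, RDFS, OWL

VRD = Namespace("http://example.org/vrd-world#")   # placeholder IRI, not the real ontology namespace
g = Graph()
g.bind("vrd", VRD)

# Object classes from the annotations appear in an OWL class hierarchy.
g.add((VRD.Animal, RDF.type, OWL.Class))
g.add((VRD.Horse, RDF.type, OWL.Class))
g.add((VRD.Horse, RDFS.subClassOf, VRD.Animal))
g.add((VRD.Person, RDF.type, OWL.Class))

# Predicates from the annotations appear as OWL object properties.
g.add((VRD.ride, RDF.type, OWL.ObjectProperty))
g.add((VRD.ride, RDFS.domain, VRD.Person))
g.add((VRD.ride, RDFS.range, VRD.Animal))

print(g.serialize(format="turtle"))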
Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.
All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.
NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code
The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:
The NeSy4VRD infrastructure supporting extensibility consists of:
The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.
The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.
To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:
Facemask Usage Detection Demo (animated GIF): https://james.iamcebu.com/images/demo-face-mask.gif
As COVID-19 continues to spread across the world, leaders and individuals are finding ways to halt the spread of the virus. In March 2020, the World Health Organization (WHO) recommended wearing a face covering to prevent people from breathing out tiny droplets that may carry the virus. Wearing a mask properly means virus transmission can be lowered.
If you use this dataset, please cite this study. The GitHub code is also provided. Research Paper: https://www.jardcs.org/abstract.php?id=5699 Python Code: https://github.com/jamesnogra/ImproperMaskDetector
The data consists of four folders: fully_covered, not_covered, not_face, and partially_covered. The fully_covered folder contains faces of people wearing a face mask properly/correctly according to WHO standards. The partially_covered folder contains face images in which the face mask covers the mouth but not the nose. The not_covered folder contains face images of people not wearing a face mask at all. The not_face folder, which is optional when training your model, contains images returned by the OpenCV face detection library that are obviously not faces of people.
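Because the classes are encoded as folder names, a standard folder-per-class image loader works out of the box. A minimal sketch with torchvision; the root path, image size, and batch size are assumptions:

import torch
from torchvision import datasets, transforms

# Assumed root directory containing fully_covered/, partially_covered/, not_covered/, not_face/
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("face-mask-data", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

print(dataset.classes)  # folder names become class labels, e.g. ['fully_covered', 'not_covered', ...]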
If you want the full paper, just email me at jamesnogra@gmail.com or you can visit the information website for this study at https://james.iamcebu.com/#face-mask-detection.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset was collected for an assessment of a crowd counting algorithm.
The dataset is a vision dataset taken from a QUT Campus and contains three challenging viewpoints, which are referred to as Camera A, Camera B and Camera C. The sequences contain reflections, shadows and difficult lighting fluctuations, which makes crowd counting difficult. Furthermore, Camera C is positioned at a particularly low camera angle, leading to stronger occlusion than is present in other datasets.
The QUT datasets are annotated at sparse intervals: every 100 frames for cameras B and C, and every 200 frames for camera A as this is a longer sequence. Testing is then performed by comparing the crowd size estimate to the ground truth at these sparse intervals, rather than at every frame. This closely resembles the intended real-world application of this technology, where an operator may periodically ‘query’ the system for a crowd count.
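Evaluation then reduces to comparing the estimated count to the ground truth at the annotated frames only; a minimal sketch, with made-up frame indices and counts for illustration:

import numpy as np

# Hypothetical ground-truth counts at annotated frames (every 100 or 200 frames)
annotated_frames = [0, 100, 200, 300]
true_counts = np.array([12, 15, 14, 18])
predicted_counts = np.array([11, 16, 13, 20])   # model estimates at the same frames

mae = np.mean(np.abs(predicted_counts - true_counts))
print(f"Mean absolute error over annotated frames: {mae:.2f}")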
Due to the difficulty of the environmental conditions in these scenes, the first 400-500 frames of each sequence are set aside for learning the background model.
This face image dataset covers 10,109 people collected from many countries. Multiple photos of each person's daily life are collected, and each subject's gender, race, age, etc. are annotated. The dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and has proven beneficial for achieving strong performance in real-world applications. Throughout data collection, storage, and usage, we have consistently adhered to data protection and privacy regulations to ensure the preservation of user privacy and legal rights. All data comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Accident Detection Model is built using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, image, or video. The model is trained on a dataset of 3,200+ images annotated on Roboflow.
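A minimal inference sketch with the Ultralytics YOLOv8 API, assuming trained weights are available; the weights filename and source image below are placeholders:

from ultralytics import YOLO

# Load the trained accident-detection weights (placeholder filename).
model = YOLO("accident_detector.pt")

# Run inference on an image, a video file, or a camera stream (source=0).
results = model.predict(source="crash_frame.jpg", conf=0.5)

for r in results:
    for box in r.boxes:
        cls_name = model.names[int(box.cls)]
        print(cls_name, float(box.conf), box.xyxy.tolist())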
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
https://www.usa.gov/government-works/
Our goal in this project is to address the problem of traffic and driver safety at intersections. For this purpose, we have developed an AI-based system that analyzes traffic flow by analyzing traffic patterns and predicting traffic volume at a simple intersection. With its benefits, such as reducing waiting times for city residents and drivers, lowering fuel consumption, reducing air pollution and noise, and helping to control and prevent accidents, the system can make a significant contribution to society. In this project, we use machine learning algorithms and traffic cameras installed at a crossroads to control traffic in developing countries. We can evaluate the effectiveness of the project in a simulation or study it in the real world, with the expected outcome of reduced traffic, increased safety for people, and lower fuel consumption, which in turn reduces air pollution and improves quality of life. In general, we seek to use artificial intelligence and machine learning algorithms to control traffic at a crossroads and improve urban transportation, helping people improve their quality of life by reducing congestion. By cutting traffic and waiting times at intersections, this system can make commuting less stressful and more efficient, allowing people to spend more time on other activities they enjoy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is designed to simulate real-world challenges faced by autonomous vehicles, particularly when encountering degraded images through sensors in various street environments. It includes a diverse set of images from streetview and pedestrian walkways, subjected to various types of degradation such as noise, blur, and flare, mimicking the environmental conditions that can affect sensor performance in automotive applications. The dataset captures a variety of scenes, including vehicles, people, and street scenes, providing a comprehensive representation of potential challenges in visual processing.
Ideal for deep learning and computer vision tasks, this dataset offers a robust resource for training and evaluating models focused on enhancing the resilience and accuracy of autonomous vehicle systems in degraded visual environments. Tasks such as image restoration, denoising, deblurring, and flare correction are well-suited to this dataset, making it an essential tool for advancing computer vision solutions within the automotive and urban infrastructure sectors.
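For intuition, degradations of this kind can be simulated on clean frames. A rough sketch with OpenCV and NumPy; the parameters are arbitrary examples, not the settings used to build this dataset:

import cv2
import numpy as np

img = cv2.imread("street_scene.jpg").astype(np.float32)  # placeholder input image

# Additive Gaussian noise
noisy = img + np.random.normal(0, 15, img.shape)

# Blur via Gaussian smoothing
blurred = cv2.GaussianBlur(img, (15, 15), 0)

# Simple flare: a bright Gaussian blob added near the top of the frame
h, w = img.shape[:2]
yy, xx = np.mgrid[0:h, 0:w]
flare = 200 * np.exp(-(((xx - w * 0.7) ** 2 + (yy - h * 0.2) ** 2) / (2 * (w * 0.1) ** 2)))
flared = img + flare[..., None]

for name, out in [("noisy", noisy), ("blurred", blurred), ("flared", flared)]:
    cv2.imwrite(f"{name}.jpg", np.clip(out, 0, 255).astype(np.uint8))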
For more specific details on how this dataset was created, please feel free to reach out via email.
Github: https://github.com/IsmailQayyum/NBF_StreetView
License:
This work is licensed under CC BY 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/
The Dark Face dataset provides 6,000 real-world low-light images captured at night at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes for human faces, as the main training and/or validation sets. We also provide 9,000 unlabeled low-light images collected in the same settings. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured in controlled real lighting conditions (not necessarily containing faces), which can be used as part of the training data at the participants' discretion. There will be a hold-out testing set of 4,000 low-light images, with human-face bounding boxes annotated.
Credits: Spatial and Temporal Restoration, Understanding and Compression Team, Wangxuan Institute of Computer Technology, Peking University.
@ARTICLE{poor_visibility_benchmark,
author={Yang, Wenhan and Yuan, Ye and Ren, Wenqi and Liu, Jiaying and Scheirer, Walter J. and Wang, Zhangyang and Zhang and others},
journal={IEEE Transactions on Image Processing},
title={Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study},
year={2020},
volume={29},
number={},
pages={5737-5752},
doi={10.1109/TIP.2020.2981922}
}
@inproceedings{Chen2018Retinex,
title={Deep Retinex Decomposition for Low-Light Enhancement},
author={Chen Wei and Wenjing Wang and Wenhan Yang and Jiaying Liu},
booktitle={British Machine Vision Conference},
year={2018},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
The ITOP dataset (Invariant Top View) contains 100K depth images from side and top views of a person in a scene. For each image, the locations of 15 human body parts are labeled with 3-dimensional (x, y, z) coordinates, relative to the sensor's position. Read the full paper for more context [pdf].
Getting Started
Download then decompress the h5.gz file.
gunzip ITOP_side_test_depth_map.h5.gz
Using Python and h5py (pip install h5py or conda install h5py), we can load the contents:
import h5py
import numpy as np
f = h5py.File('ITOP_side_test_depth_map.h5', 'r')
data, ids = f.get('data'), f.get('id')
data, ids = np.asarray(data), np.asarray(ids)
print(data.shape, ids.shape)
# (10501, 240, 320) (10501,)
Note: For any of the *_images.h5.gz files, the underlying file is a tar file and not an h5 file. Please rename the file extension from h5.gz to tar.gz before opening. The following commands will work:
mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz
tar xf ITOP_side_test_images.tar.gz
Metadata
File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.
+-------+--------+---------+---------+----------+------------+--------------+---------+
| View | Split | Frames | People | Images | Depth Map | Point Cloud | Labels |
+-------+--------+---------+---------+----------+------------+--------------+---------+
| Side | Train | 39,795 | 16 | 1.1 GiB | 5.7 GiB | 18 GiB | 2.9 GiB |
| Side | Test | 10,501 | 4 | 276 MiB | 1.6 GiB | 4.6 GiB | 771 MiB |
| Top | Train | 39,795 | 16 | 974 MiB | 5.7 GiB | 18 GiB | 2.9 GiB |
| Top | Test | 10,501 | 4 | 261 MiB | 1.6 GiB | 4.6 GiB | 771 MiB |
+-------+--------+---------+---------+----------+------------+--------------+---------+
Data Schema
Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let \(n\) denote the number of images.
Transformation
To convert from point clouds to a \(240 \times 320\) image, the following transformations were used. Let \(x_{\textrm{img}}\) and \(y_{\textrm{img}}\) denote the \((x,y)\) coordinate in the image plane. Using the raw point cloud \((x,y,z)\) real world coordinates, we compute the depth map as follows: \(x_{\textrm{img}} = \frac{x}{Cz} + 160\) and \(y_{\textrm{img}} = -\frac{y}{Cz} + 120\) where \(C\approx 3.50×10^{−3} = 0.0035\) is the intrinsic camera calibration parameter. This results in the depth map: \((x_{\textrm{img}}, y_{\textrm{img}}, z)\).
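A small NumPy sketch of this projection; C is taken from the text, while the array handling details are illustrative:

import numpy as np

C = 0.0035  # intrinsic camera calibration parameter from the text

def point_cloud_to_depth_map(points, height=240, width=320):
    """Project real-world (x, y, z) points into a 240x320 depth map."""
    depth = np.zeros((height, width), dtype=np.float32)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    valid = z > 0
    x_img = np.round(x[valid] / (C * z[valid]) + 160).astype(int)
    y_img = np.round(-y[valid] / (C * z[valid]) + 120).astype(int)
    inside = (x_img >= 0) & (x_img < width) & (y_img >= 0) & (y_img < height)
    depth[y_img[inside], x_img[inside]] = z[valid][inside]
    return depth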
Joint ID (Index) Mapping
joint_id_to_name = {
0: 'Head', 8: 'Torso',
1: 'Neck', 9: 'R Hip',
2: 'R Shoulder', 10: 'L Hip',
3: 'L Shoulder', 11: 'R Knee',
4: 'R Elbow', 12: 'L Knee',
5: 'L Elbow', 13: 'R Foot',
6: 'R Hand', 14: 'L Foot',
7: 'L Hand',
}
Depth Maps
Point Clouds
Labels
Citation
If you would like to cite our work, please use the following.
Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.
@inproceedings{haque2016viewpoint, title={Towards Viewpoint Invariant 3D Human Pose Estimation}, author={Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li}, booktitle = {European Conference on Computer Vision}, month = {October}, year = {2016} }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.
It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.
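A rough sketch of how such a synthetic sample could be composed with Pillow; the font file, colours, and placement below are arbitrary stand-ins, not Roboflow's generation code:

import random
from PIL import Image, ImageDraw, ImageFont

canvas = Image.new("RGB", (512, 512), "white")   # stand-in for a random Open Images photo
draw = ImageDraw.Draw(canvas)

lines = ["Shall I compare thee", "to a summer's day?"]   # stand-in for GPT-2 poetry text
font = ImageFont.truetype("DejaVuSans.ttf", 28)          # assumed font file on the system

for text in lines:
    x, y = random.randint(0, 300), random.randint(0, 480)
    draw.text((x, y), text, fill=(random.randint(0, 255), 0, 0), font=font)

canvas.save("synthetic_ocr_sample.png")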
Example Image:
Example image: https://i.imgur.com/sZT516a.png
A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.
Alternatively, you could try your hand at using this as a neural font identification dataset. Nvidia, amongst others, has had success with this task.
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
*Note: Please download all files, place them into a single folder, and then use 7-Zip to recombine the split files back into the complete dataset.
The Synthetic Operating Room Table (SORT) dataset is a large-scale computer vision dataset focused on instance counting, segmentation, and localisation of surgical instrument depictions placed on a table. The depictions are rendered using the Unreal game engine and annotated by leveraging the UnrealCV plugin (Qiu, 2017). SORT contains one container class, one material class (gauze), and six instrument classes, namely forceps, scalpels, pincettes (tweezers), syringes, periotomes, and scissors. Each class has two different 3D representations, equally likely to be present for a given instance, with the exception of the container class, which leverages three different 3D models. In total, we generated 89,838 images, split into 60% training (53,906), 20% validation (17,965), and 20% test (17,967), containing 365,469, 121,951 and 122,142 separate object instances, respectively.
The aim behind this dataset is to develop methods able to count surgical instruments and materials via computer vision, to aid medical staff in ensuring no instrument is retained by a patient, which can lead to complications such as chronic pain and sepsis. Currently this is done manually, with the World Health Organisation (WHO) proposing that manual counts be completed by two members of staff (Biswas, 2012), typically counting instruments laid out on a surface either before or after their use. This standard practice of logging the type and number of a given instrument or material to be used during an operation is not managerial overhead but crucial for the prevention of retained instruments, consumables, or materials during surgery, as these would negatively impact a patient's recovery time or even lead to the patient's death.
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T.S. and Wang, Y., 2017, October. UnrealCV: Virtual worlds for computer vision. In Proceedings of the 25th ACM International Conference on Multimedia (pp. 1221-1224).
R. Biswas, S. Ganguly, M. Saha, S. Saha, S. Mukherjee, and A. Ayaz. Gossypiboma and Surgeon - Current Medicolegal Aspect - A Review. Indian Journal of Surgery, 74(4):318-322, 2012.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The ChokePoint dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (ie. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal.
The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. The recordings of portal 1 and portal 2 are one month apart. The dataset has a frame rate of 30 fps and an image resolution of 800x600 pixels. In total, the dataset consists of 48 video sequences and 64,204 face images. In all sequences, only one subject is present in the image at a time. The first 100 frames of each sequence are for background modelling, where no foreground objects were present.
Each sequence was named according to the recording conditions (eg. P2E_S1_C3) where P, S, and C stand for portal, sequence and camera, respectively. E and L indicate subjects either entering or leaving the portal. The numbers indicate the respective portal, sequence and camera label. For example, P2L_S1_C3 indicates that the recording was done in Portal 2, with people leaving the portal, and captured by camera 3 in the first recorded sequence.
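A small helper illustrating how these sequence names can be parsed; it is purely illustrative and not part of the dataset distribution:

import re

def parse_chokepoint_name(name):
    """Parse a ChokePoint sequence name such as 'P2L_S1_C3'."""
    m = re.fullmatch(r"P(\d)([EL])_S(\d)_C(\d)", name)
    if m is None:
        raise ValueError(f"Unexpected sequence name: {name}")
    portal, direction, sequence, camera = m.groups()
    return {
        "portal": int(portal),
        "direction": "entering" if direction == "E" else "leaving",
        "sequence": int(sequence),
        "camera": int(camera),
    }

print(parse_chokepoint_name("P2L_S1_C3"))
# {'portal': 2, 'direction': 'leaving', 'sequence': 1, 'camera': 3}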
To pose more challenging real-world surveillance problems, two sequences (P2E_S5 and P2L_S5) were recorded with a crowded scenario. In addition to the aforementioned variations, these sequences contain continuous occlusion, which presents challenges for identity tracking and face verification.
This dataset can be applied, but not limited, to the following research areas:
Please cite the following paper if you use the ChokePoint dataset in your work (papers, articles, reports, books, software, etc):
https://choosealicense.com/licenses/cc0-1.0/
Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training. We also introduce a new large-scale first of its kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work.
One critical task in video surveillance is detecting anomalous events such as traffic accidents, crimes, or illegal activities. Generally, anomalous events rarely occur compared to normal activities. Therefore, to alleviate the waste of labor and time, developing intelligent computer vision algorithms for automatic video anomaly detection is a pressing need. The goal of a practical anomaly detection system is to timely signal an activity that deviates from normal patterns and identify the time window of the occurring anomaly. Therefore, anomaly detection can be considered coarse-level video understanding, which filters out anomalies from normal patterns. Once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques. In this work, we propose an anomaly detection algorithm using weakly labeled training videos. That is, we only know the video-level labels, i.e. a video is normal or contains an anomaly somewhere, but we do not know where. This is intriguing because we can easily annotate a large number of videos by only assigning video-level labels. To formulate a weakly-supervised learning approach, we resort to multiple instance learning. Specifically, we propose to learn anomaly through a deep MIL framework by treating normal and anomalous surveillance videos as bags and short segments/clips of each video as instances in a bag. Based on training videos, we automatically learn an anomaly ranking model that predicts high anomaly scores for anomalous segments in a video. During testing, a long untrimmed video is divided into segments and fed into our deep network, which assigns an anomaly score to each video segment such that an anomaly can be detected.
Our proposed approach (summarized in Figure 1) begins with dividing surveillance videos into a fixed number of segments during training. These segments form the instances in a bag. Using both positive (anomalous) and negative (normal) bags, we train the anomaly detection model using the proposed deep MIL ranking loss. Method overview: https://www.crcv.ucf.edu/projects/real-world/method.png
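A minimal sketch of an MIL ranking objective of this kind in PyTorch; the hinge margin and the sparsity/smoothness weights are illustrative choices, and this is not the authors' released training code:

import torch

def mil_ranking_loss(pos_scores, neg_scores, lambda_sparsity=8e-5, lambda_smooth=8e-5):
    """pos_scores, neg_scores: per-segment anomaly scores for one anomalous and one normal video."""
    # Hinge ranking: the highest-scoring segment of the anomalous bag should
    # score higher than the highest-scoring segment of the normal bag.
    ranking = torch.clamp(1.0 - pos_scores.max() + neg_scores.max(), min=0.0)
    # Sparsity: only a few segments of an anomalous video should score high.
    sparsity = pos_scores.sum()
    # Temporal smoothness: scores of adjacent segments should change gradually.
    smoothness = ((pos_scores[1:] - pos_scores[:-1]) ** 2).sum()
    return ranking + lambda_sparsity * sparsity + lambda_smooth * smoothness

# Example with 32 segments per video and random scores in [0, 1]
pos = torch.rand(32)
neg = torch.rand(32)
print(mil_ranking_loss(pos, neg))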
We construct a new large-scale dataset, called UCF-Crime, to evaluate our method. It consists of long untrimmed surveillance videos which cover 13 real-world anomalies: Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety. We compare our dataset with previous anomaly detection datasets in Table 1. For more details about the UCF-Crime dataset, please refer to our paper. A short description of each anomalous event is given below.
Abuse: This event contains videos which show bad, cruel or violent behavior against children, old people, animals, and women.
Burglary: This event contains videos that show people (thieves) entering a building or house with the intention to commit theft. It does not include use of force against people.
Robbery: This event contains videos showing thieves taking money unlawfully by force or threat of force. These videos do not include shootings.
Stealing: This event contains videos showing people taking property or money without permission. They do not include shoplifting.
Shooting: This event contains videos showing the act of shooting someone with a gun.
Shoplifting: This event contains videos showing people stealing goods from a shop while posing as shoppers.
Assault: This event contains videos showing a sudden or violent physical attack on someone. Note that in these videos the person who is assaulted does not fight back.
Fighting: This event contains videos displaying two or more people attacking one another.
Arson: This event contains videos showing people deliberately setting fire to property.
Explosion: This event contains videos showing the destructive event of something blowing apart. It does not include videos where a person intentionally sets a fire or sets off an explosion.
Arrest: This event contains videos showing police arresting individuals.
Road Accident: This event contains videos showing traffic accidents involving vehicles, pedestrians or cyclists.
Vandalism: This event contains videos showing deliberate destruction of or damage to public or private property. The term includes property damage, such as graffiti and defacement directed towards any property without permission of the owner.
Normal Event: This event contains videos where no crime occurred. These videos include both indoor (such as a shopping mall) and outdoor scenes, as well as day and night-time scenes.
Dataset comparison table: https://www.crcv.ucf.edu/projects/real-world/dataset_table.png
In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for the bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering growth in any market. It is a testament to the importance of the development of AI around the world.
What is Artificial Intelligence (AI)?
Artificial intelligence, once the subject of people's imaginations and the main plot of science fiction movies for decades, is no longer a piece of fiction, but rather commonplace in people's daily lives whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries' standard business processes.
AI investment and startups
The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow, driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interface with machines.
https://creativecommons.org/publicdomain/zero/1.0/
Okay, so one random day I felt like making a web app with an image classifier and putting it up in my Instagram bio for people to play with. It classified hair, and it helped me learn a lot about how training CNNs for real-world applications works.
Below are about a thousand images that represent the three most common hair types in the world. Each hair type has 300+ images to it.
I scraped all these images from Google Images using a Chrome extension and sorted them out, image by image. I feel bad because I cannot give credit to the owners, and data ethics is something I have to improve on as a person.
Fellow data practitioner, the question I put in front of you today is: In what creative ways can you play with this beginner's boring data?
According to the WHO, World report on vision 2019, the number of visually impaired people worldwide is estimated to be 2.2 billion, of whom at least 1 billion have a vision impairment that could have been prevented or is yet to be addressed. The world faces considerable challenges in terms of eye care, including inequalities in the coverage and quality of prevention, treatment, and rehabilitation services. Early detection and diagnosis of ocular pathologies would enable forestall of visual impairment.
For this purpose, we have created a new Retinal Fundus Multi-disease Image Dataset (RFMiD) consisting of a total of 3200 fundus images captured using three different fundus cameras with 46 conditions annotated through adjudicated consensus of two senior retinal experts.
- Create a multi-disease classification model
- Create a model to classify between Healthy and Unhealthy retinas
- Your kernel can be featured here!
- More datasets
If you use this dataset in your research, please credit the authors
Citation
Samiksha Pachade, Prasanna Porwal, Dhanshree Thulkar, Manesh Kokare, Girish Deshmukh, Vivek Sahasrabuddhe, Luca Giancardo, Gwenolé Quellec, and Fabrice Mériaudeau, 2021. Retinal Fundus Multi-Disease Image Dataset (RFMiD): A Dataset for Multi-Disease Detection Research. Data, 6(2), p.14. Available (Open Access): https://www.mdpi.com/2306-5729/6/2/14
License
License was not specified, yet a citation was requested whenever the data is used.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The actual dataset is located on Kaggle and contains data scraped by MyAutoScrapper (written in Go).
Since this Kaggle dataset contains real car deals, placed by real humans, with pictures, it can be used for real-world machine learning (ML) or machine vision: price prediction, image processing, machine vision, etc.
This dataset contains a data.csv file, which has 100,000 car deal details, each row representing one deal. data.csv has 18 columns:
- ID: A unique identifier for each entry; for each ID there is a corresponding sub-folder in images, which contains the images for the given deal. ID is an integer starting from 0.
- Manufacturer: A string identifying the car manufacturer.
- Model: A string identifying the car model.
- Year: An integer for the car production year.
- Category: The type of the vehicle (Sedan, Cabriolet, etc.).
- Mileage: An integer representing the car mileage in kilometers.
- FuelType: The fuel type the car uses.
- EngineVolume: A floating point number representing the engine volume in litres.
- DriveWheels: A string representing the car's drive wheels (i.e. Front, Rear, 4x4, etc.).
- GearBox: A string identifying the transmission gearbox (Manual, Automatic, etc.).
- Doors: A string representing the car doors (4, 4/5, etc.).
- Wheel: Steering wheel position (Left Wheel, Right Wheel).
- Color: Color of the car body.
- InteriorColor: Interior color.
- VIN: VIN number of the vehicle, represented as a string.
- LeatherInterior: A boolean value, true if the car has a leather interior.
- Price: Price of the car in USD. If omitted, it means the price was set as negotiable.
- Clearance: A boolean value identifying whether customs has been cleared or not.
None of the fields (except ID) are guaranteed to be filled, or filled with correct information, since people sometimes don't enter correct information or hide some information for various reasons. But for most of the entries, most of the fields should be filled with correct information.
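A minimal sketch for loading and inspecting the table with pandas; the file path is an assumption, and as noted above, missing or incorrect values should be expected:

import pandas as pd

df = pd.read_csv("data.csv")   # assumed to sit next to the images/ folder

print(df.shape)                # expected roughly (100000, 18)
print(df["Manufacturer"].value_counts().head())

# Price is omitted for negotiable deals, so keep NaNs explicit before modelling.
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
print(df["Price"].describe())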
The proposed multi-modal dataset for car door assembly activities, denoted CarDA [1], comprises a set of time-synchronized multi-camera RGB-D videos and motion capture data acquired during car door assembly activities performed by real line workers in a real manufacturing environment.
[1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024: Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.
CarDA subset A contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, .bvh files for 3D human pose data (ground truth), and annotation data (to be added in v2 of the dataset).
CarDA subset B contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, and annotation data.
ws10 - svo - mp4: Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS10 of the assembly line. MP4 videos are also available, extracted using the left camera of each stereo pair. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.
ws20 - svo - mp4: Six pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. MP4 videos are also available, extracted using the left camera of each stereo pair. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS20 of the assembly line. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.
ws30 - svo - mp4: Three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace are provided. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS30 of the assembly line. MP4 videos are also available, extracted using the left camera of each stereo pair. Annotation data for the task cycles are provided in the xls file, covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.