29 datasets found
  1. ORBIT: A real-world few-shot dataset for teachable object recognition...

    • city.figshare.com
    bin
    Updated May 31, 2023
    Cite
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
    Explore at:
    bin (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

    This version comprises several zip files:
    • train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
    • other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)
    • *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels
    • *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format
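    As a quick sanity check after unzipping one of the frame archives, the sketch below walks the extracted folder tree and counts frames per collector. It assumes only that each split is organised by top-level collector folders with .jpg frames nested underneath (the path name is hypothetical); see the GitHub repository above for the official data loaders.

    from pathlib import Path
    from collections import Counter

    # Hypothetical location of an extracted split; adjust to where you unzipped it.
    TRAIN_DIR = Path("orbit_benchmark/train")

    # Count extracted .jpg frames per top-level collector folder.
    frames_per_collector = Counter(
        frame.relative_to(TRAIN_DIR).parts[0]  # first path component = collector
        for frame in TRAIN_DIR.rglob("*.jpg")
    )

    for collector, n_frames in sorted(frames_per_collector.items()):
        print(f"{collector}: {n_frames} frames")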

  2. SARD - Search And Rescue Dataset

    • kaggle.com
    Updated May 2, 2025
    Cite
    NikolasGegenava (2025). SARD - Search And Rescue Dataset [Dataset]. https://www.kaggle.com/datasets/nikolasgegenava/sard-search-and-rescue/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NikolasGegenava
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🔎 SARD - Search And Rescue Dataset, containing only one class (Human).

    Dataset From: https://universe.roboflow.com/datasets-pdabr/sard-8xjhy

    ⛑️ This dataset contains multi-modal (image/label) information collected from real-world images taken by drone. It is designed to support the development and evaluation of AI models for locating, identifying, and tracking individuals in distress during disaster or emergency scenarios. To develop the SARD dataset, the authors involved actors who simulated exhausted and injured people and classic types of movement. The images were recorded with a high-resolution camera on a DJI Phantom 4A drone.

    👀 The best approach to this dataset is YOLO (You Only Look Once) models, especially v5 or v8, for detection.
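    A minimal fine-tuning sketch along those lines, assuming the download is already in YOLO format with a data.yaml describing the single Human class (the paths and hyperparameters here are placeholders, not part of the dataset):

    from ultralytics import YOLO

    # Start from a pretrained YOLOv8 nano checkpoint and fine-tune on SARD.
    model = YOLO("yolov8n.pt")
    model.train(data="sard/data.yaml", epochs=50, imgsz=640)  # assumed paths/settings

    # Run inference on a drone image and print the detected person boxes.
    results = model.predict("sard/test/images/example.jpg", conf=0.25)
    print(results[0].boxes.xyxy)  # bounding boxes in pixel coordinates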

    📰 Publication Date: 2025, 2 May

    Classes: Human

    Images: Train 4k, Test 0.5k, Valid 1.1k

  3. Data from: NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 16, 2023
    Cite
    David Herron; David Herron; Ernesto Jimenez-Ruiz; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Giacomo Tarroni; Tillman Weyde; Tillman Weyde (2023). NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection [Dataset]. http://doi.org/10.5281/zenodo.7931113
    Explore at:
    zip (available download formats)
    Dataset updated
    May 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Herron; David Herron; Ernesto Jimenez-Ruiz; Ernesto Jimenez-Ruiz; Giacomo Tarroni; Giacomo Tarroni; Tillman Weyde; Tillman Weyde
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeSy4VRD

    NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the computer vision field of AI and, within that field, to the application tasks of visual relationship detection (VRD) and scene graph generation.

    Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. pursue research that focuses purely on deep learning alone, without involving symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being multipurpose: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

    The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

    NeSy4VRD on Zenodo: the NeSy4VRD dataset package

    This entry on Zenodo hosts the NeSy4VRD dataset package, which includes the NeSy4VRD dataset and its companion NeSy4VRD ontology, an OWL ontology called VRD-World.

    The NeSy4VRD dataset consists of an image dataset with associated visual relationship annotations. The images of the NeSy4VRD dataset are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations. The NeSy4VRD dataset is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the NeSy4VRD dataset has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and 'object' objects are specified in terms of bounding boxes and object classes. For example, representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.
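    To make the annotation form concrete, here is a small illustrative sketch of one <'subject', 'predicate', 'object'> triple with bounding boxes. The field names and file layout are hypothetical, not the actual NeSy4VRD schema; consult the dataset package for the real format.

    # Illustrative structure only; field names and the image filename are hypothetical.
    annotations = {
        "example_image.jpg": [
            {
                "subject": {"class": "person", "bbox": [120, 60, 310, 420]},  # [x1, y1, x2, y2]
                "predicate": "ride",
                "object": {"class": "horse", "bbox": [90, 200, 400, 560]},
            }
        ]
    }

    for image_name, relationships in annotations.items():
        for rel in relationships:
            print(image_name,
                  f"<{rel['subject']['class']}, {rel['predicate']}, {rel['object']['class']}>")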

    Visual relationship detection is pursued as a computer vision application task in its own right, and as a building block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

    The NeSy4VRD ontology, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the NeSy4VRD dataset. It directly describes the domain of the NeSy4VRD dataset, as reflected in the NeSy4VRD visual relationship annotations. More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.

    Use of the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset is, of course, purely optional, however. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the NeSy4VRD ontology and use the NeSy4VRD dataset by itself.

    All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

    NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code

    The NeSy4VRD research resource incorporates additional components that are companions to the NeSy4VRD dataset package here on Zenodo. These companion components are available at NeSy4VRD on GitHub. These companion components consist of:

    • comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the NeSy4VRD ontology, VRD-World, as well)
    • open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the NeSy4VRD ontology, VRD-World, and RDF knowledge graphs.

    The NeSy4VRD infrastructure supporting extensibility consists of:

    • open source Python code for conducting deep and comprehensive analyses of the NeSy4VRD dataset (the VRD images and their associated NeSy4VRD visual relationship annotations)
    • an open source, custom-designed NeSy4VRD protocol for specifying visual relationship annotation customisation instructions declaratively, in text files
    • an open source, custom-designed NeSy4VRD workflow, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.

    The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the NeSy4VRD dataset in new directions, by further enriching the annotations, or by tailoring them to introduce new or more data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.

    The NeSy4VRD extensibility infrastructure, however, may be of particular interest to NeSy researchers interested in using the NeSy4VRD ontology, VRD-World, in conjunction with the NeSy4VRD dataset. These researchers can of course tailor the VRD-World ontology if they wish without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, in order that alignment be maintained.

    To illustrate our point, and to illustrate our vision of how the NeSy4VRD extensibility infrastructure can be used, let us consider a simple example. It is common in computer vision to distinguish between thing objects (that have well-defined shapes) and stuff objects (that are amorphous). Suppose a researcher wishes to have a greater number of stuff object classes with which to work. Water is such a stuff object. Many VRD images contain water but it is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a Water class to the class hierarchy of the VRD-World ontology would be pointless because it would never acquire any instances (because an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

    • use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.);
    • use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;
    • use the NeSy4VRD protocol to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);
    • use the NeSy4VRD workflow to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;
    • introduce class Water to the class hierarchy of the VRD-World ontology (using, say, the free Protege ontology editor);
    • continue experimenting, now with the added benefit of the additional stuff object class 'water';
    • contribute the enriched set of NeSy4VRD visual relationship

  4. Face Mask Usage

    • kaggle.com
    Updated Feb 26, 2022
    Cite
    James Arnold Nogra (2022). Face Mask Usage [Dataset]. https://www.kaggle.com/jamesnogra/face-mask-usage/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    James Arnold Nogra
    Description

    Demo

    Demo GIF: https://james.iamcebu.com/images/demo-face-mask.gif

    Inspiration

    As COVID-19 continues to spread across the world, leaders and individuals are finding ways to halt the spread of the virus. The World Health Organization (WHO) recommended in March 2020 wearing a face covering to prevent people from exhaling tiny droplets that may carry the virus. Wearing a mask properly can lower virus transmission.

    Code and Research Paper

    If you use this dataset, please cite this study; the GitHub code is also provided. Research Paper: https://www.jardcs.org/abstract.php?id=5699 Python Code: https://github.com/jamesnogra/ImproperMaskDetector

    About the Data

    The data consists of four folders: fully_covered, not_covered, not_face, and partially_covered. The folder fully_covered contains faces of people wearing a face mask properly/correctly according to WHO standards. The folder partially_covered contains face images in which the face mask covers only the mouth but not the nose. The folder not_covered contains face images of people not wearing a face mask at all. The not_face folder, which is optional for training, contains images returned by the OpenCV face detection library that are obviously not faces of people.
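    Because the classes are encoded as folder names, a standard folder-based image loader works directly. Below is a minimal sketch using torchvision's ImageFolder; the root path is a placeholder for wherever you extracted the dataset.

    import torch
    from torchvision import datasets, transforms

    # Assumes the four class folders (fully_covered, not_covered, not_face,
    # partially_covered) sit under one root directory; the path is a placeholder.
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    dataset = datasets.ImageFolder("face-mask-usage", transform=transform)
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

    print(dataset.classes)  # folder names become the class labels
    images, labels = next(iter(loader))
    print(images.shape, labels[:8])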

    Contact

    If you want the full paper, just email me at jamesnogra@gmail.com or you can visit the information website for this study at https://james.iamcebu.com/#face-mask-detection.

  5. Crowd counting database

    • researchdata.edu.au
    • researchdatafinder.qut.edu.au
    Updated 2012
    Cite
    QUT SAIVT: Speech, audio, image and video technologies research (2012). Crowd counting database [Dataset]. http://doi.org/10.4225/09/5858bfb708148
    Explore at:
    Dataset updated
    2012
    Dataset provided by
    Queensland University of Technology
    Authors
    QUT SAIVT: Speech, audio, image and video technologies research
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset was collected for an assessment of a crowd counting algorithm.

    The dataset is a vision dataset captured on the QUT campus and contains three challenging viewpoints, referred to as Camera A, Camera B and Camera C. The sequences contain reflections, shadows and difficult lighting fluctuations, which make crowd counting difficult. Furthermore, Camera C is positioned at a particularly low camera angle, leading to stronger occlusion than is present in other datasets.

    The QUT datasets are annotated at sparse intervals: every 100 frames for cameras B and C, and every 200 frames for camera A as this is a longer sequence. Testing is then performed by comparing the crowd size estimate to the ground truth at these sparse intervals, rather than at every frame. This closely resembles the intended real-world application of this technology, where an operator may periodically ‘query’ the system for a crowd count.
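    A minimal sketch of this sparse-interval evaluation, comparing estimates to ground truth only at the annotated frames (the counts below are hypothetical placeholders):

    # Frame index -> crowd count, only at the sparsely annotated frames.
    ground_truth = {0: 12, 100: 15, 200: 14, 300: 18}   # hypothetical ground truth
    estimates    = {0: 11, 100: 16, 200: 14, 300: 20}   # hypothetical model output

    errors = [abs(estimates[f] - ground_truth[f]) for f in ground_truth]
    mae = sum(errors) / len(errors)
    print(f"MAE over {len(errors)} annotated frames: {mae:.2f}")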

    Due to the difficulty of the environmental conditions in these scenes, the first 400-500 frames of each sequence are set aside for learning the background model.

  6. 10,109 People - Face Images Dataset

    • nexdata.ai
    Updated Jun 14, 2024
    Cite
    Nexdata (2024). 10,109 People - Face Images Dataset [Dataset]. https://www.nexdata.ai/datasets/1402?source=Github
    Explore at:
    Dataset updated
    Jun 14, 2024
    Dataset authored and provided by
    Nexdata
    Variables measured
    Data size, Data format, Data diversity, Age distribution, Race distribution, Gender distribution, Collecting environment
    Description

    This dataset of 10,109 people's face images includes people from many countries. Multiple photos of each person's daily life are collected, and the gender, race, age, etc. of each person are annotated. The dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and has proved beneficial for achieving outstanding performance in real-world applications. Throughout the process of data collection, storage, and usage, we have consistently adhered to data protection and privacy regulations to ensure the preservation of user privacy and legal rights. All data comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.

  7. Accident Detection Model Dataset

    • universe.roboflow.com
    zip
    Updated Apr 8, 2024
    Cite
    Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Accident detection model
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Accident Bounding Boxes
    Description

    Accident-Detection-Model

    The Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, an image, or a video. The model is trained on a dataset of 3,200+ images annotated on Roboflow.

    Problem Statement

    • Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.
    • According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.
    • The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

    Accidents survey

    Survey chart: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png

    Literature Survey

    • Sreyan Ghosh, Mar 2019: the goal is to develop a system using a deep learning convolutional neural network trained to classify video frames as accident or non-accident.
    • Deeksha Gour, Sep 2019: uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

    Research Gap

    • Lack of real-world data: we trained the model on more than 3,200 images.
    • Long processing time and large space requirements: we use Google Colab to reduce the time and space required.
    • Outdated versions in previous works: we are using the latest version, YOLOv8.

    Proposed methodology

    • We are using YOLOv8 to train on our custom dataset of 3,200+ images, collected from different platforms.
    • After training for 25 iterations, the model is ready to detect an accident with a significant probability.

    Model Set-up

    Preparing Custom dataset

    • We have collected 1,200+ images from different sources such as YouTube, Google Images, Kaggle.com, etc.
    • Then we annotated all of them individually on a tool called Roboflow.
    • During annotation we marked images with no accident as NULL and drew a box around the accident site in images containing an accident.
    • Then we divided the dataset into train, val and test in the ratio of 8:1:1.
    • At the final step we downloaded the dataset in YOLOv8 format.
      #### Using Google Colab
    • We are using Google Colaboratory to code this model because Colab provides a GPU, which is faster than a local environment.
    • You can use Jupyter notebooks, which let you blend code, text, and visualisations in a single document, to write and run Python code in Google Colab.
    • Users can run individual code cells in Jupyter notebooks and quickly view the results, which is helpful for experimenting and debugging. They also enable visualisations that make use of well-known frameworks such as Matplotlib, Seaborn, and Plotly.
    • In Google Colab, we first changed the runtime from TPU to GPU.
    • We cross-checked this by running the command ‘!nvidia-smi’.
      #### Coding
    • First of all, we installed YOLOv8 with the command ‘!pip install ultralytics==8.0.20’.
    • We then imported YOLOv8 with ‘from ultralytics import YOLO’ and ‘from IPython.display import display, Image’.
    • Then we connected and mounted our Google Drive account with ‘from google.colab import drive’ and ‘drive.mount('/content/drive')’.
    • Then we ran our main command to start the training process: ‘%cd /content/drive/MyDrive/Accident Detection model’ followed by ‘!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True’.
    • After training we ran commands to test and validate our model: ‘!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml’ and ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images’.
    • To get results from any video or image we ran ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"’.
    • The results are stored in the runs/detect/predict folder.
      Hence our model is trained, validated and tested to be able to detect accidents on any video or image.

    Challenges I ran into

    I majorly ran into 3 problems while making this model

    • I had difficulty saving the results to a folder; as YOLOv8 is the latest version, it is still under development. I read some blogs and referred to Stack Overflow, and learned that in the new v8 we need to add an extra argument, 'save=true', which allowed me to save my results to a folder.
    • I was facing a problem on the CVAT website because I was not sure what
  8. Traffic vehicles Object Detection 🚙🚍

    • kaggle.com
    Updated Jul 12, 2023
    Cite
    Hasibullah Aman (2023). Traffic vehicles Object Detection 🚙🚍 [Dataset]. https://www.kaggle.com/datasets/hasibullahaman/objectdetectiondatasetcar/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hasibullah Aman
    License

    https://www.usa.gov/government-works/

    Description

    Our goal in this project is to address the problem of traffic and driver safety at intersections. To that end, we have developed an artificial intelligence-based system that analyses traffic flow by examining traffic patterns and predicting traffic volume at a simple intersection. With benefits such as reducing waiting times for residents and drivers, cutting fuel consumption, lowering air pollution and noise, and helping control and prevent accidents, such a system can make a substantial contribution to society. In this project, we use machine learning algorithms and traffic cameras installed at a crossroads to control traffic in developing countries. The effectiveness of the approach can be evaluated in simulation or studied in the real world, with the expected outcomes of reduced traffic, increased safety for people and lower fuel consumption, which in turn reduces air pollution and improves quality of life. In general, we seek to use artificial intelligence and machine learning algorithms to control traffic at a crossroads and improve urban transportation, helping people improve their quality of life by reducing congestion. By reducing traffic and waiting times at intersections, this system can make commuting less stressful and more efficient, allowing people to spend more time on other activities they enjoy.

  9. NBF_StreetView

    • zenodo.org
    zip
    Updated Dec 24, 2024
    Cite
    Ismail Qayyum; Ayan Badar; Ayan Rizwan; Ismail Qayyum; Ayan Badar; Ayan Rizwan (2024). NBF_StreetView [Dataset]. http://doi.org/10.5281/zenodo.14552482
    Explore at:
    zip (available download formats)
    Dataset updated
    Dec 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ismail Qayyum; Ayan Badar; Ayan Rizwan; Ismail Qayyum; Ayan Badar; Ayan Rizwan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 1, 2024
    Description

    This dataset is designed to simulate real-world challenges faced by autonomous vehicles, particularly when encountering degraded images through sensors in various street environments. It includes a diverse set of images from streetview and pedestrian walkways, subjected to various types of degradation such as noise, blur, and flare, mimicking the environmental conditions that can affect sensor performance in automotive applications. The dataset captures a variety of scenes, including vehicles, people, and street scenes, providing a comprehensive representation of potential challenges in visual processing.

    Ideal for deep learning and computer vision tasks, this dataset offers a robust resource for training and evaluating models focused on enhancing the resilience and accuracy of autonomous vehicle systems in degraded visual environments. Tasks such as image restoration, denoising, deblurring, and flare correction are well-suited to this dataset, making it an essential tool for advancing computer vision solutions within the automotive and urban infrastructure sectors.

    For more specific details on how this dataset was created, please feel free to reach out via email.

    Github: https://github.com/IsmailQayyum/NBF_StreetView

    License:
    This work is licensed under CC BY 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/?ref=chooser-v1

  10. Dark Face Dataset

    • kaggle.com
    Updated May 20, 2022
    Cite
    Soumik Rakshit (2022). Dark Face Dataset [Dataset]. https://www.kaggle.com/datasets/soumikrakshit/dark-face-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Soumik Rakshit
    Description

    Description

    The Dark Face dataset provides 6,000 real-world low-light images captured during the nighttime at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes for human faces, as the main training and/or validation sets. We also provide 9,000 unlabeled low-light images collected from the same setting. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured in controllable real lighting conditions (but not necessarily containing faces), which can be used as part of the training data at the participants' discretion. There will be a hold-out testing set of 4,000 low-light images, with human face bounding boxes annotated.

    Credits: Spatial and Temporal Restoration, Understanding and Compression Team, Wangxuan Institute of Computer Technology, Peking University.

    Citation

    @ARTICLE{poor_visibility_benchmark,
     author={Yang, Wenhan and Yuan, Ye and Ren, Wenqi and Liu, Jiaying and Scheirer, Walter J. and Wang, Zhangyang and Zhang, and et al.},
     journal={IEEE Transactions on Image Processing}, 
     title={Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study}, 
     year={2020},
     volume={29},
     number={},
     pages={5737-5752},
     doi={10.1109/TIP.2020.2981922}
    }
    
    @inproceedings{Chen2018Retinex,
        title={Deep Retinex Decomposition for Low-Light Enhancement},
        author={Chen Wei and Wenjing Wang and Wenhan Yang and Jiaying Liu},
        booktitle={British Machine Vision Conference},
        year={2018},
    }
    
  11. ITOP Dataset

    • zenodo.org
    application/gzip +1
    Updated Jul 19, 2024
    Cite
    Albert Haque; Albert Haque; Boya Peng; Zelun Luo; Alexandre Alahi; Serena Yeung; Serena Yeung; Li Fei-Fei; Li Fei-Fei; Boya Peng; Zelun Luo; Alexandre Alahi (2024). ITOP Dataset [Dataset]. http://doi.org/10.5281/zenodo.3932973
    Explore at:
    application/gzip, jpeg (available download formats)
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Albert Haque; Albert Haque; Boya Peng; Zelun Luo; Alexandre Alahi; Serena Yeung; Serena Yeung; Li Fei-Fei; Li Fei-Fei; Boya Peng; Zelun Luo; Alexandre Alahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary

    The ITOP dataset (Invariant Top View) contains 100K depth images from side and top views of a person in a scene. For each image, the locations of 15 human body parts are labeled with 3-dimensional (x, y, z) coordinates relative to the sensor's position. Read the full paper for more context [pdf].

    Getting Started

    Download then decompress the h5.gz file.

    gunzip ITOP_side_test_depth_map.h5.gz

    Using Python and h5py (pip install h5py or conda install h5py), we can load the contents:

    import h5py
    import numpy as np
    
    f = h5py.File('ITOP_side_test_depth_map.h5', 'r')
    data, ids = f.get('data'), f.get('id')
    data, ids = np.asarray(data), np.asarray(ids)
    
    print(data.shape, ids.shape)
    # (10501, 240, 320) (10501,)

    Note: For any of the *_images.h5.gz files, the underlying file is a tar file and not a h5 file. Please rename the file extension from h5.gz to tar.gz before opening. The following commands will work:

    mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz
    tar xf ITOP_side_test_images.tar.gz

    Metadata

    File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.

    +------+-------+--------+--------+---------+-----------+-------------+---------+
    | View | Split | Frames | People | Images  | Depth Map | Point Cloud | Labels  |
    +------+-------+--------+--------+---------+-----------+-------------+---------+
    | Side | Train | 39,795 |     16 | 1.1 GiB | 5.7 GiB   | 18 GiB      | 2.9 GiB |
    | Side | Test  | 10,501 |      4 | 276 MiB | 1.6 GiB   | 4.6 GiB     | 771 MiB |
    | Top  | Train | 39,795 |     16 | 974 MiB | 5.7 GiB   | 18 GiB      | 2.9 GiB |
    | Top  | Test  | 10,501 |      4 | 261 MiB | 1.6 GiB   | 4.6 GiB     | 771 MiB |
    +------+-------+--------+--------+---------+-----------+-------------+---------+

    Data Schema

    Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let \(n\) denote the number of images.

    Transformation

    To convert from point clouds to a \(240 \times 320\) image, the following transformations were used. Let \(x_{\textrm{img}}\) and \(y_{\textrm{img}}\) denote the \((x,y)\) coordinate in the image plane. Using the raw point cloud \((x,y,z)\) real-world coordinates, we compute the depth map as follows: \(x_{\textrm{img}} = \frac{x}{Cz} + 160\) and \(y_{\textrm{img}} = -\frac{y}{Cz} + 120\), where \(C \approx 3.5 \times 10^{-3} = 0.0035\) is the intrinsic camera calibration parameter. This results in the depth map \((x_{\textrm{img}}, y_{\textrm{img}}, z)\).
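    A small sketch implementing this projection with NumPy, using the constant and image size stated above (the clipping of out-of-frame and non-positive-depth points is an added assumption):

    import numpy as np

    C = 0.0035  # intrinsic camera calibration parameter given above

    def point_cloud_to_depth_map(points, height=240, width=320):
        """Project (x, y, z) real-world coordinates (metres) onto a 240x320 depth map."""
        pts = points[points[:, 2] > 0]            # keep points with positive depth
        x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
        x_img = np.round(x / (C * z) + 160).astype(int)
        y_img = np.round(-y / (C * z) + 120).astype(int)
        depth = np.zeros((height, width), dtype=np.float32)
        inside = (x_img >= 0) & (x_img < width) & (y_img >= 0) & (y_img < height)
        depth[y_img[inside], x_img[inside]] = z[inside]
        return depth

    # Example with a point-cloud array loaded via h5py as shown earlier:
    # depth = point_cloud_to_depth_map(np.asarray(f['data'][0], dtype=np.float32))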

    Joint ID (Index) Mapping

    joint_id_to_name = {
        0: 'Head',        8: 'Torso',
        1: 'Neck',        9: 'R Hip',
        2: 'R Shoulder', 10: 'L Hip',
        3: 'L Shoulder', 11: 'R Knee',
        4: 'R Elbow',    12: 'L Knee',
        5: 'L Elbow',    13: 'R Foot',
        6: 'R Hand',     14: 'L Foot',
        7: 'L Hand',
    }

    Depth Maps

    • Key: id
      • Dimensions: \((n,)\)
      • Data Type: uint8
      • Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.
    • Key: data
      • Dimensions: \((n,240,320)\)
      • Data Type: float16
      • Description: Depth map (i.e. mesh) corresponding to a single frame. Depth values are in real world meters (m).

    Point Clouds

    • Key: id
      • Dimensions: \((n,)\)
      • Data Type: uint8
      • Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.
    • Key: data
      • Dimensions: \((n,76800,3)\)
      • Data Type: float16
      • Description: Point cloud containing 76,800 points (240x320). Each point is represented by a 3D tuple measured in real world meters (m).

    Labels

    • Key: id
      • Dimensions: \((n,)\)
      • Data Type: uint8
      • Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.
    • Key: is_valid
      • Dimensions: \((n,)\)
      • Data Type: uint8
      • Description: Flag corresponding to the result of the human labeling effort. This is a boolean value (represented by an integer) where a one (1) denotes clean, human-approved data. A zero (0) denotes noisy human body part labels. If is_valid is equal to zero, you should not use any of the provided human joint locations for the particular frame.
    • Key: visible_joints
      • Dimensions: \((n,15)\)
      • Data Type: int16
      • Description: Binary mask indicating if each human joint is visible or occluded. This is denoted by \(\alpha\) in the paper. If \(\alpha_j=1\) then the \(j^{th}\) joint is visible (i.e. not occluded). Otherwise, if \(\alpha_j = 0\) then the \(j^{th}\) joint is occluded.
    • Key: image_coordinates
      • Dimensions: \((n,15,2)\)
      • Data Type: int16
      • Description: Two-dimensional \((x,y)\) points corresponding to the location of each joint in the depth image or depth map.
    • Key: real_world_coordinates
      • Dimensions: \((n,15,3)\)
      • Data Type: float16
      • Description: Three-dimensional \((x,y,z)\) points corresponding to the location of each joint in real world meters (m).
    • Key: segmentation
      • Dimensions: \((n,240,320)\)
      • Data Type: int8
      • Description: Pixel-wise assignment of body part labels. The background class (i.e. no body part) is denoted by −1.
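    Putting the label schema together, the sketch below loads a labels file and keeps only frames marked clean by the human labelling effort (the file name is assumed by analogy with the depth-map file above):

    import h5py
    import numpy as np

    # File name assumed by analogy with ITOP_side_test_depth_map.h5.
    with h5py.File('ITOP_side_test_labels.h5', 'r') as f:
        is_valid = np.asarray(f['is_valid'])
        joints_3d = np.asarray(f['real_world_coordinates'])  # (n, 15, 3), metres
        joints_2d = np.asarray(f['image_coordinates'])       # (n, 15, 2), pixels

    clean = is_valid == 1
    print(f"{clean.sum()} of {len(is_valid)} frames have human-approved labels")
    head_xyz = joints_3d[clean][:, 0]  # joint 0 = 'Head' per the mapping above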

    Citation

    If you would like to cite our work, please use the following.

    Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.

    @inproceedings{haque2016viewpoint,
      title={Towards Viewpoint Invariant 3D Human Pose Estimation},
      author={Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li},
      booktitle = {European Conference on Computer Vision},
      month = {October},
      year = {2016}
    }
  12. Open Poetry Vision Object Detection Dataset - 512x512

    • public.roboflow.com
    zip
    Updated Apr 7, 2022
    + more versions
    Cite
    Brad Dwyer (2022). Open Poetry Vision Object Detection Dataset - 512x512 [Dataset]. https://public.roboflow.com/object-detection/open-poetry-vision/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Apr 7, 2022
    Dataset authored and provided by
    Brad Dwyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of text
    Description

    Overview

    The Open Poetry Vision dataset is a synthetic dataset created by Roboflow for OCR tasks.

    It combines a random image from the Open Images Dataset with text primarily sampled from Gwern's GPT-2 Poetry project. Each image in the dataset contains between 1 and 5 strings in a variety of fonts and colors randomly positioned in the 512x512 canvas. The classes correspond to the font of the text.

    Example Image: https://i.imgur.com/sZT516a.png" alt="Example Image">

    Use Cases

    A common OCR workflow is to use a neural network to isolate text for input into traditional optical character recognition software. This dataset could make a good starting point for an OCR project like business card parsing or automated paper form-processing.

    Alternatively, you could try your hand using this as a neural font identification dataset. Nvidia, amongst others, have had success with this task.

    Using this Dataset

    Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.

    Version 5 of this dataset (classes_all_text-raw-images) has all classes remapped to be labeled as "text." This was accomplished by using Modify Classes as a preprocessing step.

    Version 6 of this dataset (classes_all_text-augmented-FAST) has all classes remapped to be labeled as "text." and was trained with Roboflow's Fast Model.

    Version 7 of this dataset (classes_all_text-augmented-ACCURATE) has all classes remapped to be labeled as "text." and was trained with Roboflow's Accurate Model.

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


  13. Synthetic Operating Room Table (SORT) Dataset

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    Ireland, James; Radwan, Ibrahim; Herath, Damith; Goecke, Roland (2024). Synthetic Operating Room Table (SORT) Dataset [Dataset]. http://doi.org/10.7910/DVN/UCG5CW
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Ireland, James; Radwan, Ibrahim; Herath, Damith; Goecke, Roland
    Description

    *Note: Please download all files, place them into a single folder and then use 7-Zip to recombine the split files back into the complete dataset.

    The Synthetic Operating Room Table (SORT) dataset is a large-scale computer vision dataset focused on instance counting, segmentation and localisation of surgical instrument depictions placed on a table. The depictions are rendered using the Unreal game engine and annotated using the UnrealCV plugin (Qiu, 2017). SORT contains one container class, one material class (gauze) and six instrument classes, namely forceps, scalpels, pincettes (tweezers), syringes, periotomes, and scissors. Each class contains two different 3D representations, equally likely to be present for a given instance, with the exception of the container class, which leverages three different 3D models. In total, we generated 89,838 images, split into 60% training (53,906), 20% validation (17,965), and 20% test (17,967), containing 365,469, 121,951 and 122,142 separate object instances, respectively.

    The aim behind this dataset is to develop methods able to count surgical instruments and materials via computer vision, to aid medical staff in ensuring no instrument is retained by a patient, which can lead to complications such as chronic pain and sepsis. Currently, this is done manually, with the World Health Organisation (WHO) proposing that manual counts be completed by two members of staff (Biswas, 2012), typically counting instruments laid out on a surface either before or after their use. This standard practice of logging the type and number of a given instrument or material to be used during an operation is not managerial overhead but crucial for the prevention of retained instruments, consumables, or materials during surgery, as these would negatively impact a patient's recovery time or even lead to the patient's death.

    References:
    Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T.S. and Wang, Y., 2017, October. UnrealCV: Virtual worlds for computer vision. In Proceedings of the 25th ACM International Conference on Multimedia (pp. 1221-1224).
    R. Biswas, S. Ganguly, M. Saha, S. Saha, S. Mukherjee, and A. Ayaz. Gossypiboma and Surgeon - Current Medicolegal Aspect - A Review. Indian Journal of Surgery, 74(4):318-322, 2012.

  14. ChokePoint Dataset

    • zenodo.org
    • data.niaid.nih.gov
    txt, xz
    Updated Jan 24, 2020
    Cite
    Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian Lovell; Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian Lovell (2020). ChokePoint Dataset [Dataset]. http://doi.org/10.5281/zenodo.815657
    Explore at:
    xz, txt (available download formats)
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian Lovell; Yongkang Wong; Shaokang Chen; Sandra Mau; Conrad Sanderson; Brian Lovell
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The ChokePoint dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (ie. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal.

    The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. The recordings of portal 1 and portal 2 are one month apart. The dataset has a frame rate of 30 fps and an image resolution of 800x600 pixels. In total, the dataset consists of 48 video sequences and 64,204 face images. In all sequences, only one subject is present in the image at a time. The first 100 frames of each sequence are reserved for background modelling, with no foreground objects present.

    Each sequence was named according to the recording conditions (eg. P2E_S1_C3) where P, S, and C stand for portal, sequence and camera, respectively. E and L indicate subjects either entering or leaving the portal. The numbers indicate the respective portal, sequence and camera label. For example, P2L_S1_C3 indicates that the recording was done in Portal 2, with people leaving the portal, and captured by camera 3 in the first recorded sequence.
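    A small helper that decodes this naming convention (a convenience sketch, not part of the dataset distribution):

    import re

    # e.g. 'P2L_S1_C3' -> portal 2, subjects leaving, sequence 1, camera 3
    PATTERN = re.compile(r"P(\d+)([EL])_S(\d+)_C(\d+)")

    def parse_sequence_name(name):
        portal, direction, sequence, camera = PATTERN.fullmatch(name).groups()
        return {
            "portal": int(portal),
            "direction": "entering" if direction == "E" else "leaving",
            "sequence": int(sequence),
            "camera": int(camera),
        }

    print(parse_sequence_name("P2L_S1_C3"))
    # {'portal': 2, 'direction': 'leaving', 'sequence': 1, 'camera': 3}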

    To pose more challenging real-world surveillance problems, two sequences (P2E_S5 and P2L_S5) were recorded with crowded scenarios. In addition to the aforementioned variations, these sequences contain continuous occlusion, which presents challenges for identity tracking and face verification.

    This dataset can be applied, but not limited, to the following research areas:

    • person re-identification
    • image set matching
    • face quality measurement
    • face clustering
    • 3D face reconstruction
    • pedestrian/face tracking
    • background estimation and subtraction

    Please cite the following paper if you use the ChokePoint dataset in your work (papers, articles, reports, books, software, etc):

    • Y. Wong, S. Chen, S. Mau, C. Sanderson, B.C. Lovell
      Patch-based Probabilistic Image Quality Assessment for Face Selection and Improved Video-based Face Recognition
      IEEE Biometrics Workshop, Computer Vision and Pattern Recognition (CVPR) Workshops, pages 81-88, 2011.
      http://doi.org/10.1109/CVPRW.2011.5981881

  15. ucf_crime

    • huggingface.co
    Updated Jul 3, 2023
    Cite
    MyungHoonJin (2023). ucf_crime [Dataset]. https://huggingface.co/datasets/jinmang2/ucf_crime
    Explore at:
    Dataset updated
    Jul 3, 2023
    Authors
    MyungHoonJin
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    Real-world Anomaly Detection in Surveillance Videos

    Surveillance videos are able to capture a variety of realistic anomalies. In this paper, we propose to learn anomalies by exploiting both normal and anomalous videos. To avoid annotating the anomalous segments or clips in training videos, which is very time consuming, we propose to learn anomaly through the deep multiple instance ranking framework by leveraging weakly labeled training videos, i.e. the training labels (anomalous or normal) are at video-level instead of clip-level. In our approach, we consider normal and anomalous videos as bags and video segments as instances in multiple instance learning (MIL), and automatically learn a deep anomaly ranking model that predicts high anomaly scores for anomalous video segments. Furthermore, we introduce sparsity and temporal smoothness constraints in the ranking loss function to better localize anomaly during training. We also introduce a new large-scale first of its kind dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies such as fighting, road accident, burglary, robbery, etc. as well as normal activities. This dataset can be used for two tasks. First, general anomaly detection considering all anomalies in one group and all normal activities in another group. Second, for recognizing each of 13 anomalous activities. Our experimental results show that our MIL method for anomaly detection achieves significant improvement on anomaly detection performance as compared to the state-of-the-art approaches. We provide the results of several recent deep learning baselines on anomalous activity recognition. The low recognition performance of these baselines reveals that our dataset is very challenging and opens more opportunities for future work.

    Problem & Motivation

    One critical task in video surveillance is detecting anomalous events such as traffic accidents, crimes or illegal activities. Generally, anomalous events rarely occur as compared to normal activities. Therefore, to alleviate the waste of labor and time, developing intelligent computer vision algorithms for automatic video anomaly detection is a pressing need. The goal of a practical anomaly detection system is to signal in a timely way an activity that deviates from normal patterns and to identify the time window of the occurring anomaly. Therefore, anomaly detection can be considered coarse-level video understanding, which filters out anomalies from normal patterns. Once an anomaly is detected, it can further be categorized into one of the specific activities using classification techniques. In this work, we propose an anomaly detection algorithm using weakly labeled training videos. That is, we only know the video-level labels, i.e. a video is normal or contains an anomaly somewhere, but we do not know where. This is intriguing because we can easily annotate a large number of videos by only assigning video-level labels. To formulate a weakly-supervised learning approach, we resort to multiple instance learning. Specifically, we propose to learn anomalies through a deep MIL framework by treating normal and anomalous surveillance videos as bags and short segments/clips of each video as instances in a bag. Based on training videos, we automatically learn an anomaly ranking model that predicts high anomaly scores for anomalous segments in a video. During testing, a long untrimmed video is divided into segments and fed into our deep network, which assigns an anomaly score to each video segment such that an anomaly can be detected.

    Method

    Our proposed approach (summarized in Figure 1) begins with dividing surveillance videos into a fixed number of segments during training. These segments make instances in a bag. Using both positive (anomalous) and negative (normal) bags, we train the anomaly detection model using the proposed deep MIL ranking loss.

    Method figure: https://www.crcv.ucf.edu/projects/real-world/method.png
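    A compact sketch of such a deep MIL ranking objective in PyTorch, with a hinge ranking between the top-scoring segments of the anomalous and normal bags plus temporal-smoothness and sparsity terms; the weighting constants here are assumptions, not values taken from this page.

    import torch

    def mil_ranking_loss(scores_anomalous, scores_normal,
                         lambda_smooth=8e-5, lambda_sparse=8e-5):
        # Hinge ranking between the highest-scoring segment of each bag.
        hinge = torch.relu(1.0 - scores_anomalous.max() + scores_normal.max())
        # Temporal smoothness and sparsity on the anomalous bag's segment scores.
        smoothness = ((scores_anomalous[1:] - scores_anomalous[:-1]) ** 2).sum()
        sparsity = scores_anomalous.sum()
        return hinge + lambda_smooth * smoothness + lambda_sparse * sparsity

    # Example: 32 segment scores per video, produced by a scoring network in [0, 1].
    anomalous_scores = torch.rand(32)
    normal_scores = torch.rand(32)
    print(mil_ranking_loss(anomalous_scores, normal_scores))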

    UCF-Crime Dataset

    We construct a new large-scale dataset, called UCF-Crime, to evaluate our method. It consists of long untrimmed surveillance videos which cover 13 real-world anomalies, including Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety. We compare our dataset with previous anomaly detection datasets in Table 1. For more details about the UCF-Crime dataset, please refer to our paper. A short description of each anomalous event is given below.

    • Abuse: videos which show bad, cruel or violent behavior against children, old people, animals, and women.
    • Burglary: videos that show people (thieves) entering a building or house with the intention to commit theft. It does not include use of force against people.
    • Robbery: videos showing thieves taking money unlawfully by force or threat of force. These videos do not include shootings.
    • Stealing: videos showing people taking property or money without permission. They do not include shoplifting.
    • Shooting: videos showing the act of shooting someone with a gun.
    • Shoplifting: videos showing people stealing goods from a shop while posing as shoppers.
    • Assault: videos showing a sudden or violent physical attack on someone. Note that in these videos the person who is assaulted does not fight back.
    • Fighting: videos displaying two or more people attacking one another.
    • Arson: videos showing people deliberately setting fire to property.
    • Explosion: videos showing the destructive event of something blowing apart. This event does not include videos where a person intentionally sets a fire or sets off an explosion.
    • Arrest: videos showing police arresting individuals.
    • Road Accident: videos showing traffic accidents involving vehicles, pedestrians or cyclists.
    • Vandalism: videos showing deliberate destruction of or damage to public or private property. The term includes property damage, such as graffiti and defacement directed towards any property without permission of the owner.
    • Normal Event: videos where no crime occurred. These videos include both indoor (such as a shopping mall) and outdoor scenes as well as day and night-time scenes.

    Dataset comparison (Table 1): https://www.crcv.ucf.edu/projects/real-world/dataset_table.png

  16. AI corporate investment worldwide 2015-2022

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). AI corporate investment worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/941137/ai-investment-and-funding-worldwide/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2022, the global total corporate investment in artificial intelligence (AI) reached almost ** billion U.S. dollars, a slight decrease from the previous year. In 2018, the yearly investment in AI saw a slight downturn, but that was only temporary. Private investments account for the bulk of total AI corporate investment. AI investment has increased more than ******* since 2016, a staggering rate of growth for any market. It is a testament to the importance of AI development around the world.

    What is Artificial Intelligence (AI)?

    Artificial intelligence, once the subject of people's imaginations and the main plot of science fiction movies for decades, is no longer fiction but commonplace in people's daily lives, whether they realize it or not. AI refers to the ability of a computer or machine to imitate the capacities of the human brain, which often learns from previous experiences to understand and respond to language, decisions, and problems. These AI capabilities, such as computer vision and conversational interfaces, have become embedded throughout various industries' standard business processes.

    AI investment and startups

    The global AI market, valued at ***** billion U.S. dollars as of 2023, continues to grow, driven by the influx of investments it receives. This is a rapidly growing market, looking to expand from billions to trillions of U.S. dollars in market size in the coming years. From 2020 to 2022, investment in startups globally, and in particular AI startups, increased by **** billion U.S. dollars, nearly double its previous investments, with much of it coming from private capital from U.S. companies. The most recent top-funded AI businesses are all machine learning and chatbot companies, focusing on human interfaces with machines.

  17. The Three Hair Types

    • kaggle.com
    Updated Sep 17, 2020
    Cite
    vyom bhatia (2020). The Three Hair Types [Dataset]. https://www.kaggle.com/vyombhatia/the-three-hair-types/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Kaggle
    Authors
    vyom bhatia
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Okay, so one random day I felt like making a web app with an image classifier and putting it up in my Instagram bio for people to play with. It classified hair, and it helped me learn a lot about how training CNNs for real-world applications works.

    Content

    Below are about a thousand images representing the three most common hair types in the world. Each hair type has 300+ images.
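    A minimal loading sketch, assuming the images are organised into one sub-folder per hair type (the root path and folder names below are assumptions, not documented in the dataset description):

    ```python
    # Minimal sketch: load the three hair-type classes with torchvision's ImageFolder.
    # The directory name "the-three-hair-types" and its sub-folder layout are assumptions.
    from torchvision import datasets, transforms

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),  # common CNN input size
        transforms.ToTensor(),
    ])

    hair_dataset = datasets.ImageFolder(root="the-three-hair-types", transform=preprocess)
    print(hair_dataset.classes, len(hair_dataset))  # class folder names and image count
    ```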

    Acknowledgements

    I scraped all these images from Google Images using a Chrome extension and sorted them out, image by image. I feel bad because I cannot give credit to the owners, and data ethics is something I have to improve in as a person.

    Inspiration

    Fellow data practitioner, the question I put in front of you today is: In what creative ways can you play with this beginner's boring data?

  18. Retinal Disease Classification

    • kaggle.com
    Updated Aug 16, 2021
    Cite
    Larxel (2021). Retinal Disease Classification [Dataset]. https://www.kaggle.com/andrewmvd/retinal-disease-classification/tasks?taskId=5762
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 16, 2021
    Dataset provided by
    Kaggle
    Authors
    Larxel
    Description

    About this dataset

    According to the WHO World report on vision 2019, the number of visually impaired people worldwide is estimated at 2.2 billion, of whom at least 1 billion have a vision impairment that could have been prevented or is yet to be addressed. The world faces considerable challenges in eye care, including inequalities in the coverage and quality of prevention, treatment, and rehabilitation services. Early detection and diagnosis of ocular pathologies would help forestall visual impairment.

    For this purpose, we have created a new Retinal Fundus Multi-disease Image Dataset (RFMiD) consisting of a total of 3,200 fundus images captured using three different fundus cameras, with 46 conditions annotated through the adjudicated consensus of two senior retinal experts.

    How to use this dataset

    • Create a multi-disease classification model
    • Create a model to classify between healthy and unhealthy retinas (a minimal starting point is sketched below)
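    A minimal sketch of the second task (healthy vs. unhealthy), assuming the fundus images have already been preprocessed into tensors; the RFMiD label files are not parsed here, and the backbone choice is an assumption:

    ```python
    # Minimal binary-classification sketch (healthy vs. unhealthy retina).
    # The backbone, input size, and dummy batch are illustrative assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=None)          # pretrained weights could be loaded instead
    model.fc = nn.Linear(model.fc.in_features, 1)  # single logit: 1 = unhealthy, 0 = healthy

    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    images = torch.randn(4, 3, 224, 224)                 # stand-ins for fundus images
    labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])  # stand-in disease-risk labels

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    ```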

    Acknowledgements

    If you use this dataset in your research, please credit the authors

    Citation

    Samiksha Pachade, Prasanna Porwal, Dhanshree Thulkar, Manesh Kokare, Girish Deshmukh, Vivek Sahasrabuddhe, Luca Giancardo, Gwenolé Quellec, and Fabrice Mériaudeau, 2021. Retinal Fundus Multi-Disease Image Dataset (RFMiD): A Dataset for Multi-Disease Detection Research. Data, 6(2), p.14. Available (Open Access): https://www.mdpi.com/2306-5729/6/2/14

    License

    A license was not specified, but a citation is requested whenever the data is used.

    Splash banner

    Icon by Eucalyp on FlatIcon.

  19. MyAuto.ge-CarDetails

    • kaggle.com
    Updated Apr 27, 2020
    Cite
    Ilia Gogotchuri (2020). MyAuto.ge-CarDetails [Dataset]. https://www.kaggle.com/datasets/gogotchuri/myautogecardetails
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ilia Gogotchuri
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    MyAutoData

    The actual dataset is located on Kaggle. It contains data scraped by MyAutoScrapper (written in Go).

    Purpose

    Since this Kaggle dataset contains real car deals, posted by real people and accompanied by pictures, it can be used for real-world machine learning (ML) or machine vision tasks: price prediction, image processing, and so on.

    Data structure

    This dataset contains a data.csv file with 100,000 car deal details, one row per deal. data.csv has 18 columns (a minimal loading sketch follows the list):
    - ID: a unique integer identifier for each entry, starting from 0; for each ID there is a corresponding sub-folder in images/ containing the images for that deal.
    - Manufacturer: a string identifying the car manufacturer.
    - Model: a string identifying the car model.
    - Year: an integer for the car production year.
    - Category: the type of the vehicle (Sedan, Cabriolet, etc.).
    - Mileage: an integer representing the car mileage in kilometers.
    - FuelType: the fuel type the car uses.
    - EngineVolume: a floating point number representing the engine volume in litres.
    - DriveWheels: a string representing the car's drive wheels (Front, Rear, 4x4, etc.).
    - GearBox: a string identifying the transmission (Manual, Automatic, etc.).
    - Doors: a string representing the car doors (4, 4/5, etc.).
    - Wheel: steering wheel position (Left Wheel, Right Wheel).
    - Color: color of the car body.
    - InteriorColor: interior color.
    - VIN: VIN number of the vehicle, represented as a string.
    - LeatherInterior: a boolean value, true if the car has a leather interior.
    - Price: price of the car in USD; if omitted, the price was set as negotiable.
    - Clearance: a boolean value identifying whether customs has been cleared or not.
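    A minimal loading sketch, assuming data.csv and the images/ folder sit in the working directory; the chosen columns and cleaning steps are illustrative assumptions given the disclaimer below:

    ```python
    # Minimal sketch: load data.csv and apply light type coercion.
    # File locations and the selected columns are assumptions based on the description above.
    import pandas as pd

    df = pd.read_csv("data.csv")

    # Not all fields are guaranteed to be filled, so coerce numerics and keep NaNs.
    for col in ["Year", "Mileage", "EngineVolume", "Price"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Example: median listed price per manufacturer (negotiable prices are missing).
    print(df.groupby("Manufacturer")["Price"].median().sort_values(ascending=False).head())

    # Images for a deal live in a sub-folder named after its ID, e.g. images/42/.
    first_id = int(df.loc[0, "ID"])
    image_dir = f"images/{first_id}"
    ```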

    Important Note! (Disclaimer)

    None of the fields (except ID) are guaranteed to be filled, or to be filled with correct information, since people sometimes enter incorrect information or withhold details for various reasons. For most entries, however, most fields should contain correct information.

  20. CarDA - Car door Assembly Activities Dataset

    • explore.openaire.eu
    Updated Aug 25, 2024
    Cite
    Konstantinos Papoutsakis; Nikolaos Bakalos; Athena Zacharia; Maria Pateraki (2024). CarDA - Car door Assembly Activities Dataset [Dataset]. http://doi.org/10.5281/zenodo.13370888
    Explore at:
    Dataset updated
    Aug 25, 2024
    Authors
    Konstantinos Papoutsakis; Nikolaos Bakalos; Athena Zacharia; Maria Pateraki
    Description

    The proposed multi-modal dataset for car door assembly activities, noted as CarDA [1], comprises a set of time-synchronized multi-camera RGB-D videos and motion capture data acquired during car door assembly activities performed by real line workers in a real manufacturing environment.

    [1] Konstantinos Papoutsakis, Nikolaos Bakalos, Konstantinos Fragkoulis, Athena Zacharia, Georgia Kapetadimitri, and Maria Pateraki. A vision-based framework for human behavior understanding in industrial assembly lines. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops - T-CAP 2024 Towards a Complete Analysis of People: Fine-grained Understanding for Real-World Applications, 2024.

    CarDA subset A: contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, .bvh files for 3D human pose data (ground truth), and annotation data (to be added in v2 of the dataset).

    CarDA subset B: contains visual data in the form of .svo files (RGB-D acquired using StereoLabs ZED 2 sensors), mp4 videos, and annotation data.

    ws10 - svo - mp4: three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS10 of the assembly line. MP4 videos extracted from the left camera of each stereo pair are also available. Annotation data for the task cycles are provided in an xls file covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.

    ws20 - svo - mp4: six pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS20 of the assembly line. MP4 videos extracted from the left camera of each stereo pair are also available. Annotation data for the task cycles are provided in an xls file covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.

    ws30 - svo - mp4: three pairs of RGB-D videos (.svo) acquired by two different StereoLabs ZED 2 stereo cameras placed in the real workplace. Each pair of RGB-D videos demonstrates a complete car door task cycle for workstation WS30 of the assembly line. MP4 videos extracted from the left camera of each stereo pair are also available. Annotation data for the task cycles are provided in an xls file covering the temporal segmentation and semantics of the assembly activities performed and the duration for which any of the supported EAWS-based postures occurred during an assembly activity.
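    A minimal inspection sketch, assuming one extracted mp4 per camera and one xls annotation file per workstation; the file names below are placeholders, not the dataset's actual naming scheme:

    ```python
    # Minimal sketch: pair an extracted mp4 with its workstation annotation file.
    # File names are placeholders; reading legacy .xls files may require the xlrd package.
    import cv2
    import pandas as pd

    annotations = pd.read_excel("ws10_annotations.xls")  # temporal segments + EAWS postures
    video = cv2.VideoCapture("ws10_left_camera.mp4")

    fps = video.get(cv2.CAP_PROP_FPS)
    n_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    print(f"{n_frames} frames at {fps:.1f} FPS, {len(annotations)} annotation rows")
    video.release()
    ```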
