100+ datasets found
  1. ORBIT: A real-world few-shot dataset for teachable object recognition...

    • city.figshare.com
    bin
    Updated May 31, 2023
    Cite
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
    Explore at:
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

    This version comprises several zip files:
    - train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
    - other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)
    - *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.
    - *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.
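
    A minimal sketch (not the official loader) for browsing the extracted frame archives. It assumes frames sit under a split folder such as train/<collector>/.../<frame>.jpg; the exact nesting is an assumption, so check the repository above for the supported data loaders.

    ```python
    # Count extracted jpg frames per top-level collector folder.
    # Assumption: the "train" zip has been extracted into ./train.
    from collections import Counter
    from pathlib import Path

    def frames_per_collector(split_dir: str) -> Counter:
        counts = Counter()
        for frame in Path(split_dir).rglob("*.jpg"):
            collector = frame.relative_to(split_dir).parts[0]
            counts[collector] += 1
        return counts

    print(frames_per_collector("train").most_common(5))
    ```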

  2. Person Counter Dataset

    • universe.roboflow.com
    zip
    Updated Jun 15, 2023
    Cite
    Tkbees (2023). Person Counter Dataset [Dataset]. https://universe.roboflow.com/tkbees-ogrtd/person-counter-tq0wf/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 15, 2023
    Dataset authored and provided by
    Tkbees
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Retail Analytics: Store owners can use the model to track the number of customers visiting their stores during different times of the day or seasons, which can help in workforce and resource allocation.

    2. Crowd Management: Event organizers or public authorities can utilize the model to monitor crowd sizes at concerts, festivals, public gatherings or protests, aiding in security and emergency planning.

    3. Smart Transportation: The model can be integrated into public transit systems to count the number of passengers in buses or trains, providing real-time occupancy information and assisting in transportation planning.

    4. Health and Safety Compliance: During times of pandemics or emergencies, the model can be used to count the number of people in a location, ensuring compliance with restrictions on gathering sizes.

    5. Building Security: The model can be adopted in security systems to track how many people enter and leave a building or a particular area, providing useful data for access control.
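
    All of the use cases above reduce to counting person detections per image. Below is a minimal sketch of that step, assuming a YOLO-style detector from the third-party ultralytics package and a hypothetical weights file trained on this dataset.

    ```python
    from ultralytics import YOLO

    model = YOLO("person-counter.pt")  # hypothetical weights trained on this dataset

    def count_people(image_path: str, conf: float = 0.5) -> int:
        """Return the number of person bounding boxes detected above `conf`."""
        result = model.predict(image_path, conf=conf, verbose=False)[0]
        return sum(1 for cls in result.boxes.cls.tolist()
                   if result.names[int(cls)] == "person")

    print(count_people("store_entrance.jpg"))
    ```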

  3. The Human Know-How Dataset

    • dtechtive.com
    • find.data.gov.scot
    pdf, zip
    Updated Apr 29, 2016
    Cite
    (2016). The Human Know-How Dataset [Dataset]. http://doi.org/10.7488/ds/1394
    Explore at:
    Available download formats: pdf(0.0582 MB), zip(19.67 MB), zip(0.0298 MB), zip(9.433 MB), zip(13.06 MB), zip(0.2837 MB), zip(5.372 MB), zip(69.8 MB), zip(20.43 MB), zip(5.769 MB), zip(14.86 MB), zip(19.78 MB), zip(43.28 MB), zip(62.92 MB), zip(92.88 MB), zip(90.08 MB)
    Dataset updated
    Apr 29, 2016
    Description

    The Human Know-How Dataset describes 211,696 human activities from many different domains. These activities are decomposed into 2,609,236 entities (each with an English textual label). These entities represent over two million actions and half a million pre-requisites. Actions are interconnected both according to their dependencies (temporal/logical orders between actions) and decompositions (decomposition of complex actions into simpler ones). This dataset has been integrated with DBpedia (259,568 links). For more information see:
    - The project website: http://homepages.inf.ed.ac.uk/s1054760/prohow/index.htm
    - The data is also available on datahub: https://datahub.io/dataset/human-activities-and-instructions

    * Quickstart: if you want to experiment with the most high-quality data before downloading all the datasets, download the file '9of11_knowhow_wikihow', and optionally files 'Process - Inputs', 'Process - Outputs', 'Process - Step Links' and 'wikiHow categories hierarchy'.
    * Data representation based on the PROHOW vocabulary: http://w3id.org/prohow# Data extracted from existing web resources is linked to the original resources using the Open Annotation specification.
    * Data Model: an example of how the data is represented within the datasets is available in the attached Data Model PDF file. The attached example represents a simple set of instructions, but instructions in the dataset can have more complex structures. For example, instructions could have multiple methods, steps could have further sub-steps, and complex requirements could be decomposed into sub-requirements.

    Statistics:
    * 211,696: number of instructions. From wikiHow: 167,232 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 44,464 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    * 2,609,236: number of RDF nodes within the instructions. From wikiHow: 1,871,468 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 737,768 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    * 255,101: number of process inputs linked to 8,453 distinct DBpedia concepts (dataset Process - Inputs)
    * 4,467: number of process outputs linked to 3,439 distinct DBpedia concepts (dataset Process - Outputs)
    * 376,795: number of step links between 114,166 different sets of instructions (dataset Process - Step Links)
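
    A minimal sketch for a first look at the RDF data with rdflib, assuming the downloaded archive contains a file that rdflib can parse (the file name and serialisation below are guesses, so check the archive contents). Tallying predicates is a quick way to see which PROHOW relations are present.

    ```python
    from collections import Counter
    from rdflib import Graph

    g = Graph()
    # Hypothetical file name; rdflib guesses the serialisation from the extension.
    g.parse("9of11_knowhow_wikihow.ttl")

    predicate_counts = Counter(str(p) for _, p, _ in g)
    for predicate, n in predicate_counts.most_common(10):
        print(n, predicate)
    ```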

  4. Human Detection Dataset

    • universe.roboflow.com
    zip
    Updated Aug 4, 2022
    Cite
    thehafizsampersonal (2022). Human Detection Dataset [Dataset]. https://universe.roboflow.com/thehafizsampersonal-pbe9x/human-detection-zi7fv/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 4, 2022
    Dataset authored and provided by
    thehafizsampersonal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Bounding Boxes
    Description

    Using customer counting technologies, you can optimize daily operations and staffing by determining how many employees are needed to serve customers and provide exceptional service. Enhanced customer service and increased sales opportunities are positively correlated. Customer counting systems help in assessing their potential to boost sales and profitability: it is not enough to gauge this by revenue alone. A far more useful and efficient approach is to compare foot traffic to the number of sales, as in the conversion-rate sketch below.
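
    Illustrative arithmetic only, relating counted visitors (for example from a person detector trained on this dataset) to recorded sales; the numbers are made up.

    ```python
    def conversion_rate(visitors: int, sales: int) -> float:
        """Share of counted visitors who made a purchase."""
        return sales / visitors if visitors else 0.0

    # 480 people counted entering the store, 72 transactions at the till.
    print(f"{conversion_rate(480, 72):.1%}")  # -> 15.0%
    ```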

  5. Image_person_dog_cat Dataset

    • universe.roboflow.com
    zip
    Updated Jun 28, 2024
    Cite
    many people (2024). Image_person_dog_cat Dataset [Dataset]. https://universe.roboflow.com/many-people/image_person_dog_cat
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 28, 2024
    Dataset authored and provided by
    many people
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person_dog Person_cat Person Bounding Boxes
    Description

    Image_person_dog_cat

    ## Overview
    
    Image_person_dog_cat is a dataset for object detection tasks - it contains Person_dog Person_cat Person annotations for 258 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
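
    A minimal download sketch, assuming the `roboflow` Python package and an API key; the workspace and project slugs are taken from the dataset URL above, and the version number and export format are guesses.

    ```python
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("many-people").project("image_person_dog_cat")
    dataset = project.version(1).download("yolov8")  # version/format are assumptions
    print(dataset.location)  # local folder with images and labels
    ```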
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Data from: Facial Expression Image Dataset for Computer Vision Algorithms

    • salford.figshare.com
    Updated Apr 29, 2025
    Cite
    Ali Alameer; Odunmolorun Osonuga (2025). Facial Expression Image Dataset for Computer Vision Algorithms [Dataset]. http://doi.org/10.17866/rd.salford.21220835.v2
    Explore at:
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    University of Salford
    Authors
    Ali Alameer; Odunmolorun Osonuga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset for this project consists of photos of individual human emotion expressions, taken with both a digital camera and a mobile phone camera from different angles, postures, backgrounds, light exposures, and distances. This task might look and sound very easy, but some challenges were encountered along the way, reviewed below.

    1) People constraint. One of the major challenges faced during this project was getting people to participate in the image capturing process, as school was on vacation and other individuals around the environment were not willing to let their images be captured for personal and security reasons, even after explaining that the project is mainly for academic research purposes. Due to this challenge, we resorted to capturing images of the researcher and just a few other willing individuals.

    2) Time constraint. As with all deep learning projects, the more data available, the more accurate and less error-prone the results will be. At the initial stage of the project, it was agreed to have 10 emotional expression photos each from at least 50 persons, with the option to increase the number of photos for more accurate results; but due to the time constraint of this project, it was later agreed to capture only the researcher and a few other people who were willing and available. These photos were taken for just two types of human emotion expression, "happy" and "sad" faces, also due to time constraints. To expand this work further (as future work), photos of other facial expressions such as anger, contempt, disgust, fright, and surprise can be included if time permits.

    3) The approved facial emotion captures. It was agreed to capture as many angles and postures of just two facial emotions as possible, with at least 10 emotional expression images per individual; but due to the time and people constraints, only a few persons were captured with as many postures as possible:
    - Happy faces: 65 images
    - Sad faces: 62 images
    There are many other types of facial emotions, and again, to expand the project in the future, the other types can be included if time permits and people are readily available.

    4) Expand further. This project can be improved in many ways; again, due to the time limit of this project, these improvements can be implemented later as future work. In simple words, this project is to detect/predict real-time human emotion, which involves creating a model that can report the percentage confidence that a facial image is happy or sad. The higher the percentage confidence, the more accurate the prediction for the facial image fed into the model.

    5) Other questions. Can the model be reproduced? The response to this question should be YES, if and only if the model is fed with the proper data (images), such as images of other types of emotional expression.

  7. Data from: How many faces do people know?

    • data.niaid.nih.gov
    • datadryad.org
    • +1more
    zip
    Updated Sep 19, 2018
    Cite
    Rob Jenkins; Andrew J. Dowsett; A. Mike Burton (2018). How many faces do people know? [Dataset]. http://doi.org/10.5061/dryad.7f25j43
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    University of York
    University of Aberdeen
    Authors
    Rob Jenkins; Andrew J. Dowsett; A. Mike Burton
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United Kingdom, USA
    Description

    Over our species history, humans have typically lived in small groups of under a hundred individuals. However, our face recognition abilities appear to equip us to recognize very many individuals, perhaps thousands. Modern society provides access to huge numbers of faces, but no one has established how many faces people actually know. Here we describe a method for estimating this number. By combining separate measures of recall and recognition, we show that people know about 5000 faces on average, and that individual differences are large. Our findings offer a possible explanation for large variation in identification performance. They also provide constraints on understanding the qualitative differences between perception of familiar and unfamiliar faces—a distinction that underlies all current theories of face recognition.

  8. Image Dataset of Accessibility Barriers

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Mar 25, 2022
    + more versions
    Cite
    Jakob Stolberg; Jakob Stolberg (2022). Image Dataset of Accessibility Barriers [Dataset]. http://doi.org/10.5281/zenodo.6382090
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Zenodo
    Authors
    Jakob Stolberg; Jakob Stolberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Data
    The dataset consists of 5,538 images of public spaces, annotated with steps, stairs, ramps and grab bars for stairs and ramps. The dataset has 3,564 annotations of steps, 1,492 of stairs, 143 of ramps and 922 of grab bars.

    Each step annotation is attributed with an estimate of the height of the step, as falling into one of three categories: less than 3cm, 3cm to 7cm or more than 7cm. Additionally it is attributed with a 'type', with the possibilities 'doorstep', 'curb' or 'other'.

    Stair annotations are attributed with the number of steps in the stair.

    Ramps are attributed with an estimate of their width, also falling into three categories: less than 50cm, 50cm to 100cm and more than 100cm.

    In order to preserve all additional attributes of the labels, the data is published in the CVAT XML format for images.

    Annotating Process
    The labelling has been done using bounding boxes around the objects. This format is compatible with many popular object detection models, e.g. the YOLO object detection model. A bounding box is placed so that it contains exactly the visible part of the respective object. This implies that only objects that are visible in the photo are annotated. In particular, a photo of a stair or step taken from above, where the object cannot be seen, has not been annotated, even when a human viewer could infer that there is a stair or a step from other features in the photo.

    Steps
    A step is annotated when there is a vertical increment that functions as a passage between two surface areas intended for human or vehicle traffic. This means that we have not included:

    • Increments that are too high to reasonably be considered a passage.
    • Increments that do not lead to a surface intended for human or vehicle traffic, e.g. a 'step' in front of a wall or a curb in front of a bush.

    In particular, the bounding box of a step object contains exactly the incremental part of the step, but does not extend into the top or bottom horizontal surface any more than necessary to enclose entirely the incremental part. This has been chosen for consistency reasons, as including parts of the horizontal surfaces would imply a non-trivial choice of how much to include, which we deemed would most likely lead to more inconsistent annotations.

    The height of the steps is estimated by the annotators and is therefore not guaranteed to be accurate.

    The type of the steps typically falls into the category 'doorstep' or 'curb'. Steps that are in a doorway, entrance or likewise are attributed as doorsteps. We also include in this category steps that are immediately leading to a doorway within a proximity of 1-2m. Steps between different types of pathways, e.g. between streets and sidewalks, are annotated as curbs. Any other type of step is annotated with 'other'. Many of the 'other' steps are, for example, steps to terraces.

    Stairs
    The stair label is used whenever two or more steps directly follow each other in a consistent pattern. All vertical increments are enclosed in the bounding box, as well as intermediate surfaces of the steps. However, the top and bottom surfaces are not included more than necessary, for the same reason as for steps, as described in the previous section.

    The annotator counts the number of steps and attributes this to the stair object label.

    Ramps
    Ramps have been annotated when a sloped passageway has been placed or built to connect two surface areas intended for human or vehicle traffic. This implies the same considerations as with steps. Likewise, only the sloped part of a ramp is annotated, not including the bottom or top surface area.

    For each ramp, the annotator makes an assessment of the width of the ramp in three categories: less than 50cm, 50cm to 100cm and more than 100cm. This parameter is visually hard to assess, and sometimes impossible due to the view of the ramp.

    Grab Bars
    Grab bars are annotated for hand rails and similar objects that are in direct connection to a stair or a ramp. While horizontal grab bars could also have been included, this was omitted due to the implied ambiguities with fences and similar objects. As the grab bar was originally intended as attributive information for stairs and ramps, we chose to keep this focus. The bounding box encloses the part of the grab bar that functions as a hand rail for the stair or ramp.

    Usage
    As is often the case when annotating data, much information depends on the subjective assessment of the annotator. As each data point in this dataset has been annotated only by one person, caution should be taken if the data is applied.

    Generally speaking, the mindset and usage guiding the annotations have been wheelchair accessibility. While we have strived to annotate at an object level, hopefully making the data more widely applicable than this, we state this explicitly as it may have swayed non-trivial annotation choices.

    The attribute data, such as step height or ramp width, are highly subjective estimations. We still provide this data to give a post-hoc method to adjust which annotations to use. E.g., for some purposes one may be interested in detecting only steps that are indeed more than 3cm; the attribute data makes it possible to filter out the steps less than 3cm, so a machine learning algorithm can be trained on a more appropriate dataset for that use case (see the sketch below). We stress, however, that one cannot expect to train accurate machine learning algorithms to infer the attribute data, as it is not accurate in the first place.
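
    A minimal filtering sketch, assuming the archive contains a CVAT-for-images XML file named annotations.xml, that the step label is "step", and that its height attribute is named "height" with values such as "more than 7cm" (check the actual file for the exact names).

    ```python
    import xml.etree.ElementTree as ET

    root = ET.parse("annotations.xml").getroot()

    tall_steps = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            if box.get("label") != "step":
                continue
            attrs = {a.get("name"): a.text for a in box.iter("attribute")}
            if attrs.get("height") == "more than 7cm":
                tall_steps.append((image.get("name"), box.get("xtl"), box.get("ytl")))

    print(f"{len(tall_steps)} step annotations estimated higher than 7cm")
    ```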

    We hope this dataset will be a useful building block in the endeavours for automating barrier detection and documentation.

  9. Traffic Crashes - People

    • catalog.data.gov
    • data.cityofchicago.org
    Updated Aug 2, 2025
    + more versions
    Cite
    data.cityofchicago.org (2025). Traffic Crashes - People [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-people
    Explore at:
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    data.cityofchicago.org
    Description

    This data contains information about people involved in a crash and whether any injuries were sustained. This dataset should be used in combination with the Traffic Crash and Vehicle datasets. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been an occupant in a motor vehicle, but may have been a pedestrian, bicyclist, or using another non-motor-vehicle mode of transportation. Injuries are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle datasets using the "CRASH_RECORD_ID" field, as in the sketch below. A vehicle can have multiple occupants, so there is a one-to-many relationship between the Vehicle and Person datasets. However, a pedestrian is a "unit" by itself and has a one-to-one relationship between the Vehicle and Person tables. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in the SR1050 form. The current version of the SR1050 instructions manual, with detailed information on each data element, is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
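
    A minimal linking sketch, assuming CSV exports of the People and Crashes datasets from the portal (the file names below are placeholders); it joins person records to their crash via CRASH_RECORD_ID.

    ```python
    import pandas as pd

    people = pd.read_csv("traffic_crashes_people.csv")
    crashes = pd.read_csv("traffic_crashes_crashes.csv")

    # One crash can have many person records, so this is a many-to-one join.
    people_with_crash = people.merge(crashes, on="CRASH_RECORD_ID", how="left",
                                     suffixes=("_person", "_crash"))
    print(people_with_crash.shape)
    ```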

  10. Data from: stereoset

    • huggingface.co
    • opendatalab.com
    Updated Dec 8, 2021
    Cite
    McGill NLP Group (2021). stereoset [Dataset]. https://huggingface.co/datasets/McGill-NLP/stereoset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 8, 2021
    Dataset authored and provided by
    McGill NLP Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for StereoSet

      Dataset Summary
    

    StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measure model preferences across gender, race, religion, and profession.

      Supported Tasks and Leaderboards
    

    multiple-choice question answering

      Languages
    

    English (en)

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    intersentence

    {'bias_type': 'race', 'context': 'Many people live in… See the full description on the dataset page: https://huggingface.co/datasets/McGill-NLP/stereoset.
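
    A minimal loading sketch with the Hugging Face `datasets` library; the configuration name "intersentence" matches the instance above, and the "validation" split is an assumption, so check the dataset page for the available configs and splits.

    ```python
    from datasets import load_dataset

    stereoset = load_dataset("McGill-NLP/stereoset", "intersentence", split="validation")
    example = stereoset[0]
    print(example["bias_type"], example["context"])
    ```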

  11. Open Famous People Faces

    • kaggle.com
    Updated May 23, 2024
    Cite
    Yves Romero (2024). Open Famous People Faces [Dataset]. http://doi.org/10.34740/kaggle/dsv/8500944
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yves Romero
    License

    GNU LGPL 3.0: http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    This dataset was created to compare methods for face reidentification, that is, given an image and a name of a person, check if that image belongs to that person. But it also can be used to test face recognition algorithms, since the dataset has been categorized.

    The authors have made a great effort to collect as many images as they could for all classes in the dataset. Faces were aligned using eye position alignment and then cropped using landmarks to find the region of interest.

    The Open Famous People Faces dataset contains 258 classes with at least 5 images per class. Images have different sizes: some are low-quality, small images, others are high-quality, large images. We have images of the same person at different ages.
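
    An illustrative sketch of the re-identification check described above, using the third-party `face_recognition` package (not part of this dataset); the folder layout dataset/<person_name>/*.jpg and the 0.6 distance threshold are assumptions.

    ```python
    from pathlib import Path
    import face_recognition
    import numpy as np

    def belongs_to(query_image: str, person_name: str, root: str = "dataset") -> bool:
        """Return True if the query face matches any reference image of the person."""
        query = face_recognition.face_encodings(
            face_recognition.load_image_file(query_image))[0]
        for ref in Path(root, person_name).glob("*.jpg"):
            known = face_recognition.face_encodings(
                face_recognition.load_image_file(str(ref)))
            if known and np.linalg.norm(known[0] - query) < 0.6:
                return True
        return False

    print(belongs_to("unknown.jpg", "some_famous_person"))
    ```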

  12. Mobility data| Isochrones | Global coverage | How many people or places can...

    • datarade.ai
    .csv, .json
    Updated Apr 16, 2025
    Cite
    PTV Group (2025). Mobility data| Isochrones | Global coverage | How many people or places can be reached in a given time. [Dataset]. https://datarade.ai/data-products/isochrones-connectivity-how-many-people-or-places-can-be-ptv-group
    Explore at:
    Available download formats: .csv, .json
    Dataset updated
    Apr 16, 2025
    Dataset authored and provided by
    PTV Group (https://www.ptvgroup.com/)
    Area covered
    Kiribati, Ascension and Tristan da Cunha, Italy, French Polynesia, New Zealand, Taiwan, Iraq, Aruba, Wallis and Futuna, Mongolia
    Description

    PTV Isochrones provides for each location, for various modes (car, foot, bicycle, public transport):
    – The calculated catchment area for the selected mode for a given max. travel time
    – Information (population and work places) and travel time to/from each building in the catchment area
    – Alternatively, information (population and work places) and travel time to/from each hectare grid in the catchment area
    – Optionally, the travel time between the location and all POIs within the catchment area
    – Optionally, travel time distributions of the number of POIs or POI attributes (e.g., population, workplaces, number of restaurants)

    The catchment area can be calculated in two different ways: – The area from which the location can be reached within the specified time using the specified mode(s), or – The area that can be reached from the location within the specified time using the specified mode(s).

    The customer must provide the following specifications:

    – x, y coordinates (location)
    – A travel time (determines the size of the catchment area)
    – Means of transport for which the catchment area should be provided
    – Desired attributes (population, jobs, population per purchasing power, and others)
    – Desired POIs
    – Desired intervals for the travel time distribution

    The data provided by PTV is then:

    – Polygon for the selected means of transport(s)
    – Requested data per polygon on building or grid level

    Public transport travel time is calculated based on the current timetable. Travel times for all other modes are calculated on the latest OSM map.
    It is also possible to retrieve multiple locations and deliver data for the overlapping regions.
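
    A hypothetical request specification only: the dict below simply mirrors the customer inputs listed above, and the field names are illustrative rather than PTV's actual schema.

    ```python
    isochrone_request = {
        "location": {"x": 8.4660, "y": 49.4875},    # x, y coordinates
        "max_travel_time_min": 30,                  # determines the catchment area size
        "modes": ["foot", "public_transport"],      # means of transport
        "attributes": ["population", "workplaces"],
        "pois": ["restaurant", "pharmacy"],
        "travel_time_intervals_min": [10, 20, 30],  # for the travel time distribution
        "direction": "to_location",                 # or "from_location"
    }
    print(isochrone_request)
    ```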

  13. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +2more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    + more versions
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of administrative divisions (ca. 80,000 entries).

    Sources and Contributions
    Sources: GeoNames is aggregating over a hundred different data sources.
    Ambassadors: GeoNames Ambassadors help in many countries.
    Wiki: A wiki allows users to view the data and quickly fix errors and add missing places.
    Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
    Enrichment: add country name.
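
    A minimal filtering sketch, assuming the dataset has been exported as CSV from the Opendatasoft page above; the file name and column names are assumptions (Opendatasoft exports typically use ";" as the separator).

    ```python
    import pandas as pd

    cities = pd.read_csv("geonames-all-cities-with-a-population-1000.csv", sep=";")
    big_cities = cities[cities["population"] > 1_000_000]
    print(big_cities[["name", "population"]]
          .sort_values("population", ascending=False).head(10))
    ```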

  14. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing
    Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  15. diffusiondb

    • huggingface.co
    Updated Mar 16, 2023
    Cite
    Polo Club of Data Science (2023). diffusiondb [Dataset]. https://huggingface.co/datasets/poloclub/diffusiondb
    Explore at:
    Dataset updated
    Mar 16, 2023
    Dataset authored and provided by
    Polo Club of Data Science
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models.
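
    A minimal loading sketch with the Hugging Face `datasets` library; DiffusionDB is distributed in subsets, and the small "2m_random_1k" sample used here is assumed to be one of them (check the dataset page for the available configurations and fields).

    ```python
    from datasets import load_dataset

    diffusiondb = load_dataset("poloclub/diffusiondb", "2m_random_1k", split="train")
    sample = diffusiondb[0]
    print(sample["prompt"], sample["cfg"], sample["seed"])
    ```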

  16. Bicycle & Pedestrian Counts

    • catalog.data.gov
    • data.somervillema.gov
    • +1more
    Updated Feb 7, 2025
    Cite
    data.somervillema.gov (2025). Bicycle & Pedestrian Counts [Dataset]. https://catalog.data.gov/dataset/bicycle-pedestrian-counts
    Explore at:
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    data.somervillema.gov
    Description

    The annual bike and pedestrian count is a volunteer data collection effort each fall that helps the City understand where and how many people are biking and walking in Somerville, and how those numbers are changing over time. This program has been taking place each year since 2010. Counts are collected Tuesday, Wednesday, or Thursday for one hour in the morning and evening using a “screen line” method, whereby cyclists and pedestrians are counted as they pass by an imaginary line across the street and sidewalks. Morning count sessions begin between 7:15 and 7:45 am, and evening count sessions begin between 4:45 and 5:15 pm.

    Bike counts capture the number of people riding bicycles, so an adult and child riding on the same bike would be counted as two counts even though it is only one bike. Pedestrian counts capture people walking or jogging, people using a wheelchair or assistive device, children in strollers, and people using other micro-mobility devices, such as skateboards, scooters, or roller skates.

    While the City and its amazing volunteers do their best to collect accurate and complete data each year and the City does quality control to catch clear errors, it is not possible to ensure 100% accuracy of the data and not all locations have been counted every year of the program. There are also several external factors impacting counts that are not consistent year-to-year, such as nearby construction and weather. For these reasons, the counts are intended to be used to observe high-level trends across the city and at count locations, and not to extrapolate that biking and walking in Somerville has changed by a specific percentage or number. Data in this dataset are available at the location count level. To request data at the movement level, please contact transportation@somervillema.gov.

  17. Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +2more
    csv
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    Available download formats: csv
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
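
    A minimal sketch that reads the state-level file from the repository above with pandas and turns the cumulative case counts into daily new cases; the raw-file path is an assumption, so check the repository for the exact file names.

    ```python
    import pandas as pd

    url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv"
    states = pd.read_csv(url, parse_dates=["date"])

    # Cumulative counts -> daily new cases for one state.
    wa = states[states["state"] == "Washington"].sort_values("date").copy()
    wa["new_cases"] = wa["cases"].diff().fillna(wa["cases"])
    print(wa[["date", "cases", "new_cases"]].tail())
    ```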

  18. SH17 Dataset for PPE Detection

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 4, 2024
    Cite
    Ahmad, Hafiz Mughees (2024). SH17 Dataset for PPE Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12659324
    Explore at:
    Dataset updated
    Jul 4, 2024
    Dataset authored and provided by
    Ahmad, Hafiz Mughees
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    We propose the Safe Human dataset, consisting of 17 different object classes, referred to as the SH17 dataset. We scraped images from the Pexels website, which offers clear usage rights for all its images, showcasing a range of human activities across diverse industrial operations.

    To extract relevant images, we used multiple queries such as manufacturing worker, industrial worker, human worker, labor, etc. The tags associated with Pexels images proved reasonably accurate. After removing duplicate samples, we obtained a dataset of 8,099 images. The dataset exhibits significant diversity, representing manufacturing environments globally, thus minimizing potential regional or racial biases. Sample images can be viewed on the GitHub page linked at the end of this description.

    Key features

    Collected from diverse industrial environments globally

    High quality images (max resolution 8192x5462, min 1920x1002)

    Average of 9.38 instances per image

    Includes small objects like ears and earmuffs (39,764 annotations < 1% image area, 59,025 annotations < 5% area)

    Classes

    Person

    Head

    Face

    Glasses

    Face-mask-medical

    Face-guard

    Ear

    Earmuffs

    Hands

    Gloves

    Foot

    Shoes

    Safety-vest

    Tools

    Helmet

    Medical-suit

    Safety-suit

    The data consists of three folders and two file lists:

    images contains all images

    labels contains labels in YOLO format for all images

    voc_labels contains labels in VOC format for all images

    train_files.txt contains list of all images we used for training

    val_files.txt contains list of all images we used for validation
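
    A minimal sketch for reading the YOLO-format labels described above: each line in a label file is "<class_id> <x_center> <y_center> <width> <height>" with normalised coordinates. The class order below follows the list of classes above and assumes ids were assigned in that order.

    ```python
    from collections import Counter
    from pathlib import Path

    # Assumption: class ids follow the order of the class list above.
    CLASSES = ["person", "head", "face", "glasses", "face-mask-medical", "face-guard",
               "ear", "earmuffs", "hands", "gloves", "foot", "shoes", "safety-vest",
               "tools", "helmet", "medical-suit", "safety-suit"]

    counts = Counter()
    for label_file in Path("labels").glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if not line.strip():
                continue
            counts[CLASSES[int(line.split()[0])]] += 1

    for name, n in counts.most_common():
        print(f"{name}: {n}")
    ```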

    Disclaimer and Responsible Use:

    This dataset, scraped from the Pexels website, is intended for educational, research, and analysis purposes only. The data may be used for training machine learning models only. Users are urged to use this data responsibly, ethically, and within the bounds of legal stipulations.

    Users should adhere to Copyright Notice of Pexels when utilizing this dataset.

    Legal Simplicity: All photos and videos on Pexels can be downloaded and used for free.

    Allowed 👌

    All photos and videos on Pexels are free to use.

    Attribution is not required. Giving credit to the photographer or Pexels is not necessary but always appreciated.

    You can modify the photos and videos from Pexels. Be creative and edit them as you like.

    Not allowed 👎

    Identifiable people may not appear in a bad light or in a way that is offensive.

    Don't sell unaltered copies of a photo or video, e.g. as a poster, print or on a physical product without modifying it first.

    Don't imply endorsement of your product by people or brands on the imagery.

    Don't redistribute or sell the photos and videos on other stock photo or wallpaper platforms.

    Don't use the photos or videos as part of your trade-mark, design-mark, trade-name, business name or service mark.

    No Warranty Disclaimer:

    The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for its use by others.

    Ethical Use:

    Users are encouraged to consider the ethical implications of their analyses and the potential impact on the broader community.

    GitHub Page:

    https://github.com/ahmadmughees/SH17dataset

  19. Data for: The Bystander Affect Detection (BAD) Dataset for Failure Detection...

    • data.qdr.syr.edu
    pdf, tsv, txt, zip
    Updated Sep 25, 2023
    Cite
    Alexandra Bremers; Alexandra Bremers; Xuanyu Fang; Xuanyu Fang; Natalie Friedman; Natalie Friedman; Wendy Ju; Wendy Ju (2023). Data for: The Bystander Affect Detection (BAD) Dataset for Failure Detection in HRI [Dataset]. http://doi.org/10.5064/F6TAWBGS
    Explore at:
    Available download formats: zip(66872585), zip(67359564), zip(49981372), zip(45063165), zip(35942055), tsv(5431), zip(63732190), zip(32108293), zip(33064251), zip(49848937), zip(38858151), zip(137880775), zip(90804192), zip(36477139), zip(38068214), zip(36039067), zip(37592931), zip(34234760), zip(63445623), zip(38092264), zip(45582594), zip(50915158), zip(111033502), zip(32955394), zip(30549219), zip(39991378), zip(166237686), zip(50351519), zip(62744513), zip(46810648), zip(34379478), zip(35492684), zip(22036189), pdf(197935), zip(66187509), zip(40085473), zip(40798037), pdf(113804), zip(12931695), zip(31593404), zip(26677367), zip(35547615), tsv(244631), zip(35954889), txt(7329), zip(74593629), zip(52574377), zip(55483165), zip(31323914), zip(43519637), zip(42743107), zip(55790691), zip(50499507), zip(76761027), zip(38063092), zip(55654900), zip(30504764), zip(48203736), zip(40422817)
    Dataset updated
    Sep 25, 2023
    Dataset provided by
    Qualitative Data Repository
    Authors
    Alexandra Bremers; Alexandra Bremers; Xuanyu Fang; Xuanyu Fang; Natalie Friedman; Natalie Friedman; Wendy Ju; Wendy Ju
    License

    https://qdr.syr.edu/policies/qdr-restricted-access-conditions

    Description

    Project Overview
    For a robot to repair its own error, it must first know it has made a mistake. One way that people detect errors is from the implicit reactions from bystanders – their confusion, smirks, or giggles clue us in that something unexpected occurred. To enable robots to detect and act on bystander responses to task failures, we developed a novel method to elicit bystander responses to human and robot errors.

    Data Overview
    This project introduces the Bystander Affect Detection (BAD) dataset – a dataset of videos of bystander reactions to videos of failures. This dataset includes 2,452 human reactions to failure, collected in contexts that approximate “in-the-wild” data collection – including natural variances in webcam quality, lighting, and background. The BAD dataset may be requested for use in related research projects. As the dataset contains facial video data of participants, access can be requested along with the presentation of a research protocol and data use agreement that protects participants.

    Data Collection Overview and Access Conditions
    Using 46 different stimulus videos featuring a variety of human and machine task failures, we collected a total of 2,452 webcam videos of human reactions from 54 participants. Recruitment happened through the online behavioral research platform Prolific (https://www.prolific.co/about), where the options were selected to recruit a gender-balanced sample across all countries available. Participants had to use a laptop or desktop. Compensation was set at the Prolific rate of $12/hr, which came down to about $8 per participant for about 40 minutes of participation. Participants agreed that their data can be shared for future research projects and the data were approved to be shared publicly by IRB review. However, considering the fact that this is a machine-learning dataset containing identifiable crowdsourced human subjects data, the research team has decided that potential secondary users of the data must meet the following criteria for the access request to be granted:
    1. Agreement to three usage terms:
    - I will not redistribute the contents of the BAD Dataset
    - I will not use videos for purposes outside of human interaction research (broadly defined as any project that aims to study or develop improvements to human interactions with technology to result in a better user experience)
    - I will not use the videos to identify, defame, or otherwise negatively impact the health, welfare, employment or reputation of human participants
    2. A description of what you want to use the BAD dataset for, indicating any applicable human subjects protection measures that are in place. (For instance, "Me and my fellow researchers at University of X, lab of Y, will use the BAD dataset to train a model to detect when our Nao robot interrupts people at awkward times. The PI is Professor Z. Our protocol was approved under IRB #.")
    3. A copy of the IRB record or ethics approval document, confirming the research protocol and institutional approval.

    Data Analysis
    To test the viability of the collected data, we used the Bystander Reaction Dataset as input to a deep-learning model, BADNet, to predict failure occurrence. We tested different data labeling methods and learned how they affect model performance, achieving precisions above 90%.

    Shared Data Organization
    This data project consists of 54 zipped folders of recorded video data organized by participant, totaling 2,452 videos. The accompanying documentation includes a file containing the text of the consent form used for the research project, an inventory of the stimulus videos used, aggregate survey data, this data narrative, and an administrative readme file.

    Special Notes
    The data were approved to be shared publicly by IRB review. However, considering the fact that this is a machine-learning dataset containing identifiable crowdsourced human subjects data, the research team has decided that potential secondary users of the data must meet specific criteria before they qualify for access. Please consult the Terms tab below for more details and follow the instructions there if interested in requesting access.

  20. Tonaalt Sonar Human Dataset

    • universe.roboflow.com
    zip
    Updated Oct 5, 2023
    Cite
    Nuremberg Institute of Technology (2023). Tonaalt Sonar Human Dataset [Dataset]. https://universe.roboflow.com/nuremberg-institute-of-technology/tonaalt-sonar-human-dataset/dataset/4
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 5, 2023
    Dataset authored and provided by
    Nuremberg Institute of Technology
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Humans Bounding Boxes
    Description

    The data was collected and shared by Toni Aaltonen. The original data can be viewed on GitHub: https://github.com/tonaalt/sonar_human_dataset and is shared under the MIT License.

    MIT License

    Copyright (c) 2023 Toni Aaltonen

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
