68 datasets found

Reading habit Dataset
kaggle.com
Updated Sep 3, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Overfitted (2020). Reading habit Dataset [Dataset]. https://www.kaggle.com/vipulgote4/reading-habit-dataset/tasks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 3, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Overfitted
Description
Context

this dataset contain how much amount of peoples read books or audiobooks and therir age,income ,Education etc.

Acknowledgements

dataset collected from this site: https://www.pewresearch.org/internet/
P
The People’s Speech Dataset
paperswithcode.com
opendatalab.com
Updated Dec 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Galvez; Greg Diamos; Juan Ciro; Juan Felipe Cerón; Keith Achorn; Anjali Gopi; David Kanter; Maximilian Lam; Mark Mazumder; Vijay Janapa Reddi (2023). The People’s Speech Dataset [Dataset]. https://paperswithcode.com/dataset/the-peoples-speech
Explore at:
Dataset updated
Dec 4, 2023
Authors
Daniel Galvez; Greg Diamos; Juan Ciro; Juan Felipe Cerón; Keith Achorn; Anjali Gopi; David Kanter; Maximilian Lam; Mark Mazumder; Vijay Janapa Reddi
Description
The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions.
f
ORBIT: A real-world few-shot dataset for teachable object recognition...
city.figshare.com
bin
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.25383/city.14294597.v3
Dataset updated
May 31, 2023
Dataset provided by
City, University of London
Authors
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-DatasetThis version comprises several zip files:- train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS- other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)- *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.
R
Detect People Talking Phone Dataset
universe.roboflow.com
zip
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Detect people talking on the phone (2024). Detect People Talking Phone Dataset [Dataset]. https://universe.roboflow.com/detect-people-talking-on-the-phone/detect-people-talking-phone
Explore at:
zipAvailable download formats
Dataset updated
Jul 19, 2024
Dataset authored and provided by
Detect people talking on the phone
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Detect People Talking On The Pho Bounding Boxes
Description
Detect People Talking Phone

## Overview Detect People Talking Phone is a dataset for object detection tasks - it contains Detect People Talking On The Pho annotations for 1,043 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
10,109 People - Face Images Dataset
nexdata.ai
Updated Jun 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nexdata (2024). 10,109 People - Face Images Dataset [Dataset]. https://www.nexdata.ai/datasets/1402?source=Github
Explore at:
Dataset updated
Jun 14, 2024
Dataset authored and provided by
Nexdata
Variables measured
Data size, Data format, Data diversity, Age distribution, Race distribution, Gender distribution, Collecting environment
Description
10,109 people - face images dataset includes people collected from many countries. Multiple photos of each person’s daily life are collected, and the gender, race, age, etc. of the person being collected are marked.This Dataset provides a rich resource for artificial intelligence applications. It has been validated by multiple AI companies and proves beneficial for achieving outstanding performance in real-world applications. Throughout the process of Dataset collection, storage, and usage, we have consistently adhered to Dataset protection and privacy regulations to ensure the preservation of user privacy and legal rights. All Dataset comply with regulations such as GDPR, CCPA, PIPL, and other applicable laws.
f
Data from: Facial Expression Image Dataset for Computer Vision Algorithms
salford.figshare.com
Updated Apr 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ali Alameer; Odunmolorun Osonuga (2025). Facial Expression Image Dataset for Computer Vision Algorithms [Dataset]. http://doi.org/10.17866/rd.salford.21220835.v2
Explore at:
Unique identifier
https://doi.org/10.17866/rd.salford.21220835.v2
Dataset updated
Apr 29, 2025
Dataset provided by
University of Salford
Authors
Ali Alameer; Odunmolorun Osonuga
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset for this project is characterised by photos of individual human emotion expression and these photos are taken with the help of both digital camera and a mobile phone camera from different angles, posture, background, light exposure, and distances. This task might look and sound very easy but there were some challenges encountered along the process which are reviewed below: 1) People constraint One of the major challenges faced during this project is getting people to participate in the image capturing process as school was on vacation, and other individuals gotten around the environment were not willing to let their images be captured for personal and security reasons even after explaining the notion behind the project which is mainly for academic research purposes. Due to this challenge, we resorted to capturing the images of the researcher and just a few other willing individuals. 2) Time constraint As with all deep learning projects, the more data available the more accuracy and less error the result will produce. At the initial stage of the project, it was agreed to have 10 emotional expression photos each of at least 50 persons and we can increase the number of photos for more accurate results but due to the constraint in time of this project an agreement was later made to just capture the researcher and a few other people that are willing and available. These photos were taken for just two types of human emotion expression that is, “happy” and “sad” faces due to time constraint too. To expand our work further on this project (as future works and recommendations), photos of other facial expression such as anger, contempt, disgust, fright, and surprise can be included if time permits. 3) The approved facial emotions capture. It was agreed to capture as many angles and posture of just two facial emotions for this project with at least 10 images emotional expression per individual, but due to time and people constraints few persons were captured with as many postures as possible for this project which is stated below: Ø Happy faces: 65 images Ø Sad faces: 62 images There are many other types of facial emotions and again to expand our project in the future, we can include all the other types of the facial emotions if time permits, and people are readily available. 4) Expand Further. This project can be improved furthermore with so many abilities, again due to the limitation of time given to this project, these improvements can be implemented later as future works. In simple words, this project is to detect/predict real-time human emotion which involves creating a model that can detect the percentage confidence of any happy or sad facial image. The higher the percentage confidence the more accurate the facial fed into the model. 5) Other Questions Can the model be reproducible? the supposed response to this question should be YES. If and only if the model will be fed with the proper data (images) such as images of other types of emotional expression.
Gender Detection & Classification - Face Dataset
kaggle.com
Updated Oct 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Training Data (2023). Gender Detection & Classification - Face Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/gender-detection-and-classification-image-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 31, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Gender Detection & Classification - face recognition dataset

The dataset is created on the basis of Face Mask Detection dataset

Dataset Description:

The dataset comprises a collection of photos of people, organized into folders labeled "women" and "men." Each folder contains a significant number of images to facilitate training and testing of gender detection algorithms or models.

The dataset contains a variety of images capturing female and male individuals from diverse backgrounds, age groups, and ethnicities.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F12421376%2F1c4708f0b856f7889e3c0eea434fe8e2%2FFrame%2045%20(1).png?generation=1698764294000412&alt=media" alt="">

This labeled dataset can be utilized as training data for machine learning models, computer vision applications, and gender detection algorithms.

💴 For Commercial Usage: Full version of the dataset includes 376 000+ photos of people, leave a request on TrainingData to buy the dataset

Metadata for the full dataset:

assignment_id - unique identifier of the media file

worker_id - unique identifier of the person

age - age of the person

true_gender - gender of the person

country - country of the person

ethnicity - ethnicity of the person

photo_1_extension, photo_2_extension, photo_3_extension, photo_4_extension - photo extensions in the dataset

photo_1_resolution, photo_2_resolution, photo_3_extension, photo_4_resolution - photo resolution in the dataset

OTHER BIOMETRIC DATASETS:

Anti Spoofing Real Dataset

Antispoofing Replay Dataset

Selfies, ID Images dataset (5591 sets of 15 files)

Selfies and video dataset (4 052 sets)

Dataset of bald people, 5000 images

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to learn about the price and buy the dataset

Content

The dataset is split into train and test folders, each folder includes: - folders women and men - folders with images of people with the corresponding gender, - .csv file - contains information about the images and people in the dataset

File with the extension .csv

file: link to access the file,

gender: gender of a person in the photo (woman/man),

split: classification on train and test

TrainingData provides high-quality data annotation tailored to your needs

keywords: biometric system, biometric system attacks, biometric dataset, face recognition database, face recognition dataset, face detection dataset, facial analysis, gender detection, supervised learning dataset, gender classification dataset, gender recognition dataset
P
PeopleArt Dataset
paperswithcode.com
opendatalab.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicholas Westlake; Hongping Cai; Peter Hall, PeopleArt Dataset [Dataset]. https://paperswithcode.com/dataset/peopleart
Explore at:
Authors
Nicholas Westlake; Hongping Cai; Peter Hall
Description
People-Art is an object detection dataset which consists of people in 43 different styles. People contained in this dataset are quite different from those in common photographs. There are 42 categories of art styles and movements including Naturalism, Cubism, Socialist Realism, Impressionism, and Suprematism
Healthy People 2020 Final Progress by Population Group Chart and Table
catalog.data.gov
data.virginia.gov
+2more
Updated Apr 23, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Disease Control and Prevention (2025). Healthy People 2020 Final Progress by Population Group Chart and Table [Dataset]. https://catalog.data.gov/dataset/healthy-people-2020-final-progress-by-population-group-chart-and-table-617d0
Explore at:
Dataset updated
Apr 23, 2025
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Description
[1] The Progress by Population Group analysis is a component of the Healthy People 2020 (HP2020) Final Review. The analysis included subsets of the 1,111 measurable HP2020 objectives that have data available for any of six broad population characteristics: sex, race and ethnicity, educational attainment, family income, disability status, and geographic location. Progress toward meeting HP2020 targets is presented for up to 24 population groups within these characteristics, based on objective data aggregated across HP2020 topic areas. The Progress by Population Group data are also available at the individual objective level in the downloadable data set. [2] The final value was generally based on data available on the HP2020 website as of January 2020. For objectives that are continuing into HP2030, more recent data will be included on the HP2030 website as it becomes available: https://health.gov/healthypeople. [3] For more information on the HP2020 methodology for measuring progress toward target attainment and the elimination of health disparities, see: Healthy People Statistical Notes, no 27; available from: https://www.cdc.gov/nchs/data/statnt/statnt27.pdf. [4] Status for objectives included in the HP2020 Progress by Population Group analysis was determined using the baseline, final, and target value. The progress status categories used in HP2020 were: a. Target met or exceeded—One of the following applies: (i) At baseline, the target was not met or exceeded, and the most recent value was equal to or exceeded the target (the percentage of targeted change achieved was equal to or greater than 100%); (ii) The baseline and most recent values were equal to or exceeded the target (the percentage of targeted change achieved was not assessed). b. Improved—One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved 10% or more of the targeted change. c. Little or no detectable change—One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was not statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved less than 10% of the targeted change; (iii) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was not statistically significant; (iv) Movement was away from the baseline and target, standard errors were not available, and the objective had moved less than 10% relative to the baseline; (v) No change was observed between the baseline and the final data point. d. Got worse—One of the following applies: (i) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was statistically significant; (ii) Movement was away from the baseline and target, standard errors were not available, and the objective had moved 10% or more relative to the baseline. NOTE: Measurable objectives had baseline data. SOURCE: National Center for Health Statistics, Healthy People 2020 Progress by Population Group database.
i
Online Learning Global Queries Dataset: A Comprehensive Dataset of What...
ieee-dataport.org
Updated May 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Isabella Hall (2022). Online Learning Global Queries Dataset: A Comprehensive Dataset of What People from Different Countries ask Google about Online Learning [Dataset]. https://ieee-dataport.org/documents/online-learning-global-queries-dataset-comprehensive-dataset-what-people-different
Explore at:
Dataset updated
May 11, 2022
Authors
Isabella Hall
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Any work using this dataset should cite the following paper:
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Feb 12, 2019
Dataset provided by
Data.govhttps://data.gov/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
h
Black_People_Face_Recognition
huggingface.co
Updated May 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AxonLabs (2024). Black_People_Face_Recognition [Dataset]. https://huggingface.co/datasets/AxonData/Black_People_Face_Recognition
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 30, 2024
Authors
AxonLabs
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Black people Face Detection Dataset: 3M+ Identities

Large human faces dataset for face recognition models (10M+ images)

Share with us your feedback and recieve additional samples for free!😊 Full version of dataset is availible for commercial usage - leave a request on our website Axon Labs to purchase the dataset 💰

Dataset targeting 1:N and 1:1 NIST face recognition tests. Dataset contains 3M individuals, each with 3-5 images containing their faces The… See the full description on the dataset page: https://huggingface.co/datasets/AxonData/Black_People_Face_Recognition.
d
Traffic Crashes - People
catalog.data.gov
data.cityofchicago.org
Updated Jul 12, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.cityofchicago.org (2025). Traffic Crashes - People [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-people
Explore at:
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofchicago.org
Description
This data contains information about people involved in a crash and if any injuries were sustained. This dataset should be used in combination with the traffic Crash and Vehicle dataset. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been an occupant in a motor vehicle, but may have been a pedestrian, bicyclist, or using another non-motor vehicle mode of transportation. Injuries reported are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle dataset using the “CRASH_RECORD_ID” field. A vehicle can have multiple occupants and hence have a one to many relationship between Vehicle and Person dataset. However, a pedestrian is a “unit” by itself and have a one to one relationship between the Vehicle and Person table. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in SR1050 form. The current version of the SR1050 instructions manual with detailed information on each data elements is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
P
Data from: ImageNet Dataset
paperswithcode.com
Updated Apr 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2021). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
Explore at:
Dataset updated
Apr 15, 2024
Authors
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
Description
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

Total number of non-empty WordNet synsets: 21841 Total number of images: 14197122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million
O
COVID-19 case rate per 100,000 population and percent test positivity in the...
data.ct.gov
catalog.data.gov
application/rdfxml +5
Updated Jun 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Public Health (2022). COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2
Explore at:
application/rssxml, xml, csv, json, tsv, application/rdfxmlAvailable download formats
Dataset updated
Jun 23, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.

These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen datasets. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplicfication (NAAT) tests.

The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
The ORBIT (Object Recognition for Blind Image Training)-India Dataset
zenodo.org
data.niaid.nih.gov
Updated Apr 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gesu India; Gesu India; Martin Grayson; Martin Grayson; Daniela Massiceti; Daniela Massiceti; Cecily Morrison; Cecily Morrison; Simon Robinson; Simon Robinson; Jennifer Pearson; Jennifer Pearson; Matt Jones; Matt Jones (2025). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. http://doi.org/10.5281/zenodo.12608444
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.12608444
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gesu India; Gesu India; Martin Grayson; Martin Grayson; Daniela Massiceti; Daniela Massiceti; Cecily Morrison; Cecily Morrison; Simon Robinson; Simon Robinson; Jennifer Pearson; Jennifer Pearson; Matt Jones; Matt Jones
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The ORBIT (Object Recognition for Blind Image Training) -India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, a home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside a ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True, if there is a personally identifiable information (PII) present in the image. Note, all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.

This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

REFERENCES:

Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
R
Person Counter Dataset
universe.roboflow.com
zip
Updated Jun 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tkbees (2023). Person Counter Dataset [Dataset]. https://universe.roboflow.com/tkbees-ogrtd/person-counter-tq0wf
Explore at:
zipAvailable download formats
Dataset updated
Jun 15, 2023
Dataset authored and provided by
Tkbees
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Person Bounding Boxes
Description
Here are a few use cases for this project:

Retail Analytics: Store owners can use the model to track the number of customers visiting their stores during different times of the day or seasons, which can help in workforce and resource allocation.

Crowd Management: Event organizers or public authorities can utilize the model to monitor crowd sizes at concerts, festivals, public gatherings or protests, aiding in security and emergency planning.

Smart Transportation: The model can be integrated into public transit systems to count the number of passengers in buses or trains, providing real-time occupancy information and assisting in transportation planning.

Health and Safety Compliance: During times of pandemics or emergencies, the model can be used to count the number of people in a location, ensuring compliance with restrictions on gathering sizes.

Building Security: The model can be adopted in security systems to track how many people enter and leave a building or a particular area, providing useful data for access control.
P
People Snapshot Dataset Dataset
paperswithcode.com
Updated Dec 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thiemo Alldieck; Marcus Magnor; Weipeng Xu; Christian Theobalt; Gerard Pons-Moll (2023). People Snapshot Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/people-snapshot-dataset
Explore at:
Dataset updated
Dec 3, 2023
Authors
Thiemo Alldieck; Marcus Magnor; Weipeng Xu; Christian Theobalt; Gerard Pons-Moll
Description
Enables detailed human body model reconstruction in clothing from a single monocular RGB video without requiring a pre scanned template or manually clicked points.
COVID-19 Case Surveillance Public Use Data
data.cdc.gov
paperswithcode.com
+5more
application/rdfxml +5
Updated Jul 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
Explore at:
application/rdfxml, tsv, csv, json, xml, application/rssxmlAvailable download formats
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (33 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

COVID-19 Case Reports

COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.

For questions, please contact Ask SRRG (eocevent394@cdc.gov).

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
Z
Empathy dataset
data.niaid.nih.gov
zenodo.org
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mathematical Research Data Initiative (2024). Empathy dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7683906
Explore at:
Dataset updated
Dec 18, 2024
Dataset authored and provided by
Mathematical Research Data Initiative
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The database for this study (Briganti et al. 2018; the same for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy, (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years), 57% were females and 43% were males. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject are used.

The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.

Size: A dataset of size 1973*28

Number of features: 28

Ground truth: No

Type of Graph: Mixed graph

The following gives the description of the variables:

Feature FeatureLabel Domain Item meaning from Davis 1980

001 1FS Green I daydream and fantasize, with some regularity, about things that might happen to me.

002 2EC Purple I often have tender, concerned feelings for people less fortunate than me.

003 3PT_R Yellow I sometimes find it difficult to see things from the “other guy’s” point of view.

004 4EC_R Purple Sometimes I don’t feel very sorry for other people when they are having problems.

005 5FS Green I really get involved with the feelings of the characters in a novel.

006 6PD Red In emergency situations, I feel apprehensive and ill-at-ease.

007 7FS_R Green I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it.(Reversed)

008 8PT Yellow I try to look at everybody’s side of a disagreement before I make a decision.

009 9EC Purple When I see someone being taken advantage of, I feel kind of protective towards them.

010 10PD Red I sometimes feel helpless when I am in the middle of a very emotional situation.

011 11PT Yellow sometimes try to understand my friends better by imagining how things look from their perspective

012 12FS_R Green Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)

013 13PD_R Red When I see someone get hurt, I tend to remain calm. (Reversed)

014 14EC_R Purple Other people’s misfortunes do not usually disturb me a great deal. (Reversed)

015 15PT_R Yellow If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)

016 16FS Green After seeing a play or movie, I have felt as though I were one of the characters.

017 17PD Red Being in a tense emotional situation scares me.

018 18EC_R Purple When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)

019 19PD_R Red I am usually pretty effective in dealing with emergencies. (Reversed)

020 20FS Green I am often quite touched by things that I see happen.

021 21PT Yellow I believe that there are two sides to every question and try to look at them both.

022 22EC Purple I would describe myself as a pretty soft-hearted person.

023 23FS Green When I watch a good movie, I can very easily put myself in the place of a leading character.

024 24PD Red I tend to lose control during emergencies.

025 25PT Yellow When I’m upset at someone, I usually try to “put myself in his shoes” for a while.

026 26FS Green When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.

027 27PD Red When I see someone who badly needs help in an emergency, I go to pieces.

028 28PT Yellow Before criticizing somebody, I try to imagine how I would feel if I were in their place

More information about the dataset is contained in empathy_description.html file.

Facebook

Twitter

Click to copy link

Link copied

Cite

Overfitted (2020). Reading habit Dataset [Dataset]. https://www.kaggle.com/vipulgote4/reading-habit-dataset/tasks

Reading habit Dataset

dataset about how much peoples read

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Sep 3, 2020

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Overfitted

Description

Context

this dataset contain how much amount of peoples read books or audiobooks and therir age,income ,Education etc.

Acknowledgements

dataset collected from this site: https://www.pewresearch.org/internet/

Clear search

Close search

Google apps

Main menu

Reading habit Dataset

Context

Acknowledgements

The People’s Speech Dataset

ORBIT: A real-world few-shot dataset for teachable object recognition...

Detect People Talking Phone Dataset

Detect People Talking Phone

10,109 People - Face Images Dataset

Data from: Facial Expression Image Dataset for Computer Vision Algorithms

Gender Detection & Classification - Face Dataset

Gender Detection & Classification - face recognition dataset

The dataset is created on the basis of Face Mask Detection dataset

💴 For Commercial Usage: Full version of the dataset includes 376 000+ photos of people, leave a request on TrainingData to buy the dataset

Metadata for the full dataset:

OTHER BIOMETRIC DATASETS:

💴 Buy the Dataset: This is just an example of the data. Leave a request on https://trainingdata.pro/datasets to learn about the price and buy the dataset

Content

File with the extension .csv

TrainingData provides high-quality data annotation tailored to your needs

PeopleArt Dataset

Healthy People 2020 Final Progress by Population Group Chart and Table

Online Learning Global Queries Dataset: A Comprehensive Dataset of What...

USA Name Data

Context

Content

Acknowledgements

Inspiration

Black_People_Face_Recognition

Traffic Crashes - People

Data from: ImageNet Dataset

COVID-19 case rate per 100,000 population and percent test positivity in the...

The ORBIT (Object Recognition for Blind Image Training)-India Dataset

Person Counter Dataset

People Snapshot Dataset Dataset

COVID-19 Case Surveillance Public Use Data

CDC has three COVID-19 case surveillance datasets:

Overview

COVID-19 Case Reports

Data are Considered Provisional

Data Limitations

Data Quality Assurance Procedures

Data Suppression

Additional COVID-19 Data

Empathy dataset

Reading habit Dataset

dataset about how much peoples read

Context

Acknowledgements