100+ datasets found
  1. The Human Know-How Dataset

    • find.data.gov.scot
    • dtechtive.com
    pdf, zip
    Updated Apr 29, 2016
    Cite
    (2016). The Human Know-How Dataset [Dataset]. http://doi.org/10.7488/ds/1394
    Explore at:
    zip(19.78 MB), zip(0.2837 MB), zip(19.67 MB), zip(69.8 MB), zip(9.433 MB), zip(62.92 MB), zip(20.43 MB), zip(43.28 MB), zip(92.88 MB), zip(13.06 MB), zip(14.86 MB), zip(5.372 MB), zip(0.0298 MB), pdf(0.0582 MB), zip(5.769 MB), zip(90.08 MB)
    Available download formats
    Dataset updated
    Apr 29, 2016
    Description

    The Human Know-How Dataset describes 211,696 human activities from many different domains. These activities are decomposed into 2,609,236 entities (each with an English textual label). These entities represent over two million actions and half a million pre-requisites. Actions are interconnected both according to their dependencies (temporal/logical orders between actions) and decompositions (decomposition of complex actions into simpler ones). This dataset has been integrated with DBpedia (259,568 links). For more information see:

    • The project website: http://homepages.inf.ed.ac.uk/s1054760/prohow/index.htm
    • The data is also available on datahub: https://datahub.io/dataset/human-activities-and-instructions

    • Quickstart: if you want to experiment with the most high-quality data before downloading all the datasets, download the file '9of11_knowhow_wikihow', and optionally the files 'Process - Inputs', 'Process - Outputs', 'Process - Step Links' and 'wikiHow categories hierarchy'.
    • Data representation is based on the PROHOW vocabulary: http://w3id.org/prohow#. Data extracted from existing web resources is linked to the original resources using the Open Annotation specification.
    • Data Model: an example of how the data is represented within the datasets is available in the attached Data Model PDF file. The attached example represents a simple set of instructions, but instructions in the dataset can have more complex structures. For example, instructions could have multiple methods, steps could have further sub-steps, and complex requirements could be decomposed into sub-requirements.

    Statistics:

    • 211,696: number of instructions. From wikiHow: 167,232 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 44,464 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    • 2,609,236: number of RDF nodes within the instructions. From wikiHow: 1,871,468 (datasets 1of11_knowhow_wikihow to 9of11_knowhow_wikihow). From Snapguide: 737,768 (datasets 10of11_knowhow_snapguide to 11of11_knowhow_snapguide).
    • 255,101: number of process inputs linked to 8,453 distinct DBpedia concepts (dataset Process - Inputs)
    • 4,467: number of process outputs linked to 3,439 distinct DBpedia concepts (dataset Process - Outputs)
    • 376,795: number of step links between 114,166 different sets of instructions (dataset Process - Step Links)
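    A minimal sketch of loading one of the RDF dumps with rdflib, assuming the zip extracts to an N-Triples file (the file name and serialization below are assumptions; check the actual download):

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

# PROHOW vocabulary namespace, per the description above.
PROHOW = Namespace("http://w3id.org/prohow#")

g = Graph()
# Hypothetical file name/serialization: whatever the zip extracts to.
g.parse("9of11_knowhow_wikihow.nt", format="nt")

# Count labelled entities and peek at a few of them.
labelled = list(g.triples((None, RDFS.label, None)))
print(f"{len(labelled)} labelled entities")
for s, _, o in labelled[:5]:
    print(s, "->", o)
```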

  2. Reading habit Dataset

    • kaggle.com
    Updated Sep 3, 2020
    Cite
    Overfitted (2020). Reading habit Dataset [Dataset]. https://www.kaggle.com/vipulgote4/reading-habit-dataset/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    Sep 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Overfitted
    Description

    Context

    This dataset contains information on how many books or audiobooks people read, along with their age, income, education, etc.

    Acknowledgements

    dataset collected from this site: https://www.pewresearch.org/internet/

  3. US Broadband Usage Across Counties

    • kaggle.com
    Updated Jan 6, 2023
    Cite
    The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    Jan 6, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Broadband Usage Across Counties

    Utilizing Microsoft's Data to Estimate Access

    By Amber Thomas [source]

    About this dataset

    This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.

    According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access is vastly important: it determines who gets included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions around the country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.

    This dataset aggregates information for locations with fewer than 20 devices for increased accuracy when estimating broadband usage in the United States, allowing others to use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists among communities large and small throughout the US mainland. Please review the license terms before using these data, so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0 in both professional and educational endeavors.


    How to use the dataset

    How to Use the US Broadband Usage Dataset

    This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

    The dataset contains six columns:

    • County – The name of the county for which usage statistics are provided.
    • Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected, within that county or metropolitan/micropolitan area or division within a state, as reported by the US Census Bureau in 2018[2].
    • Population (Households) – Estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey 5-Year Estimates[4].
    • Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected anonymously from devices connected through Microsoft services such as Windows Update, Office 365, and Xbox Live Core Services[5].
    • Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
    • Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
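    As a starting point, a minimal pandas sketch using the column names listed above (verify them against the actual header of broadband_data_2020October.csv before running):

```python
import pandas as pd

df = pd.read_csv("broadband_data_2020October.csv")

# Counties whose average measured download speed falls below the
# FCC's 25 Mbps broadband benchmark.
slow = df[df["Average Throughput (Mbps)"] < 25]
print(slow[["County", "Average Throughput (Mbps)"]].head())
```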

    Research Ideas

    • Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
    • Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
    • Establishing public-private partnerships between local governments and telecom providers, which need better data about gaps in service coverage and usage levels in order to make decisions about investing in new infrastructure buildouts for better rural connectivity options.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    See the dataset description for more information.

    Columns

    File: broadband_data_2020October.csv


  4. LinkedIn Dataset - US People Profiles

    • kaggle.com
    Updated May 16, 2023
    Cite
    Joseph from Proxycurl (2023). LinkedIn Dataset - US People Profiles [Dataset]. https://www.kaggle.com/datasets/proxycurl/10000-us-people-profiles/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    May 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Joseph from Proxycurl
    Description

    Full profiles of 10,000 people in the US - download here, data schema here - with more than 40 data points including Full Name, Education, Location, Work Experience History, and many more!

    There are an additional 258+ million US people profiles available; visit the LinkDB product page here.

    Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and company profiles. It contains close to 500 million people and company profiles globally.

  5. male-selfie-image-dataset

    • huggingface.co
    Updated May 2, 2024
    Cite
    Training Data (2024). male-selfie-image-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    May 2, 2024
    Authors
    Training Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Face Recognition, Face Detection, Male Photo Dataset 👨

      If you are interested in biometric data - visit our website to learn more and buy the dataset :)
    

    110,000+ photos of 74,000+ men from 141 countries. The dataset includes photos of people's faces. All people presented in the dataset are men. The dataset contains a variety of images capturing individuals from diverse backgrounds and age groups. Our dataset will diversify your data by adding more photos of men of… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/male-selfie-image-dataset.

  6. Image Dataset of Accessibility Barriers

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Mar 25, 2022
    + more versions
    Cite
    Jakob Stolberg; Jakob Stolberg (2022). Image Dataset of Accessibility Barriers [Dataset]. http://doi.org/10.5281/zenodo.6382090
    Explore at:
    zip
    Available download formats
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Zenodo
    Authors
    Jakob Stolberg; Jakob Stolberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Data
    The dataset consists of 5538 images of public spaces, annotated with steps, stairs, ramps, and grab bars for stairs and ramps. It contains 3564 annotations of steps, 1492 of stairs, 143 of ramps, and 922 of grab bars.

    Each step annotation is attributed with an estimate of the height of the step, as falling into one of three categories: less than 3cm, 3cm to 7cm or more than 7cm. Additionally it is attributed with a 'type', with the possibilities 'doorstep', 'curb' or 'other'.

    Stair annotations are attributed with the number of steps in the stair.

    Ramps are attributed with an estimate of their width, also falling into three categories: less than 50cm, 50cm to 100cm and more than 100cm.

    In order to preserve all additional attributes of the labels, the data is published in the CVAT XML format for images.
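    A minimal sketch for reading such an export with Python's standard library, assuming the usual CVAT-for-images layout of <image> elements containing <box> elements (verify element and attribute names against the actual file):

```python
import xml.etree.ElementTree as ET

tree = ET.parse("annotations.xml")  # hypothetical file name
for image in tree.getroot().iter("image"):
    for box in image.iter("box"):
        # Per-box attributes such as the step height category.
        attrs = {a.get("name"): a.text for a in box.iter("attribute")}
        print(image.get("name"), box.get("label"),
              box.get("xtl"), box.get("ytl"),
              box.get("xbr"), box.get("ybr"), attrs)
```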

    Annotating Process
    The labelling has been done using bounding boxes around the objects. This format is compatible with many popular object detection models, e.g. the YOLO object model. A bounding box is placed so that it contains exactly the visible part of the respective object. This implies that only objects that are visible in the photo are annotated. In particular, a photo of a stair or step from above, where the object cannot be seen, has not been annotated, even when a human viewer could infer that there is a stair or a step from other features in the photo.

    Steps
    A step is annotated when there is a vertical increment that functions as a passage between two surface areas intended for human or vehicle traffic. This means that we have not included:

    • Increments that are too high to reasonably be considered a passage.
    • Increments that do not lead to a surface intended for human or vehicle traffic, e.g. a 'step' in front of a wall or a curb in front of a bush.

    In particular, the bounding box of a step object contains exactly the incremental part of the step, and does not extend into the top or bottom horizontal surface any more than necessary to entirely enclose the incremental part. This has been chosen for consistency reasons, as including parts of the horizontal surfaces would imply a non-trivial choice of how much to include, which we deemed would most likely lead to more inconsistent annotations.

    The heights of the steps are estimated by the annotators and are therefore not guaranteed to be accurate.

    The type of a step typically falls into the category 'doorstep' or 'curb'. Steps that are in a doorway, entrance or the like are attributed as doorsteps. We also include in this category steps that immediately lead to a doorway within a proximity of 1-2m. Steps between different types of pathways, e.g. between streets and sidewalks, are annotated as curbs. Any other type of step is annotated with 'other'. Many of the 'other' steps are, for example, steps to terraces.

    Stairs
    The stair label is used whenever two or more steps directly follow each other in a consistent pattern. All vertical increments are enclosed in the bounding box, as well as the intermediate surfaces of the steps. However, the top and bottom surfaces are not included more than necessary, for the same reason as for steps, as described in the previous section.

    The annotator counts the number of steps and attributes this count to the stair object label.

    Ramps
    Ramps have been annotated when a sloped passageway has been placed or built to connect two surface areas intended for human or vehicle traffic. This implies the same considerations as with steps. Likewise, only the sloped part of a ramp is annotated, not the bottom or top surface area.

    For each ramp, the annotator makes an assessment of the width of the ramp in three categories: less than 50cm, 50cm to 100cm and more than 100cm. This parameter is visually hard to assess, and sometimes impossible due to the view of the ramp.

    Grab Bars
    Grab bars are annotated for hand rails and similar fixtures that are in direct connection to a stair or a ramp. While horizontal grab bars could also have been included, this was omitted due to the implied ambiguities with fences and similar objects. As the grab bar was originally intended as attribute information for stairs and ramps, we chose to keep this focus. The bounding box encloses the part of the grab bar that functions as a hand rail for the stair or ramp.

    Usage
    As is often the case when annotating data, much information depends on the subjective assessment of the annotator. As each data point in this dataset has been annotated by only one person, caution should be taken when the data is applied.

    Generally speaking, the mindset and usage guiding the annotations has been wheelchair accessibility. While we have strived to annotate at an object level, hopefully making the data more widely applicable than this, we state this explicitly as it may have swayed non-trivial annotation choices.

    The attribute data, such as step height or ramp width, are highly subjective estimates. We still provide these data to give a post-hoc method for choosing which annotations to use. E.g. for some purposes, one may be interested in detecting only steps that are indeed more than 3cm high. The attribute data make it possible to sort away the steps of less than 3cm, so a machine learning algorithm can be trained on a more appropriate dataset for that use case (see the sketch below). We stress, however, that one cannot expect to train accurate machine learning algorithms to infer the attribute data, as these are not accurate data in the first place.
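    Continuing the parsing sketch above, such a filter might look as follows; the attribute name and category strings are assumptions based on this description, not verified against the file:

```python
# Keep a box unless it is a step annotated as lower than 3cm.
TALL_ENOUGH = {"3cm to 7cm", "more than 7cm"}  # assumed category labels

def keep(box) -> bool:
    if box.get("label") != "step":
        return True
    attrs = {a.get("name"): a.text for a in box.iter("attribute")}
    return attrs.get("height") in TALL_ENOUGH
```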

    We hope this dataset will be a useful building block in the endeavours for automating barrier detection and documentation.

  7. Multi-Camera Action Dataset (MCAD)

    • zenodo.org
    application/gzip +2
    Updated Jan 24, 2020
    Cite
    Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli; Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli (2020). Multi-Camera Action Dataset (MCAD) [Dataset]. http://doi.org/10.5281/zenodo.884592
    Explore at:
    application/gzip, json, txt
    Available download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli; Wenhui Li; Yongkang Wong; An-An Liu; Yang Li; Yu-Ting Su; Mohan Kankanhalli
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Action recognition has received increasing attention from the computer vision and machine learning communities in recent decades. The recognition task has evolved from single-view recordings in controlled laboratory environments to unconstrained environments (i.e., surveillance environments or user-generated videos). Furthermore, recent work has focused on other aspects of the action recognition problem, such as cross-view classification, cross-domain learning, multi-modality learning, and action localization. Despite the large variety of studies, we observed limited work exploring the open-set and open-view classification problems, which are genuine inherent properties of the action recognition problem. In other words, a well-designed algorithm should robustly identify an unfamiliar action as "unknown" and achieve similar performance across sensors with similar fields of view. The Multi-Camera Action Dataset (MCAD) is designed to evaluate the open-view classification problem under a surveillance environment.

    In our multi-camera action dataset, unlike common action datasets, we use a total of five cameras, which can be divided into two types (Static and PTZ), to record actions. In particular, there are three Static cameras (Cam04, Cam05 and Cam06) with a fish-eye effect and two Pan-Tilt-Zoom (PTZ) cameras (PTZ04 and PTZ06). The Static cameras have a resolution of 1280×960 pixels, while the PTZ cameras have a resolution of 704×576 pixels and a smaller field of view than the Static cameras. Moreover, we do not control the illumination environment. We even set two contrasting conditions (daytime and nighttime), which makes our dataset more challenging than many controlled datasets with strongly controlled illumination. The distribution of the cameras is shown in the picture on the right.

    We identified 18 single-person daily actions, with and without objects, which are inherited from the KTH, IXMAS, and TRECVID datasets, among others. The list and definitions of the actions are shown in the table. These actions can be divided into four types: micro actions without an object (action IDs 01, 02, 05) and with an object (action IDs 10, 11, 12, 13); intense actions without an object (action IDs 03, 04, 06, 07, 08, 09) and with an object (action IDs 14, 15, 16, 17, 18). We recruited a total of 20 human subjects. Each candidate repeated each action 8 times (4 times during the day and 4 times in the evening) under one camera. In the recording process, we used the five cameras to record each action sample separately. During the recording stage we only told candidates the action name; they could then perform the action freely in their own way, as long as they performed it within the field of view of the current camera. This makes our dataset much closer to reality. As a result, there is high intra-class variation among different action samples, as shown in the picture of action samples.

    URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html

    Resources:

    • IDXXXX.mp4.tar.gz contains video data for each individual
    • boundingbox.tar.gz contains person bounding box for all videos
    • protocol.json contains the evaluation protocol
    • img_list.txt contains the download URLs for the images version of the video data
    • idt_list.txt contains the download URLs for the improved Dense Trajectory features
    • stip_list.txt contains the download URLs for the STIP features
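    A minimal sketch of fetching the listed resources, assuming protocol.json is plain JSON and the *_list.txt files contain one URL per line (the structure of the protocol file is not documented here; inspect it first):

```python
import json
import urllib.request

with open("protocol.json") as f:
    protocol = json.load(f)
print(list(protocol))  # inspect the top-level structure first

with open("img_list.txt") as f:
    for url in map(str.strip, f):
        if url:
            urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
```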

    How to Cite:

    Please cite the following paper if you use the MCAD dataset in your work (papers, articles, reports, books, software, etc):

    • Wenhui Li, Yongkang Wong, An-An Liu, Yang Li, Yu-Ting Su, Mohan Kankanhalli
      Multi-Camera Action Dataset for Cross-Camera Action Recognition Benchmarking
      IEEE Winter Conference on Applications of Computer Vision (WACV), 2017.
      http://doi.org/10.1109/WACV.2017.28
  8. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision

    • city.figshare.com
    bin
    Updated May 31, 2023
    Cite
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
    Explore at:
    bin
    Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

    This version comprises several zip files:

    • train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
    • other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files together make up the unfiltered dataset)
    • *_224: as for the benchmark, but with static individual frames scaled down to 224 pixels
    • *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format
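    A minimal sketch for walking the extracted benchmark frames, assuming the archives unpack to train/<collector>/.../<frame>.jpg as the description suggests (the exact nesting may differ):

```python
from collections import Counter
from pathlib import Path

root = Path("train")
# Count frames per collector; the first path component under the
# split directory is assumed to be the collector ID.
frames_per_collector = Counter(
    frame.relative_to(root).parts[0] for frame in root.rglob("*.jpg")
)
for collector, n in frames_per_collector.most_common(5):
    print(collector, n)
```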

  9. Effect of suicide rates on life expectancy dataset

    • zenodo.org
    csv
    Updated Apr 16, 2021
    Cite
    Filip Zoubek; Filip Zoubek (2021). Effect of suicide rates on life expectancy dataset [Dataset]. http://doi.org/10.5281/zenodo.4694270
    Explore at:
    csv
    Available download formats
    Dataset updated
    Apr 16, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Filip Zoubek; Filip Zoubek
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Effect of suicide rates on life expectancy dataset

    Abstract
    In 2015, approximately 55 million people died worldwide, of which 8 million committed suicide. In the USA, suicide is one of the main causes of death; this experiment therefore deals with the question of how much suicide rates affect average life expectancy statistics.
    The experiment takes two datasets, one with the number of suicides and one with life expectancy, and combines them into a single dataset. Subsequently, I try to find patterns and correlations among the variables and perform a statistical test using simple regression to confirm my assumptions.

    Data

    The experiment uses two datasets - WHO Suicide Statistics[1] and WHO Life Expectancy[2] - which were first appropriately preprocessed. The final merged dataset has 13 variables, with country and year used as the index:

    • Country, Year
    • Suicides number
    • Life expectancy
    • Adult Mortality – probability of dying between 15 and 60 years, per 1000 population
    • Infant deaths – number of infant deaths per 1000 population
    • Alcohol – recorded per-capita (15+) alcohol consumption
    • Under-five deaths – number of under-five deaths per 1000 population
    • HIV/AIDS – deaths per 1000 live births from HIV/AIDS
    • GDP – Gross Domestic Product per capita
    • Population
    • Income composition of resources – Human Development Index in terms of income composition of resources
    • Schooling – number of years of schooling
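    A minimal sketch of the merge-and-regress workflow described above; the file and column names are assumptions based on the two Kaggle sources cited below, so rename them to match the actual data:

```python
import pandas as pd
import statsmodels.formula.api as smf

suicides = pd.read_csv("who_suicide_statistics.csv")  # country, year, suicides_no, ...
life = pd.read_csv("life_expectancy_who.csv")         # country, year, life_expectancy, ...

# Merge on the shared index variables, then fit a simple regression.
df = suicides.merge(life, on=["country", "year"])
model = smf.ols("life_expectancy ~ suicides_no", data=df).fit()
print(model.summary())
```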

    LICENSE

    The experiment uses two datasets - WHO Suicide Statistics and WHO Life Expectancy - which were collected from the WHO and United Nations websites. Therefore, all datasets are under the Attribution-NonCommercial-ShareAlike 3.0 IGO license (https://creativecommons.org/licenses/by-nc-sa/3.0/igo/).

    [1] https://www.kaggle.com/szamil/who-suicide-statistics

    [2] https://www.kaggle.com/kumarajarshi/life-expectancy-who

  10. Downsampled Open Images V4 Dataset

    • academictorrents.com
    bittorrent
    Updated Dec 19, 2018
    Cite
    None (2018). Downsampled Open Images V4 Dataset [Dataset]. https://academictorrents.com/details/9208d33aceb2ca3eb2beb70a192600c9c41efba1
    Explore at:
    bittorrent(85220313799)
    Available download formats
    Dataset updated
    Dec 19, 2018
    Authors
    None
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    This is the downsampled version of the Open Images V4 Dataset. The Open Images V4 dataset contains 15.4M bounding boxes for 600 categories on 1.9M images and 30.1M human-verified image-level labels for 19,794 categories. The dataset is available at this link. The total size of the full dataset is 18TB. There is also a smaller version which contains rescaled images with at most 1024 pixels on the longest side. However, the total size of the rescaled dataset is still large (513GB for training, 12GB for validation and 36GB for testing). I provide a much smaller version of the Open Images Dataset V4, inspired by the Downsampled ImageNet datasets by @PatrykChrabaszcz. These downsampled datasets are much smaller in size, so everyone can download them with ease (59GB for training with the 512px version and 16GB for training with the 256px version). Experiments on these downsampled datasets are also much faster than on the original.
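    The torrent already ships rescaled images, but for illustration, a downsampling pass of this kind might look as follows (a sketch; directory names are placeholders):

```python
from pathlib import Path
from PIL import Image

src_dir, dst_dir = Path("images"), Path("images_256px")
dst_dir.mkdir(exist_ok=True)

for src in src_dir.glob("*.jpg"):
    im = Image.open(src)
    # Rescale so the longest side is at most 256 pixels.
    scale = 256 / max(im.size)
    if scale < 1:
        im = im.resize((round(im.width * scale), round(im.height * scale)),
                       Image.LANCZOS)
    im.save(dst_dir / src.name)
```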

  11. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +2more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    + more versions
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    csv, json, geojson, excel
    Available download formats
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000, or seats of administrative divisions (ca. 80,000).

    Sources and Contributions
    Sources: GeoNames aggregates over a hundred different data sources.
    Ambassadors: GeoNames ambassadors help in many countries.
    Wiki: A wiki allows viewing the data and quickly fixing errors and adding missing places.
    Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name.
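    A minimal sketch for loading the CSV export with pandas; Opendatasoft exports are typically semicolon-separated, and the file and column names below are assumptions (inspect df.columns first):

```python
import pandas as pd

df = pd.read_csv("geonames-all-cities-with-a-population-1000.csv", sep=";")

# Ten most populous cities in the extract.
print(df.sort_values("population", ascending=False)
        [["name", "country_code", "population"]].head(10))
```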

  12. Empathy dataset

    • zenodo.org
    bin, csv, html
    Updated Dec 18, 2024
    Cite
    Zenodo (2024). Empathy dataset [Dataset]. http://doi.org/10.5281/zenodo.7683907
    Explore at:
    bin, html, csv
    Available download formats
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The database for this study (Briganti et al. 2018; the same as for the Braun study analysis) was composed of 1973 French-speaking students in several universities or schools for higher education in the following fields: engineering (31%), medicine (18%), nursing school (16%), economic sciences (15%), physiotherapy (4%), psychology (11%), law school (4%) and dietetics (1%). The subjects were 17 to 25 years old (M = 19.6 years, SD = 1.6 years); 57% were female and 43% were male. Even though the full dataset was composed of 1973 participants, only 1270 answered the full questionnaire: missing data are handled using pairwise complete observations in estimating a Gaussian Graphical Model, meaning that all available information from every subject is used.

    The feature set is composed of 28 items meant to assess the four following components: fantasy, perspective taking, empathic concern and personal distress. In the questionnaire, the items are mixed; reversed items (items 3, 4, 7, 12, 13, 14, 15, 18, 19) are present. Items are scored from 0 to 4, where “0” means “Doesn’t describe me very well” and “4” means “Describes me very well”; reverse-scoring is calculated afterwards. The questionnaires were anonymized. The reanalysis of the database in this retrospective study was approved by the ethical committee of the Erasmus Hospital.
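    A minimal sketch of the reverse-scoring step described above: items are scored 0 to 4, so a reversed item maps x to 4 - x (the item column names are assumptions about the file layout):

```python
import pandas as pd

REVERSED = [3, 4, 7, 12, 13, 14, 15, 18, 19]  # reversed items, per the text

def reverse_score(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for i in REVERSED:
        col = f"item{i}"  # hypothetical column name
        out[col] = 4 - out[col]
    return out
```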

    Size: A dataset of size 1973*28

    Number of features: 28

    Ground truth: No

    Type of Graph: Mixed graph

    The following gives the description of the variables:

    Item | Label | Domain | Item meaning from Davis 1980
    1 | FS | Green | I daydream and fantasize, with some regularity, about things that might happen to me.
    2 | EC | Purple | I often have tender, concerned feelings for people less fortunate than me.
    3 | PT_R | Yellow | I sometimes find it difficult to see things from the “other guy’s” point of view.
    4 | EC_R | Purple | Sometimes I don’t feel very sorry for other people when they are having problems.
    5 | FS | Green | I really get involved with the feelings of the characters in a novel.
    6 | PD | Red | In emergency situations, I feel apprehensive and ill-at-ease.
    7 | FS_R | Green | I am usually objective when I watch a movie or play, and I don’t often get completely caught up in it. (Reversed)
    8 | PT | Yellow | I try to look at everybody’s side of a disagreement before I make a decision.
    9 | EC | Purple | When I see someone being taken advantage of, I feel kind of protective towards them.
    10 | PD | Red | I sometimes feel helpless when I am in the middle of a very emotional situation.
    11 | PT | Yellow | I sometimes try to understand my friends better by imagining how things look from their perspective.
    12 | FS_R | Green | Becoming extremely involved in a good book or movie is somewhat rare for me. (Reversed)
    13 | PD_R | Red | When I see someone get hurt, I tend to remain calm. (Reversed)
    14 | EC_R | Purple | Other people’s misfortunes do not usually disturb me a great deal. (Reversed)
    15 | PT_R | Yellow | If I’m sure I’m right about something, I don’t waste much time listening to other people’s arguments. (Reversed)
    16 | FS | Green | After seeing a play or movie, I have felt as though I were one of the characters.
    17 | PD | Red | Being in a tense emotional situation scares me.
    18 | EC_R | Purple | When I see someone being treated unfairly, I sometimes don’t feel very much pity for them. (Reversed)
    19 | PD_R | Red | I am usually pretty effective in dealing with emergencies. (Reversed)
    20 | FS | Green | I am often quite touched by things that I see happen.
    21 | PT | Yellow | I believe that there are two sides to every question and try to look at them both.
    22 | EC | Purple | I would describe myself as a pretty soft-hearted person.
    23 | FS | Green | When I watch a good movie, I can very easily put myself in the place of a leading character.
    24 | PD | Red | I tend to lose control during emergencies.
    25 | PT | Yellow | When I’m upset at someone, I usually try to “put myself in his shoes” for a while.
    26 | FS | Green | When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me.
    27 | PD | Red | When I see someone who badly needs help in an emergency, I go to pieces.
    28 | PT | Yellow | Before criticizing somebody, I try to imagine how I would feel if I were in their place.

    More information about the dataset is contained in empathy_description.html file.

  13. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Explore at:
    zip(0 bytes)
    Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unsplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?
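    A minimal sketch answering the first question against the BigQuery public table referenced above (the usa_1910_current table and its name/number columns follow the documented public schema, but verify before relying on them):

```python
from google.cloud import bigquery

client = bigquery.Client()  # requires Google Cloud credentials
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.name, row.total)
```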

  14. Data from: stereoset

    • huggingface.co
    • opendatalab.com
    Updated Dec 8, 2021
    Cite
    McGill NLP Group (2021). stereoset [Dataset]. https://huggingface.co/datasets/McGill-NLP/stereoset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    Dec 8, 2021
    Dataset authored and provided by
    McGill NLP Group
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for StereoSet

      Dataset Summary
    

    StereoSet is a dataset that measures stereotype bias in language models. StereoSet consists of 17,000 sentences that measure model preferences across gender, race, religion, and profession.

      Supported Tasks and Leaderboards
    

    multiple-choice question answering

      Languages
    

    English (en)

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    intersentence

    {'bias_type': 'race', 'context': 'Many people live in… See the full description on the dataset page: https://huggingface.co/datasets/McGill-NLP/stereoset.
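    A minimal loading sketch via the Hugging Face hub; the "intersentence" config and "validation" split follow the dataset card, but verify them before use:

```python
from datasets import load_dataset

ds = load_dataset("McGill-NLP/stereoset", "intersentence")
example = ds["validation"][0]
print(example["context"])
```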

  15. Traffic Crashes - People

    • catalog.data.gov
    • data.cityofchicago.org
    Updated Aug 11, 2025
    + more versions
    Cite
    data.cityofchicago.org (2025). Traffic Crashes - People [Dataset]. https://catalog.data.gov/dataset/traffic-crashes-people
    Explore at:
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    data.cityofchicago.org
    Description

    This data contains information about people involved in a crash, and whether any injuries were sustained. This dataset should be used in combination with the traffic Crash and Vehicle datasets. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been occupants of a motor vehicle, but may have been pedestrians, bicyclists, or using another non-motor-vehicle mode of transportation. Injuries are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle datasets using the “CRASH_RECORD_ID” field. A vehicle can have multiple occupants, hence a one-to-many relationship between the Vehicle and Person datasets. However, a pedestrian is a “unit” by itself, with a one-to-one relationship between the Vehicle and Person tables. The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in the SR1050 form. The current version of the SR1050 instructions manual, with detailed information on each data element, is available here. Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.
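    A minimal sketch of that linkage with pandas; the file names are placeholders for portal exports of the People and Crashes datasets:

```python
import pandas as pd

people = pd.read_csv("Traffic_Crashes_-_People.csv")
crashes = pd.read_csv("Traffic_Crashes_-_Crashes.csv")

# One crash maps to many people, so merging on CRASH_RECORD_ID
# attaches crash-level fields to every occupant/pedestrian record.
merged = people.merge(crashes, on="CRASH_RECORD_ID", how="left")
print(len(people), len(merged))  # row count preserved on the many side
```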

  16. Data from: Facial Expression Image Dataset for Computer Vision Algorithms

    • salford.figshare.com
    Updated Apr 29, 2025
    Cite
    Ali Alameer; Odunmolorun Osonuga (2025). Facial Expression Image Dataset for Computer Vision Algorithms [Dataset]. http://doi.org/10.17866/rd.salford.21220835.v2
    Explore at:
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    University of Salford
    Authors
    Ali Alameer; Odunmolorun Osonuga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset for this project consists of photos of individual human emotion expressions. The photos were taken with both a digital camera and a mobile phone camera, from different angles, postures, backgrounds, light exposures, and distances. This task might look and sound very easy, but there were some challenges encountered along the process, which are reviewed below:

    1) People constraint. One of the major challenges faced during this project was getting people to participate in the image-capturing process, as school was on vacation, and other individuals around the environment were not willing to let their images be captured, for personal and security reasons, even after the notion behind the project (mainly academic research purposes) was explained. Due to this challenge, we resorted to capturing the images of the researcher and just a few other willing individuals.

    2) Time constraint. As with all deep learning projects, the more data available, the more accurate and less error-prone the results will be. At the initial stage of the project, it was agreed to have 10 emotional-expression photos each from at least 50 persons, with the option to increase the number of photos for more accurate results; but due to the time constraints of this project, it was later agreed to capture just the researcher and a few other people who were willing and available. These photos were taken for just two types of human emotion expression, “happy” and “sad” faces, also due to time constraints. To expand this work further (as future work and recommendations), photos of other facial expressions such as anger, contempt, disgust, fright, and surprise can be included if time permits.

    3) The approved facial emotion captures. It was agreed to capture as many angles and postures as possible for just two facial emotions, with at least 10 emotional-expression images per individual; but due to time and people constraints, only a few persons were captured with as many postures as possible, as stated below:

    • Happy faces: 65 images
    • Sad faces: 62 images

    There are many other types of facial emotions, and again, to expand our project in the future, we can include all the other types if time permits and people are readily available.

    4) Further expansion. This project can be improved with many more capabilities; again, due to the limited time given to this project, these improvements can be implemented later as future work. In simple words, this project is to detect/predict real-time human emotion, which involves creating a model that can detect the percentage confidence of any happy or sad facial image. The higher the percentage confidence, the more accurate the facial image fed into the model.

    5) Other questions. Can the model be reproduced? The supposed response to this question should be YES, if and only if the model is fed with the proper data (images), such as images of other types of emotional expression.

  17. Dataset: Deenz Dark Triad Scale – Poland

    • sodha.be
    tsv
    Updated Feb 20, 2025
    + more versions
    Cite
    Deen Mohd Dar; Deen Mohd Dar (2025). Dataset: Deenz Dark Triad Scale – Poland [Dataset]. http://doi.org/10.34934/DVN/4WYRN9
    Explore at:
    tsv(6069)
    Available download formats
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Social Sciences and Digital Humanities Archive – SODHA
    Authors
    Deen Mohd Dar; Deen Mohd Dar
    License

    Custom dataset license: https://www.sodha.be/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34934/DVN/4WYRN9

    Area covered
    Poland
    Description

    This dataset comes from a study conducted in Poland with 44 participants. The goal of the study was to measure personality traits known as the Dark Triad. The Dark Triad consists of three key traits that influence how people think and behave towards others: Machiavellianism, Narcissism, and Psychopathy. Machiavellianism refers to a person's tendency to manipulate others and be strategic in their actions. People with high Machiavellianism scores often believe that deception is necessary to achieve their goals. Narcissism is related to self-importance and the need for admiration. Individuals with high narcissism scores may see themselves as special and expect others to recognize their greatness. Psychopathy is linked to impulsive behavior and a lack of empathy. People with high psychopathy scores tend to be less concerned about the feelings of others and may take risks without worrying about consequences.

    Each participant in the dataset answered 30 questions, divided into three sections, with 10 questions per trait. The answers were recorded using a Likert scale from 1 to 5, where:

    • 1 means "Strongly Disagree"
    • 2 means "Disagree"
    • 3 means "Neutral"
    • 4 means "Agree"
    • 5 means "Strongly Agree"

    This scale helps measure how much a person agrees with statements related to each of the three traits. The dataset also includes basic demographic information. Each participant has a unique ID (such as P001, P002, etc.) to keep their identity anonymous. The dataset records their age, which ranges from 18 to 60 years old, and their gender, which is categorized as "Male," "Female," or "Other."

    The responses in the dataset are realistic, with small variations to reflect natural differences in personality. On average, participants scored around 3.2 for Machiavellianism, meaning most people showed a moderate tendency to be strategic or manipulative. The average Narcissism score was 3.5, indicating that some participants valued themselves highly and sought admiration. The average Psychopathy score was 2.8, showing that most participants did not strongly exhibit impulsive or reckless behaviors.

    This dataset can be useful for many purposes. Researchers can use it to analyze personality traits and see how they compare across different groups. The data can also be used for cross-cultural comparisons, allowing researchers to study how personality traits in Poland differ from those in other countries. Additionally, psychologists can use this data to understand how Dark Triad traits influence behavior in everyday life. The dataset is saved in a CSV format, which makes it easy to open in programs like Excel, SPSS, or Python for further analysis. Because the data is structured and anonymized, it can be used safely for research without revealing personal information.

    In summary, this dataset provides valuable insights into personality traits among people in Poland. It allows researchers to explore how Machiavellianism, Narcissism, and Psychopathy vary among individuals. By studying these traits, psychologists can better understand human behavior and how it affects relationships, decision-making, and personal success.
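    A minimal sketch of computing per-trait averages from such a CSV in Python; the file name and the per-item column layout are assumptions about the data:

```python
import pandas as pd

df = pd.read_csv("deenz_dark_triad_poland.csv")  # hypothetical file name

for trait in ("mach", "narc", "psyc"):  # hypothetical column prefixes
    cols = [f"{trait}{i}" for i in range(1, 11)]
    print(trait, round(df[cols].mean(axis=1).mean(), 2))
```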

  18. Overtone Journalistic Content Bot/Human Indicator Dataset

    • datarade.ai
    Updated Jan 23, 2023
    Cite
    Overtone (2023). Overtone Journalistic Content Bot/Human Indicator Dataset [Dataset]. https://datarade.ai/data-products/overtone-journalistic-content-bot-human-indicator-dataset-overtone
    Explore at:
    Dataset updated
    Jan 23, 2023
    Dataset authored and provided by
    Overtone
    Area covered
    Finland, Virgin Islands (U.S.), Russian Federation, Australia, Panama, Aruba, Belarus, Brazil, Belize, Falkland Islands (Malvinas)
    Description

    We indicate how likely a piece of content is to be computer-generated or human-written. Content: any text in English or Spanish, from a single sentence to articles thousands of words in length.

    Data uniqueness: we use custom-built and trained NLP algorithms to assess human-effort metrics that are inherent in text content. We focus on what's in the text, not metadata such as publication or engagement. Our AI algorithms are co-created by NLP and journalism experts. Our datasets have all been human-reviewed and labeled.

    Dataset: CSV containing URL and/or body text, with attributed scoring as an integer and model confidence as a percentage. We ignore metadata such as author, publication, date, word count, shares and so on, to provide a clean and maximally unbiased assessment of how much human effort has been invested in content. Our data is provided in CSV/RSS/JSON format. One row = one scored article.

    Integrity indicators provided as integers on a 1–5 scale. We also have custom models with 35 categories that can be added on request.
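    A minimal sketch of consuming such a CSV; the header names are assumptions about the delivery format described above:

```python
import csv

with open("overtone_scores.csv", newline="") as f:
    for row in csv.DictReader(f):
        score = int(row["score"])               # integrity score, 1-5
        confidence = float(row["confidence"])   # model confidence, percent
        if score >= 4 and confidence >= 80:
            print(row["url"])
```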

    Data sourcing: public websites, crawlers, scrapers, and other partnerships where available. We can generally assess content behind paywalls as well as without paywalls. We source from ~4,000 news outlets; examples include Bloomberg, CNN, and the BBC. Countries: all English-speaking markets worldwide. Includes English-language content from non-English-majority regions, such as Germany, Scandinavia, and Japan. Also available in Spanish on request.

    Use-cases: assessing the implicit integrity and reliability of an article. There is correlation between integrity and human value: we have shown that articles scoring highly according to our scales show increased, sustained, ongoing end-user engagement. Clients also use this to assess journalistic output, publication relevance and to create datasets of 'quality' journalism.

    Overtone provides a range of qualitative metrics for journalistic, newsworthy and long-form content. We find, highlight and synthesise content that shows added human effort and, by extension, added human value.

  19. diffusiondb

    • huggingface.co
    Updated Aug 20, 2022
    Cite
    Polo Club of Data Science (2022). diffusiondb [Dataset]. https://huggingface.co/datasets/poloclub/diffusiondb
    Explore at:
    Dataset updated
    Aug 20, 2022
    Dataset authored and provided by
    Polo Club of Data Science
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models.
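    A minimal loading sketch via the Hugging Face hub; the "2m_random_1k" subset name follows the dataset card, but verify it (and whether your datasets version requires trust_remote_code) before use:

```python
from datasets import load_dataset

# Pull a small random slice rather than the full 2M images.
ds = load_dataset("poloclub/diffusiondb", "2m_random_1k")
sample = ds["train"][0]
print(sample["prompt"])
```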

  20. Face Dataset Of People That Don't Exist

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    BwandoWando (2023). Face Dataset Of People That Don't Exist [Dataset]. http://doi.org/10.34740/kaggle/dsv/6433550
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Available download formats
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    All the images of faces here are generated using https://thispersondoesnotexist.com/


    Copyrighting of AI Generated images

    Under US copyright law, these images are technically not subject to copyright protection. Only "original works of authorship" are considered. "To qualify as a work of 'authorship' a work must be created by a human being," according to a US Copyright Office report [PDF].

    https://www.theregister.com/2022/08/14/ai_digital_artwork_copyright/

    Tagging

    I manually tagged all images as best I could and separated them into the two classes below:

    • Female- 3860 images
    • Male- 3013 images

    Some may pass as either female or male, but I will leave the reviewing to you. I included toddlers and babies under Male/Female.

    How it works

    Each of the faces is totally fake, created using an algorithm called Generative Adversarial Networks (GANs).

    A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss).

    Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

    Github implementation of website

    How I gathered the images

    Just a simple Jupyter notebook that looped, invoked the website https://thispersondoesnotexist.com/, and saved all the images locally.
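    A minimal sketch of such a collection loop (names and pacing are illustrative; the site returns a fresh face on each request):

```python
import time
import requests

for i in range(100):
    r = requests.get("https://thispersondoesnotexist.com/", timeout=30)
    r.raise_for_status()
    with open(f"face_{i:04d}.jpg", "wb") as f:
        f.write(r.content)
    time.sleep(1)  # be polite to the site
```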
