Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.
This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']
This dataset is deprecated and not being updated.
Fork this kernel to get started with this dataset.
Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @meric from Unsplash.
Which neighborhoods have the highest proportion of offensive graffiti?
Which complaint is most likely to be made using Twitter and in which neighborhood?
What are the most complained about Muni stops in San Francisco?
What are the top 10 incident types that the San Francisco Fire Department responds to?
How many medical incidents and structure fires are there in each neighborhood?
What’s the average response time for each type of dispatched vehicle?
Which category of police incidents has historically been the most common in San Francisco?
What were the most common police incidents in the category of LARCENY/THEFT in 2016?
Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?
What is the average tree diameter?
What is the highest number of a particular species of tree planted in a single year?
Which San Francisco locations feature the largest number of trees?
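As a sketch of how such questions can be answered, the query below computes per-species counts and average diameters from the street_trees table using the BigQuery Python client. The dbh (diameter) and species column names are assumptions; verify them against the table schema:

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # requires Google Cloud credentials
query = """
SELECT species, COUNT(*) AS n_trees, AVG(dbh) AS avg_diameter
FROM `bigquery-public-data.san_francisco.street_trees`
GROUP BY species
ORDER BY n_trees DESC
LIMIT 10
"""
for row in client.query(query).result():
    print(row.species, row.n_trees, row.avg_diameter)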
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Cultural diversity in the U.S. has led to great variations in names and naming traditions, and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.
All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
Fork this kernel to get started with this dataset.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names
https://cloud.google.com/bigquery/public-data/usa-names
Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @dcp from Unsplash.
What are the most common names?
What are the most common female names?
Are there more female or male names?
Do female names outnumber male names, and by how wide a margin?
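A minimal sketch for the first question with the BigQuery Python client, assuming this dataset's usa_1910_2013 table with its name and number columns:

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # requires Google Cloud credentials
query = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 10
"""
for row in client.query(query).result():
    print(row.name, row.total)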
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Human Presence Detection: This computer vision model can be incorporated into security systems and smart home devices to identify the presence of humans in an area, allowing for customized responses, room automation, and improved safety.
Crowd Size Estimation: The "human dataset v1" can be used by event organizers or city planners to estimate the size of gatherings or crowds at public events, helping them better allocate resources and manage these events more efficiently.
Surveillance and Security Enhancement: The model can be integrated into video surveillance systems to more accurately identify humans, helping to filter out false alarms caused by animals and other non-human entities.
Collaborative Robotics: Robots equipped with this computer vision model can more easily identify and differentiate humans from their surroundings, allowing them to more effectively collaborate with people in shared spaces while ensuring human safety.
Smart Advertising: The "human dataset v1" can be utilized by digital signage and advertising systems to detect and count the number of human viewers, enabling targeted advertising and measuring the effectiveness of marketing campaigns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Any work using this dataset should cite the following paper:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
AI vs. Human dataset on CNN/DailyMail articles
Dataset Description
This dataset showcases pairs of truncated articles and their respective completions, crafted either by humans or an AI language model. Each article was randomly truncated at between 25% and 50% of its length. The language model was then tasked with generating a completion that matched the character count of the original human-written continuation.
Data Fields
'human': The original human-authored… See the full description on the dataset page: https://huggingface.co/datasets/zcamz/ai-vs-human-google-gemma-2-2b-it.
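A minimal loading sketch with the Hugging Face datasets library (the split name is an assumption; see the dataset page above):

from datasets import load_dataset  # pip install datasets

ds = load_dataset("zcamz/ai-vs-human-google-gemma-2-2b-it", split="train")
print(ds[0]["human"][:200])  # the human-authored completion field described above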
The dataset consists of annotated human-human conversations in various social settings. Along with the conversations, the dataset contains annotations for people and location entities present in the conversation along with the properties of those entities and their relationships. The annotated data enables several subtasks like slot tagging, coreference resolution, resolving plural mentions and entity linking.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most of the images were obtained from https://unsplash.com/es and Google Images. We don't own most of the images in this dataset; all of them were obtained for free and will be used for academic purposes only.
This dataset contains images of people labeled as Persona.
This pie chart displays the level of trust people have in Google and Facebook to develop better tools for personal data protection on the Internet in France in a survey from 2019. It shows that 37 percent of the respondents rather did not trust those companies to ensure data protection, while 35 percent declared they rather trusted them.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, and Artificial Intelligence. It can detect accidents from a live camera feed, or from any image or video provided. This model is trained on a dataset of 3,200+ images; these images were annotated in Roboflow.
Survey image: https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png
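A minimal inference sketch with the Ultralytics YOLOv8 API (the weights filename and input path are hypothetical placeholders):

from ultralytics import YOLO  # pip install ultralytics

model = YOLO("accident_best.pt")     # hypothetical path to the trained weights
results = model("crash_frame.jpg")   # also accepts video files or stream URLs
for r in results:
    print(r.boxes.xyxy, r.boxes.conf)  # detected boxes and confidence scores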
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data that is collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage the specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental disease model that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.
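To make the coupling concrete, here is a toy metapopulation SIR sketch in Python: towns exchange infected individuals along a flow matrix standing in for the mobility network. All numbers are illustrative, and the paper's actual model also tracks vaccination status:

import numpy as np

rng = np.random.default_rng(0)
n = 5                                                  # number of towns
N = rng.integers(1_000, 50_000, size=n).astype(float)  # town populations
F = rng.integers(0, 200, size=(n, n)).astype(float)    # toy daily flow from town i to town j
np.fill_diagonal(F, 0.0)

S, I, R = N.copy(), np.zeros(n), np.zeros(n)
I[0] = 10.0; S[0] -= 10.0                              # seed an outbreak in town 0
beta, gamma = 0.3, 0.1                                 # transmission / recovery rates

for day in range(200):
    new_inf = beta * S * I / N                         # local transmission
    new_rec = gamma * I
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    out_frac = F.sum(axis=1) / N                       # fraction leaving each town per day
    I = I - I * out_frac + F.T @ (I / N)               # move infected along the network

print("final attack rate per town:", (R / N).round(2))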
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)
United States Census Bureau
-- Ten most populous zip codes in the 2010 census (rows with an empty gender field hold totals)
SELECT
  zipcode,
  population
FROM
  `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE
  gender = ''
ORDER BY
  population DESC
LIMIT
  10
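A minimal sketch for running the query above with the BigQuery Python client (credential setup not shown; to_dataframe additionally needs pandas and db-dtypes installed):

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
query = """
SELECT zipcode, population
FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE gender = ''
ORDER BY population DESC
LIMIT 10
"""
df = client.query(query).to_dataframe()  # ten most populous zip codes, 2010
print(df)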
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.
For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World's Forests 2022 report. Rome, FAO.
Contact points:
Maintainer: Leticia Pina
Maintainer: Sarah E. Castle
Data lineage:
The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.
References:
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.
Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.
WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
Online resources:
GEE asset for "Forest proximate people - 5km cutoff distance"
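A rough Earth Engine (Python API) sketch of the lineage above. The asset IDs, the CGLC forest class range, and the focal-max buffer approximation are assumptions for illustration, not the published scripts:

import ee

ee.Initialize()

# CGLC 2019 discrete classification; closed and open forest classes span roughly 111-126
lc = ee.Image("COPERNICUS/Landcover/100m/Proba-V-C3/Global/2019").select("discrete_classification")
forest = lc.gte(111).And(lc.lte(126))

# WorldPop 2019 population counts at 100 m, mosaicked across countries
pop = ee.ImageCollection("WorldPop/GP/100m/pop").filter(ee.Filter.eq("year", 2019)).mosaic()

# Approximate the 5 km Euclidean buffer around forest pixels with a focal max
near_forest = forest.focal_max(radius=5000, units="meters")

# People living in or within 5 km of forest (high-density urban masking omitted in this sketch)
fpp = pop.updateMask(near_forest)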
This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year.
General Documentation
The Gridded Population of the World Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
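If working in Earth Engine, the collection can be loaded along these lines (the asset ID reflects the GPWv4.11 catalog entry and should be verified):

import ee

ee.Initialize()
gpw = ee.ImageCollection("CIESIN/GPWv411/GPW_Population_Density")
pop_2020 = gpw.filterDate("2020-01-01", "2021-01-01").first()  # one image per modeled year
print(pop_2020.bandNames().getInfo())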
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
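A sketch of this second question as a self-join, assuming a population_by_zip_2000 table with the same schema as the 2010 table:

from google.cloud import bigquery  # pip install google-cloud-bigquery

query = """
SELECT a.zipcode, b.population - a.population AS change
FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2000` a
JOIN `bigquery-public-data.census_bureau_usa.population_by_zip_2010` b
  ON a.zipcode = b.zipcode
WHERE a.gender = '' AND b.gender = ''
ORDER BY ABS(b.population - a.population) DESC
LIMIT 10
"""
for row in bigquery.Client().query(query).result():
    print(row.zipcode, row.change)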
Census population map: https://cloud.google.com/bigquery/images/census-population-map.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions, and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning, with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Human Interaction Image (HII) dataset is a new dataset containing Web images from commercial search engines and photo-sharing sites (Google, Bing and Flickr). We use keyword search to collect images corresponding to four types of interactions: handshake, highfive, hug, kiss. We then manually filter out irrelevant images. The dataset contains 2410 images with at least 550 images per interaction.
The dataset can be applied, but not limited to the following research areas:
interaction recognition/prediction
action recognition
video analysis
transfer learning
Please cite the following paper if you use the HII dataset in your work (papers, articles, reports, books, software, etc.):
J. Li, Y. Wong, Q. Zhao, M. Kankanhalli. Attention Transfer from Web Images for Video Recognition. ACM Multimedia, 2017. http://doi.org/10.1145/3123266.3123432
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for BoolQ
Dataset Summary
BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/boolq.
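A minimal loading sketch with the Hugging Face datasets library:

from datasets import load_dataset  # pip install datasets

boolq = load_dataset("google/boolq")
ex = boolq["train"][0]
print(ex["question"], ex["answer"], ex["passage"][:100])  # the (question, passage, answer) triplet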
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classify video clips with natural scenes of actions performed by people visible in the videos.
See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101
This example dataset consists of the 5 most numerous video classes from the UCF101 dataset. For the top-10 version, see: https://doi.org/10.5281/zenodo.7882861 .
Based on this code: https://keras.io/examples/vision/video_classification/ (which needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).
Testing whether data can be downloaded from Zenodo with wget, see: https://github.com/mojaveazure/angsd-wrapper/issues/10
For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if it has not been already; in that case I will post a link to the corrected notebook here or elsewhere, e.g., in the corrected Keras example).
I would like to thank Sayak Paul for contacting me about his example at Keras documentation being out of date.
Cite this dataset as:
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402
To download the dataset via the command line, please use:
wget -q https://zenodo.org/record/7924745/files/ucf101_top5.tar.gz -O ucf101_top5.tar.gz
tar xf ucf101_top5.tar.gz