100+ datasets found
  1. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Explore at:
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Googlehttp://google.com/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated to two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.

  2. Data from: San Francisco Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasf/san-francisco
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    DataSF
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    San Francisco
    Description

    Context

    DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

    https://datasf.org/about/

    Content

    This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

    • This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.
    • This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
    • This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).
    • This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    http://datasf.org/

    Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @meric from Unplash.

    Inspiration

    Which neighborhoods have the highest proportion of offensive graffiti?

    Which complaint is most likely to be made using Twitter and in which neighborhood?

    What are the most complained about Muni stops in San Francisco?

    What are the top 10 incident types that the San Francisco Fire Department responds to?

    How many medical incidents and structure fires are there in each neighborhood?

    What’s the average response time for each type of dispatched vehicle?

    Which category of police incidents have historically been the most common in San Francisco?

    What were the most common police incidents in the category of LARCENY/THEFT in 2016?

    Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

    What is the average tree diameter?

    What is the highest number of a particular species of tree planted in a single year?

    Which San Francisco locations feature the largest number of trees?

  3. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.govhttps://data.gov/
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  4. Human V1 Dataset

    • universe.roboflow.com
    zip
    Updated Mar 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    aarongo.socialusername@gmail.com (2025). Human V1 Dataset [Dataset]. https://universe.roboflow.com/aarongo-socialusername-gmail-com/human-dataset-v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Gmailhttp://gmail.com/
    Authors
    aarongo.socialusername@gmail.com
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Humans Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Human Presence Detection: This computer vision model can be incorporated into security systems and smart home devices to identify the presence of humans in an area, allowing for customized responses, room automation, and improved safety.

    2. Crowd Size Estimation: The "human dataset v1" can be used by event organizers or city planners to estimate the size of gatherings or crowds at public events, helping them better allocate resources and manage these events more efficiently.

    3. Surveillance and Security Enhancement: The model can be integrated into video surveillance systems to more accurately identify humans, helping to filter out false alarms caused by animals and other non-human entities.

    4. Collaborative Robotics: Robots equipped with this computer vision model can more easily identify and differentiate humans from their surroundings, allowing them to more effectively collaborate with people in shared spaces while ensuring human safety.

    5. Smart Advertising: The "human dataset v1" can be utilized by digital signage and advertising systems to detect and count the number of human viewers, enabling targeted advertising and measuring the effectiveness of marketing campaigns.

  5. i

    Online Learning Global Queries Dataset: A Comprehensive Dataset of What...

    • ieee-dataport.org
    Updated May 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Isabella Hall (2022). Online Learning Global Queries Dataset: A Comprehensive Dataset of What People from Different Countries ask Google about Online Learning [Dataset]. https://ieee-dataport.org/documents/online-learning-global-queries-dataset-comprehensive-dataset-what-people-different
    Explore at:
    Dataset updated
    May 11, 2022
    Authors
    Isabella Hall
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Any work using this dataset should cite the following paper:

  6. h

    ai-vs-human-google-gemma-2-2b-it

    • huggingface.co
    Updated Dec 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    camil (2024). ai-vs-human-google-gemma-2-2b-it [Dataset]. https://huggingface.co/datasets/zcamz/ai-vs-human-google-gemma-2-2b-it
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 13, 2024
    Authors
    camil
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    AI vs Human dataset on the CNN Daily mails

      Dataset Description
    

    This dataset showcases pairs of truncated articles and their respective completions, crafted either by humans or an AI language model. Each article was randomly truncated between 25% and 50% of its length. The language model was then tasked with generating a completion that mirrored the characters count of the original human-written continuation.

      Data Fields
    

    'human': The original human-authored… See the full description on the dataset page: https://huggingface.co/datasets/zcamz/ai-vs-human-google-gemma-2-2b-it.

  7. Context Tracking dataset

    • giter.site
    • github.com
    Updated Mar 11, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2022). Context Tracking dataset [Dataset]. https://giter.site/google-research-datasets/contrack
    Explore at:
    Dataset updated
    Mar 11, 2022
    Dataset provided by
    Googlehttp://google.com/
    Description

    The dataset consists of annotated human-human conversations in various social settings. Along with the conversations, the dataset contains annotations for people and location entities present in the conversation along with the properties of those entities and their relationships. The annotated data enables several subtasks like slot tagging, coreference resolution, resolving plural mentions and entity linking.

  8. R

    Person Detection Dataset

    • universe.roboflow.com
    zip
    Updated May 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Titulacin (2023). Person Detection Dataset [Dataset]. https://universe.roboflow.com/titulacin/person-detection-9a6mk/model/12
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 28, 2023
    Dataset authored and provided by
    Titulacin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    People Bounding Boxes
    Description

    Most of the images were obtained from https://unsplash.com/es and google images. We don't own most of the images in this dataset, all of them were obtained for free and will be used only with academic purposes.

    This dataset contains images of people labeled as Persona.

  9. Trust level French people have in Google and Facebook to ensure data...

    • statista.com
    Updated Mar 21, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2022). Trust level French people have in Google and Facebook to ensure data protection 2019 [Dataset]. https://www.statista.com/statistics/1010095/trust-google-facebook-developing-better-data-protection-france/
    Explore at:
    Dataset updated
    Mar 21, 2022
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    May 15, 2019 - May 16, 2019
    Area covered
    France
    Description

    This pie chart displays the level of trust people have in Google and Facebook to develop better tools for personal data protection on the Internet in France in a survey from 2019. It shows that 37 percent of the respondents rather did not trust those companies to ensure data protection, while 35 percent declared they rather trusted them.

  10. R

    Accident Detection Model Dataset

    • universe.roboflow.com
    zip
    Updated Apr 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Accident detection model
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Accident Bounding Boxes
    Description

    Accident-Detection-Model

    Accident Detection Model is made using YOLOv8, Google Collab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, Artificial Intelligence. It can detect an accident on any accident by live camera, image or video provided. This model is trained on a dataset of 3200+ images, These images were annotated on roboflow.

    Problem Statement

    • Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.
    • According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.
    • The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

    Accidents survey

    https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png" alt="Survey">

    Literature Survey

    • Sreyan Ghosh in Mar-2019, The goal is to develop a system using deep learning convolutional neural network that has been trained to identify video frames as accident or non-accident.
    • Deeksha Gour Sep-2019, uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

    Research Gap

    • Lack of real-world data - We trained model for more then 3200 images.
    • Large interpretability time and space needed - Using google collab to reduce interpretability time and space required.
    • Outdated Versions of previous works - We aer using Latest version of Yolo v8.

    Proposed methodology

    • We are using Yolov8 to train our custom dataset which has been 3200+ images, collected from different platforms.
    • This model after training with 25 iterations and is ready to detect an accident with a significant probability.

    Model Set-up

    Preparing Custom dataset

    • We have collected 1200+ images from different sources like YouTube, Google images, Kaggle.com etc.
    • Then we annotated all of them individually on a tool called roboflow.
    • During Annotation we marked the images with no accident as NULL and we drew a box on the site of accident on the images having an accident
    • Then we divided the data set into train, val, test in the ratio of 8:1:1
    • At the final step we downloaded the dataset in yolov8 format.
      #### Using Google Collab
    • We are using google colaboratory to code this model because google collab uses gpu which is faster than local environments.
    • You can use Jupyter notebooks, which let you blend code, text, and visualisations in a single document, to write and run Python code using Google Colab.
    • Users can run individual code cells in Jupyter Notebooks and quickly view the results, which is helpful for experimenting and debugging. Additionally, they enable the development of visualisations that make use of well-known frameworks like Matplotlib, Seaborn, and Plotly.
    • In Google collab, First of all we Changed runtime from TPU to GPU.
    • We cross checked it by running command ‘!nvidia-smi’
      #### Coding
    • First of all, We installed Yolov8 by the command ‘!pip install ultralytics==8.0.20’
    • Further we checked about Yolov8 by the command ‘from ultralytics import YOLO from IPython.display import display, Image’
    • Then we connected and mounted our google drive account by the code ‘from google.colab import drive drive.mount('/content/drive')’
    • Then we ran our main command to run the training process ‘%cd /content/drive/MyDrive/Accident Detection model !yolo task=detect mode=train model=yolov8s.pt data= data.yaml epochs=1 imgsz=640 plots=True’
    • After the training we ran command to test and validate our model ‘!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml’ ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images’
    • Further to get result from any video or image we ran this command ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"’
    • The results are stored in the runs/detect/predict folder.
      Hence our model is trained, validated and tested to be able to detect accidents on any video or image.

    Challenges I ran into

    I majorly ran into 3 problems while making this model

    • I got difficulty while saving the results in a folder, as yolov8 is latest version so it is still underdevelopment. so i then read some blogs, referred to stackoverflow then i got to know that we need to writ an extra command in new v8 that ''save=true'' This made me save my results in a folder.
    • I was facing problem on cvat website because i was not sure what
  11. f

    Datasheet1_Mobility data shows effectiveness of control strategies for...

    • frontiersin.figshare.com
    pdf
    Updated Mar 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small (2024). Datasheet1_Mobility data shows effectiveness of control strategies for COVID-19 in remote, sparse and diffuse populations.pdf [Dataset]. http://doi.org/10.3389/fepid.2023.1201810.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Frontiers
    Authors
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data that is collected at the individual-level from mobile phones is typically aggregated to the population-level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual-level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage the specific dataset: Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we firstly demonstrate how to overcome these challenges to approximate absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental model for disease that incorporated vaccination status we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can be therefore used to analyze disease outbreaks in similarly sparse populations. We demonstrate that using this data appropriately can be used to inform public health policies and have an impact in pandemic responses.

  12. census-bureau-usa

    • kaggle.com
    zip
    Updated May 18, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/datasets/bigquery/census-bureau-usa
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    May 18, 2020
    Dataset authored and provided by
    Google BigQuery
    Area covered
    United States
    Description

    Context :

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole. Update frequency: Historic (none)

    Dataset source

    United States Census Bureau

    Sample Query

    SELECT zipcode, population FROM bigquery-public-data.census_bureau_usa.population_by_zip_2010 WHERE gender = '' ORDER BY population DESC LIMIT 10

    Terms of use

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data

  13. USA Names

    • console.cloud.google.com
    Updated Aug 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=pt-BR&inv=1&invt=Ab4Asw (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=pt-BR
    Explore at:
    Dataset updated
    Aug 15, 2023
    Dataset provided by
    Googlehttp://google.com/
    Area covered
    United States
    Description

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  14. Forest proximate people - 5km cutoff distance (Global - 100m)

    • data.amerigeoss.org
    http, wmts
    Updated Oct 24, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Food and Agriculture Organization (2022). Forest proximate people - 5km cutoff distance (Global - 100m) [Dataset]. https://data.amerigeoss.org/dataset/8ed893bd-842a-4866-a655-a0a0c02b79b5
    Explore at:
    http, wmtsAvailable download formats
    Dataset updated
    Oct 24, 2022
    Dataset provided by
    Food and Agriculture Organizationhttp://fao.org/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "Forest Proximate People" (FPP) dataset is one of the data layers contributing to the development of indicator #13, “number of forest-dependent people in extreme poverty,” of the Collaborative Partnership on Forests (CPF) Global Core Set of forest-related indicators (GCS). The FPP dataset provides an estimate of the number of people living in or within 5 kilometers of forests (forest-proximate people) for the year 2019 with a spatial resolution of 100 meters at a global level.

    For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L. Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: A new methodology and global estimates. Background Paper to The State of the World’s Forests 2022 report. Rome, FAO.

    Contact points:

    Maintainer: Leticia Pina

    Maintainer: Sarah E., Castle

    Data lineage:

    The FPP data are generated using Google Earth Engine. Forests are defined by the Copernicus Global Land Cover (CGLC) (Buchhorn et al. 2020) classification system’s definition of forests: tree cover ranging from 15-100%, with or without understory of shrubs and grassland, and including both open and closed forests. Any area classified as forest sized ≥ 1 ha in 2019 was included in this definition. Population density was defined by the WorldPop global population data for 2019 (WorldPop 2018). High density urban populations were excluded from the analysis. High density urban areas were defined as any contiguous area with a total population (using 2019 WorldPop data for population) of at least 50,000 people and comprised of pixels all of which met at least one of two criteria: either the pixel a) had at least 1,500 people per square km, or b) was classified as “built-up” land use by the CGLC dataset (where “built-up” was defined as land covered by buildings and other manmade structures) (Dijkstra et al. 2020). Using these datasets, any rural people living in or within 5 kilometers of forests in 2019 were classified as forest proximate people. Euclidean distance was used as the measure to create a 5-kilometer buffer zone around each forest cover pixel. The scripts for generating the forest-proximate people and the rural-urban datasets using different parameters or for different years are published and available to users. For more detail, such as the theory behind this indicator and the definition of parameters, and to cite this data, see: Newton, P., Castle, S.E., Kinzer, A.T., Miller, D.C., Oldekop, J.A., Linhares-Juvenal, T., Pina, L., Madrid, M., & de Lamo, J. 2022. The number of forest- and tree-proximate people: a new methodology and global estimates. Background Paper to The State of the World’s Forests 2022. Rome, FAO.

    References:

    Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.E., Herold, M., Fritz, S., 2020. Copernicus Global Land Service: Land Cover 100m: collection 3 epoch 2019. Globe.

    Dijkstra, L., Florczyk, A.J., Freire, S., Kemper, T., Melchiorri, M., Pesaresi, M. and Schiavina, M., 2020. Applying the degree of urbanisation to the globe: A new harmonised definition reveals a different picture of global urbanisation. Journal of Urban Economics, p.103312.

    WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University, 2018. Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645

    Online resources:

    GEE asset for "Forest proximate people - 5km cutoff distance"

  15. G

    GPWv411: Population Density (Gridded Population of the World Version 4.11)

    • developers.google.com
    Updated Aug 11, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Population Density (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H49C6VHW
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year. General Documentation The Gridded Population of World Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.

  16. United States Census

    • kaggle.com
    zip
    Updated Apr 17, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    US Census Bureau (2018). United States Census [Dataset]. https://www.kaggle.com/census/census-bureau-usa
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Apr 17, 2018
    Dataset provided by
    United States Census Bureauhttp://census.gov/
    Authors
    US Census Bureau
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
    Source: https://en.wikipedia.org/wiki/United_States_Census

    Content

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.

    The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa

    https://cloud.google.com/bigquery/public-data/us-census

    Dataset Source: United States Census Bureau

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by Steve Richey from Unsplash.

    Inspiration

    What are the ten most populous zip codes in the US in the 2010 census?

    What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?

    https://cloud.google.com/bigquery/images/census-population-map.png" alt="https://cloud.google.com/bigquery/images/census-population-map.png"> https://cloud.google.com/bigquery/images/census-population-map.png

  17. imageinwords

    • huggingface.co
    Updated May 7, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2024). imageinwords [Dataset]. https://huggingface.co/datasets/google/imageinwords
    Explore at:
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Googlehttp://google.com/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness.

  18. Z

    Human Interaction Image (HII) dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohan S. Kankanhalli (2020). Human Interaction Image (HII) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_832379
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Qi Zhao
    Yongkang Wong
    Junnan Li
    Mohan S. Kankanhalli
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Human Interaction Image (HII) dataset is a new dataset containing Web images from Commercial Search Engines (Google, Bing and Flickr). We use keyword search to collect images corresponding to four types of interactions: handshake, highfive, hug, kiss. Then we manually filter the irrelevant images. The dataset contains 2410 images with at least 550 images per interaction.

    The dataset can be applied, but not limited to the following research areas:

    interaction recognition/prediction

    action recognition

    video analysis

    transfer learning

    Please cite the following paper if you use the HII dataset in your work (papers, articles, reports, books, software, etc):

    J. Li, Y. Wong, Q.Zhao, M. Kankanhalli Attention Transfer from Web Images for Video Recognition ACM Multimedia, 2017. http://doi.org/10.1145/3123266.3123432

  19. boolq

    • huggingface.co
    Updated Dec 15, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google (2014). boolq [Dataset]. https://huggingface.co/datasets/google/boolq
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2014
    Dataset authored and provided by
    Googlehttp://google.com/
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for Boolq

      Dataset Summary
    

    BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring ---they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.

      Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/boolq.
    
  20. Keras video classification example with a subset of UCF101 - Action...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 11, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mikolaj Buchwald; Mikolaj Buchwald (2023). Keras video classification example with a subset of UCF101 - Action Recognition Data Set (top 5 videos) [Dataset]. http://doi.org/10.5281/zenodo.7924745
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    May 11, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mikolaj Buchwald; Mikolaj Buchwald
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classify video clips with natural scenes of actions performed by people visible in the videos.

    See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101

    This example datasets consists of the 5 most numerous video from the UCF101 dataset. For the top 10 version see: https://doi.org/10.5281/zenodo.7882861 .

    Based on this code: https://keras.io/examples/vision/video_classification/ (needs to be updated, if has not yet been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).

    Testing if data can be downloaded from figshare with `wget`, see: https://github.com/mojaveazure/angsd-wrapper/issues/10

    For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if has not yet been already - then I will post a link to the notebook here or elsewhere, e.g., in the corrected notebook with Keras example).

    I would like to thank Sayak Paul for contacting me about his example at Keras documentation being out of date.

    Cite this dataset as:

    Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402

    To download the dataset via the command line, please use:

    wget -q https://zenodo.org/record/7924745/files/ucf101_top5.tar.gz -O ucf101_top5.tar.gz
    tar xf ucf101_top5.tar.gz
Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Organization logo

Google Landmarks Dataset v2

Explore at:
292 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 27, 2019
Dataset provided by
Googlehttp://google.com/
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated to two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.

Search
Clear search
Close search
Google apps
Main menu