67 datasets found
  1. American Community Survey (ACS)

    • console.cloud.google.com
    Updated Jul 16, 2018
    Cite
    United States Census Bureau (2018). American Community Survey (ACS) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/acs
    Explore at:
    Dataset updated
    Jul 16, 2018
    Dataset provided by
    Google (http://google.com/)
    Description

    The American Community Survey (ACS) is an ongoing survey that provides vital information about our nation and its people on a yearly basis by contacting over 3.5 million households across the country. The resulting data provides detailed demographic information across the US, aggregated at various geographic levels, and helps determine how more than $675 billion in federal and state funding is distributed each year.

    Businesses use ACS data to inform strategic decision-making. ACS data can serve as a component of market research, indicate concentrations of potential employees with a specific education or occupation, and suggest which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide.

    This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
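    Since the dataset lives in BigQuery, the free-tier querying described above can be sketched with the BigQuery Python client. This is an illustrative sketch only: the table and column names used below (census_bureau_acs.county_2018_5yr, geo_id, total_pop) are assumptions and should be checked against the dataset's actual schema.

```python
# Illustrative sketch: querying the ACS public dataset in BigQuery.
# The table and column names here are assumptions, not verified schema.

def build_acs_query(limit: int = 10) -> str:
    """Return SQL for the most populous counties in an assumed ACS table."""
    return (
        "SELECT geo_id, total_pop\n"
        "FROM `bigquery-public-data.census_bureau_acs.county_2018_5yr`\n"
        "ORDER BY total_pop DESC\n"
        f"LIMIT {limit}"
    )

# Running it requires the google-cloud-bigquery package and GCP credentials:
# from google.cloud import bigquery
# rows = bigquery.Client().query(build_acs_query()).result()
```

    Each such query counts against the monthly free-tier quota, so previewing table schemas before running full scans is the economical approach.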

  2. Google Address Data, Google Address API, Google location API, Google Map...

    • datarade.ai
    Cite
    APISCRAPY, Google Address Data, Google Address API, Google location API, Google Map API, Business Location Data- 100 M Google Address Data Available [Dataset]. https://datarade.ai/data-products/google-address-data-google-address-api-google-location-api-apiscrapy
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Luxembourg, Andorra, Liechtenstein, Moldova (Republic of), China, Åland Islands, Spain, United Kingdom, Estonia, Monaco
    Description

    Welcome to Apiscrapy, your ultimate destination for comprehensive location-based intelligence. As an AI-driven web scraping and automation platform, Apiscrapy excels in converting raw web data into polished, ready-to-use data APIs. With a unique capability to collect Google Address Data, Google Address API, Google Location API, Google Map, and Google Location Data with 100% accuracy, we redefine possibilities in location intelligence.

    Key Features:

    Unparalleled Data Variety: Apiscrapy offers a diverse range of address-related datasets, including Google Address Data and Google Location Data. Whether you seek B2B address data or detailed insights for various industries, we cover it all.

    Integration with Google Address API: Seamlessly integrate our datasets with the powerful Google Address API. This collaboration ensures not just accessibility but a robust combination that amplifies the precision of your location-based insights.

    Business Location Precision: Experience a new level of precision in business decision-making with our address data. Apiscrapy delivers accurate and up-to-date business locations, enhancing your strategic planning and expansion efforts.

    Tailored B2B Marketing: Customize your B2B marketing strategies with precision using our detailed B2B address data. Target specific geographic areas, refine your approach, and maximize the impact of your marketing efforts.

    Use Cases:

    Location-Based Services: Companies use Google Address Data to provide location-based services such as navigation, local search, and location-aware advertisements.

    Logistics and Transportation: Logistics companies utilize Google Address Data for route optimization, fleet management, and delivery tracking.

    E-commerce: Online retailers integrate address autocomplete features powered by Google Address Data to simplify the checkout process and ensure accurate delivery addresses.

    Real Estate: Real estate agents and property websites leverage Google Address Data to provide accurate property listings, neighborhood information, and proximity to amenities.

    Urban Planning and Development: City planners and developers utilize Google Address Data to analyze population density, traffic patterns, and infrastructure needs for urban planning and development projects.

    Market Analysis: Businesses use Google Address Data for market analysis, including identifying target demographics, analyzing competitor locations, and selecting optimal locations for new stores or offices.

    Geographic Information Systems (GIS): GIS professionals use Google Address Data as a foundational layer for mapping and spatial analysis in fields such as environmental science, public health, and natural resource management.

    Government Services: Government agencies utilize Google Address Data for census enumeration, voter registration, tax assessment, and planning public infrastructure projects.

    Tourism and Hospitality: Travel agencies, hotels, and tourism websites incorporate Google Address Data to provide location-based recommendations, itinerary planning, and booking services for travelers.

    Discover the difference with Apiscrapy – where accuracy meets diversity in address-related datasets, including Google Address Data, Google Address API, Google Location API, and more. Redefine your approach to location intelligence and make data-driven decisions with confidence. Revolutionize your business strategies today!

  3. United States Census

    • kaggle.com
    zip
    Updated Apr 17, 2018
    Cite
    US Census Bureau (2018). United States Census [Dataset]. https://www.kaggle.com/census/census-bureau-usa
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 17, 2018
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    US Census Bureau
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
    Source: https://en.wikipedia.org/wiki/United_States_Census

    Content

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain: it is typically divided by region and must be processed and combined to provide information about the nation as a whole.

    The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age, and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa

    https://cloud.google.com/bigquery/public-data/us-census

    Dataset Source: United States Census Bureau

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by Steve Richey from Unsplash.

    Inspiration

    What are the ten most populous zip codes in the US in the 2010 census?

    What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?

    Census population map: https://cloud.google.com/bigquery/images/census-population-map.png
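    The second inspiration question can be sketched in pandas; a minimal version, assuming each census table carries zipcode and population columns as in the BigQuery tables:

```python
import pandas as pd

# Minimal sketch: top-n zip codes by population change between two censuses.
# Assumes each table has "zipcode" and "population" columns.
def top_population_change(pop_2000: pd.DataFrame,
                          pop_2010: pd.DataFrame,
                          n: int = 10) -> pd.DataFrame:
    merged = pop_2000.merge(pop_2010, on="zipcode", suffixes=("_2000", "_2010"))
    merged["change"] = merged["population_2010"] - merged["population_2000"]
    return merged.nlargest(n, "change")[["zipcode", "change"]]
```

    The same join pattern works directly in BigQuery SQL; pandas is shown here only because it is easy to test locally on extracted CSVs.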

  4. census-bureau-usa

    • kaggle.com
    zip
    Updated May 18, 2020
    + more versions
    Cite
    Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/datasets/bigquery/census-bureau-usa
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    May 18, 2020
    Dataset authored and provided by
    Google BigQuery
    Area covered
    United States
    Description

    Context :

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain: it is typically divided by region and must be processed and combined to provide information about the nation as a whole.

    Update frequency: Historic (none)

    Dataset source

    United States Census Bureau

    Sample Query

    SELECT zipcode, population
    FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
    WHERE gender = ''
    ORDER BY population DESC
    LIMIT 10

    Terms of use

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data

  5. Datasheet1_Mobility data shows effectiveness of control strategies for...

    • frontiersin.figshare.com
    pdf
    Updated Mar 7, 2024
    Cite
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small (2024). Datasheet1_Mobility data shows effectiveness of control strategies for COVID-19 in remote, sparse and diffuse populations.pdf [Dataset]. http://doi.org/10.3389/fepid.2023.1201810.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Frontiers
    Authors
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data that is collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage a specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental disease model that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.
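    The core idea of coupling an aggregated mobility network to a compartmental disease model can be illustrated with a minimal discrete-time SIR step over a town-to-town flow matrix. This is not the authors' code; the structure and parameter values below are simplified assumptions for illustration.

```python
import numpy as np

# Minimal illustration (not the paper's model): one day of SIR dynamics
# where infection pressure travels between towns via a flow matrix.
def sir_step(S, I, R, flows, beta=0.3, gamma=0.1):
    """flows[i, j]: fraction of town i's population visiting town j per day."""
    N = S + I + R
    # Infectious pressure each town is exposed to through travel.
    pressure = flows @ (I / N)
    new_inf = beta * S * pressure
    new_rec = gamma * I
    return S - new_inf, I + new_inf - new_rec, R + new_rec
```

    Iterating this step while updating the flow matrix from the aggregated mobility data is the basic simulation loop such a study would run.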

  6. Google Map Data, Google Map Data Scraper, Business location Data- Scrape All...

    • datarade.ai
    Updated May 23, 2022
    + more versions
    Cite
    APISCRAPY (2022). Google Map Data, Google Map Data Scraper, Business location Data- Scrape All Publicly Available Data From Google Map & Other Platforms [Dataset]. https://datarade.ai/data-products/google-map-data-google-map-data-scraper-business-location-d-apiscrapy
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Albania, Serbia, Gibraltar, Svalbard and Jan Mayen, Denmark, Switzerland, Bulgaria, Japan, United States of America, Macedonia (the former Yugoslav Republic of)
    Description

    Welcome to APISCRAPY, your premier provider of Map Data solutions. Map Data encompasses various information related to geographic locations, including Google Map Data, Location Data, Address Data, and Business Location Data. Our advanced Google Map Data Scraper sets us apart by extracting comprehensive and accurate data from Google Maps and other platforms.

    What sets APISCRAPY's Map Data apart are its key benefits:

    1. Accuracy: Our scraping technology ensures the highest level of accuracy, providing reliable data for informed decision-making. We employ advanced algorithms to filter out irrelevant or outdated information, ensuring that you receive only the most relevant and up-to-date data.

    2. Accessibility: With our data readily available through APIs, integration into existing systems is seamless, saving time and resources. Our APIs are easy to use and well-documented, allowing for quick implementation into your workflows. Whether you're a developer building a custom application or a business analyst conducting market research, our APIs provide the flexibility and accessibility you need.

    3. Customization: We understand that every business has unique needs and requirements. That's why we offer tailored solutions to meet specific business needs. Whether you need data for a one-time project or ongoing monitoring, we can customize our services to suit your needs. Our team of experts is always available to provide support and guidance, ensuring that you get the most out of our Map Data solutions.

    Our Map Data solutions cater to various use cases:

    1. B2B Marketing: Gain insights into customer demographics and behavior for targeted advertising and personalized messaging. Identify potential customers based on their geographic location, interests, and purchasing behavior.

    2. Logistics Optimization: Utilize Location Data to optimize delivery routes and improve operational efficiency. Identify the most efficient routes based on factors such as traffic patterns, weather conditions, and delivery deadlines.

    3. Real Estate Development: Identify prime locations for new ventures using Business Location Data for market analysis. Analyze factors such as population density, income levels, and competition to identify opportunities for growth and expansion.

    4. Geospatial Analysis: Leverage Map Data for spatial analysis, urban planning, and environmental monitoring. Identify trends and patterns in geographic data to inform decision-making in areas such as land use planning, resource management, and disaster response.

    5. Retail Expansion: Determine optimal locations for new stores or franchises using Location Data and Address Data. Analyze factors such as foot traffic, proximity to competitors, and demographic characteristics to identify locations with the highest potential for success.

    6. Competitive Analysis: Analyze competitors' business locations and market presence for strategic planning. Identify areas of opportunity and potential threats to your business by analyzing competitors' geographic footprint, market share, and customer demographics.

    Experience the power of APISCRAPY's Map Data solutions today and unlock new opportunities for your business. With our accurate and accessible data, you can make informed decisions, drive growth, and stay ahead of the competition.

    [ Related tags: Map Data, Google Map Data, Google Map Data Scraper, B2B Marketing, Location Data, Map Data, Google Data, Location Data, Address Data, Business location data, map scraping data, Google map data extraction, Transport and Logistic Data, Mobile Location Data, Mobility Data, and IP Address Data, business listings APIs, map data, map datasets, map APIs, poi dataset, GPS, Location Intelligence, Retail Site Selection, Sentiment Analysis, Marketing Data Enrichment, Point of Interest (POI) Mapping]

  7. GPWv411: Population Density (Gridded Population of the World Version 4.11)

    • developers.google.com
    Updated Aug 11, 2019
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Population Density (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H49C6VHW
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year.

    General Documentation

    The Gridded Population of the World Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
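    A much-simplified sketch of the two steps described above, proportional allocation of unit populations to grid cells followed by extrapolation to modeled years; GPW's actual methodology is more involved, so treat this as illustration only.

```python
# Simplified sketch of GPW-style processing; the real pipeline is more involved.

def allocate_population(unit_pop: float, cell_weights: list[float]) -> list[float]:
    """Spread a census unit's population over its grid cells in proportion
    to per-cell weights (e.g. each cell's area share of the unit)."""
    total = sum(cell_weights)
    return [unit_pop * w / total for w in cell_weights]

def extrapolate(pop_t0: float, pop_t1: float, t0: int, t1: int, t: int) -> float:
    """Extrapolate to a modeled year using the constant annual growth rate
    implied by two census counts."""
    rate = (pop_t1 / pop_t0) ** (1.0 / (t1 - t0))
    return pop_t0 * rate ** (t - t0)
```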

  8. Colorado Springs Google Maps User Psychology Dataset

    • caseysseo.com
    Updated Jul 1, 2025
    Cite
    Casey Miller (2025). Colorado Springs Google Maps User Psychology Dataset [Dataset]. https://caseysseo.com/the-psychology-behind-google-maps-user-behavior-in-colorado-springs/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    Time period covered
    2025
    Area covered
    Colorado Springs
    Variables measured
    Decision-making speed, Cognitive load tolerance, Trust formation patterns, Visual attention sequences, Demographic psychology preferences
    Description

    Comprehensive dataset analyzing psychological patterns, cognitive triggers, and behavioral preferences of Google Maps users in Colorado Springs, including demographic psychology, seasonal patterns, and decision-making frameworks.

  9. GPWv411: Basic Demographic Characteristics (Gridded Population of the World...

    • developers.google.com
    Updated Aug 11, 2019
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Basic Demographic Characteristics (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H46M34XX
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains population estimates, by age and sex, per 30 arc-second grid cell consistent with national censuses and population registers. There is one image for each modeled age and sex category based on the 2010 round of censuses.

    General Documentation

    The Gridded Population of the World Version 4 (GPWv4), Revision …

  10. Fostering cultures of open qualitative research: Dataset 2 – Interview...

    • orda.shef.ac.uk
    xlsx
    Updated Jun 28, 2023
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2023). Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts [Dataset]. http://doi.org/10.15131/shef.data.23567223.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    · Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:

    · 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Doc, or an open-source equivalent.

    All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.

    Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.

    For recruitment, 14 participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.

    · 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.

    This provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.

    The project was undertaken by two staff:

    Co-investigator: Dr. Itzel San Roman Pineda (ORCiD: 0000-0002-3785-8057; i.sanromanpineda@sheffield.ac.uk), Postdoctoral Research Assistant. Labelled as ‘Researcher 1’ throughout the dataset.

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard (ORCiD: 0000-0003-2460-8638; m.s.hanchard@sheffield.ac.uk), Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science. Labelled as ‘Researcher 2’ throughout the dataset.

  11. ‘Credit Risk Classification Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Credit Risk Classification Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-credit-risk-classification-dataset-a5f6/76e42b23/?iid=035-990&v=presentation
    Explore at:
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Credit Risk Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/praveengovi/credit-risk-classification-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This is customer transaction and demographic data; it labels customers as risky or not risky for specific banking products.

    Content

    The dataset is small, which makes it well suited for budding data scientists 👨‍🔬 👩‍🔬 and data analysts experimenting with machine learning and statistical modelling concepts.

    Data:

    payment_data.csv: customer’s card payment history.

    id: customer id
    OVD_t1: number of times overdue type 1
    OVD_t2: number of times overdue type 2
    OVD_t3: number of times overdue type 3
    OVD_sum: total overdue days
    pay_normal: number of times normal payment
    prod_code: credit product code
    prod_limit: credit limit of product
    update_date: account update date
    new_balance: current balance of product
    highest_balance: highest balance in history
    report_date: date of recent payment

    customer_data.csv:

    customer’s demographic data and category attributes, which have been encoded.

    Category features: fea_1, fea_3, fea_5, fea_6, fea_7, fea_9
    label = 1: the customer is high credit risk
    label = 0: the customer is low credit risk
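    As a starting point for experiments, the two files can be joined on the shared id column; a minimal pandas sketch using the column names from the listing above:

```python
import pandas as pd

# Sketch: join payment history to customer labels and compare overdue days
# by risk label. Column names (id, OVD_sum, label) follow the listing above.
def risk_summary(payments: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    merged = payments.merge(customers[["id", "label"]], on="id")
    return merged.groupby("label")["OVD_sum"].mean().reset_index()
```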

    Acknowledgements

    Thanks to Google Datasets search

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    This dataset helps to find out whether a customer is credit-risky or credit-worthy from a banking perspective.

    Q1 - What factors contribute to a customer being credit-risky?
    Q2 - How does a credit-worthy customer behave?

    --- Original source retains full ownership of the source dataset ---

  12. Sentinel2 RGB chips over BENELUX with JRC GHSL Population Density 2015 for...

    • data.niaid.nih.gov
    Updated May 18, 2023
    + more versions
    Cite
    Raúl Ramos-Pollan (2023). Sentinel2 RGB chips over BENELUX with JRC GHSL Population Density 2015 for Learning with Label Proportions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7939347
    Explore at:
    Dataset updated
    May 18, 2023
    Dataset provided by
    Fabio A. González
    Raúl Ramos-Pollan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Benelux
    Description

    The Region of Interest (ROI) comprises Belgium, the Netherlands, and Luxembourg.

    We use the communes administrative division, which is standardized across Europe by EUROSTAT at: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units This is roughly equivalent to the notion of municipalities in most countries.

    From the link above, commune definitions are taken from COMM_RG_01M_2016_4326.shp and country borders are taken from NUTS_RG_01M_2021_3035.shp.

    images: Sentinel2 RGB from 2020-01-01 to 2020-12-31, with cloudy pixels filtered out according to the QA60 band, following the example given on the GEE dataset info page: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED

      see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/sentinel2rgbmedian2020.py
    

    labels: Global Human Settlement Layers, Population Grid 2015

      labels range from 0 to 31, with the following meaning:

        label value    original value in GEE dataset
        0              0
        1              1-10
        2              11-20
        3              21-30
        ...
        31             >=291
    
    
      see https://developers.google.com/earth-engine/datasets/catalog/JRC_GHSL_P2016_POP_GPW_GLOBE_V1
    
    
      see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/humanpop2015.py
    

    _aschips.geojson: the image chip geometries along with their label proportions, for easy visualization with QGIS, GeoPandas, etc.

    _communes.geojson: the commune geometries with their label proportions, for easy visualization with QGIS, GeoPandas, etc.

    splits.csv: contains two splits of the image chips into train, test, and val - one using geographical bands at 45° angles in the nw-se direction, and the same split reorganized so that all chips within the same commune fall within the same split.

    data/: a pickle file for each image chip, containing a dict with: the 100x100 RGB Sentinel 2 chip image; the 100x100 chip-level labels; the label proportions of the chip; and the aggregated label proportions of the commune the chip belongs to.
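    A minimal sketch of reading one chip pickle and recomputing its label proportions; the dict key used here ("labels") is an assumption, so check the actual files for the real key names.

```python
import pickle
import numpy as np

# Sketch: load a chip pickle and recompute label proportions.
# The "labels" dict key is an assumption; verify against the real files.
def load_chip(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

def label_proportions(chip_labels: np.ndarray, n_classes: int = 32) -> np.ndarray:
    """Fraction of chip pixels carrying each of the labels 0..31."""
    counts = np.bincount(chip_labels.ravel(), minlength=n_classes)
    return counts / counts.sum()
```

    Recomputing proportions this way is a useful sanity check against the values stored in the pickle and in _aschips.geojson.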

  13. Replication Data for: Field Evidence of the Effects of Pro-sociality and...

    • dataverse.harvard.edu
    Updated Feb 14, 2022
    Cite
    Samuel Dooley; John P Dickerson; Elissa Redmiles (2022). Replication Data for: Field Evidence of the Effects of Pro-sociality and Transparency on COVID-19 App Attractiveness [Dataset]. http://doi.org/10.7910/DVN/OT36PX
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 14, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Samuel Dooley; John P Dickerson; Elissa Redmiles
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data and the associated R analysis file accompany the paper "Field Evidence of the Effects of Pro-sociality and Transparency on COVID-19 App Attractiveness". We ran 14 separate Google display ad campaigns from February 1 to 26. These were the only Google Display ads run for CovidDefense. Each campaign was targeted, via IP address, at people who reside in Louisiana. All campaigns used the same settings, ad destination, and ad image from the state of Louisiana's CovidDefense marketing materials. The 14 ads varied only in their text, in alignment with the 14 conditions summarized in ads.csv. There are two primary datasets: one (data_demo.csv) with all 7,010,271 impressions and demographic data, and another (data_geo.csv) with just the impressions that have associated geographic information. The former includes columns for Google-estimated demographics such as age and gender, with many impressions having the value "Unknown". Both tables contain one row per impression, with columns for whether that impression resulted in a click; the age and gender or geography of the impression; and indicator variables for the presence or absence of ad information (appeals; privacy transparency -- broad privacy reassurance, non-technical control, and technical control -- and data transparency). An associated R file includes functions to reproduce each model and associated statistics.
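    As a sketch of how impression-level rows with indicator variables can be analysed, the snippet below computes click-through rate per condition flag. The column names ("clicked", "prosocial", "data_transparency") are illustrative stand-ins, not the actual CSV headers:

```python
from collections import defaultdict

# Invented impression rows; each row is one ad impression with a click
# indicator and per-condition indicator variables (names assumed).
impressions = [
    {"clicked": 1, "prosocial": 1, "data_transparency": 0},
    {"clicked": 0, "prosocial": 1, "data_transparency": 0},
    {"clicked": 0, "prosocial": 0, "data_transparency": 1},
    {"clicked": 1, "prosocial": 0, "data_transparency": 1},
    {"clicked": 0, "prosocial": 0, "data_transparency": 1},
]

def ctr_by(rows, flag):
    """Click-through rate grouped by the value of one indicator column."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for r in rows:
        shown[r[flag]] += 1
        clicks[r[flag]] += r["clicked"]
    return {v: clicks[v] / shown[v] for v in shown}

print(ctr_by(impressions, "prosocial"))  # CTR with vs. without the pro-social appeal
```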

  14. Travel time to cities and ports in the year 2015

    • figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Andy Nelson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

    If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

    The following text is a summary of the information in the above Data Descriptor.

    The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated) land-based travel time to the nearest city and nearest port for a range of city and port sizes.

    The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

    These maps represent a unique global representation of physical access to essential services offered by cities and ports.

    travel_time_to_cities_x.tif (where x ranges from 1 to 12): the value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

    travel_time_to_ports_x (where x ranges from 1 to 5): the value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

    Format: Raster dataset, GeoTIFF, LZW-compressed

    Unit: minutes

    Data type: 16-bit unsigned integer

    No-data value: 65535

    Flags: none

    Spatial resolution: 30 arc seconds

    Spatial extent: upper left -180, 85; lower left -180, -60; upper right 180, 85; lower right 180, -60

    Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)

    Temporal resolution: 2015

    Temporal extent: updates may follow for future years, but these depend on the availability of updated inputs on travel times and city locations and populations.

    Methodology Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

    The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.

    Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.

    The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
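    The accumulated-cost idea can be sketched with a tiny Dijkstra-style search over a made-up friction grid. The real analysis used the R gdistance package (accCost) on a global 1 km surface; the grid, friction values, and 4-neighbour moves below are illustrative assumptions only:

```python
import heapq

def acc_cost(friction, targets):
    """Least accumulated cost from every cell to the nearest target cell.

    Moves are 4-neighbour; the cost of a step is the mean friction of the
    two cells crossed (a common convention for cost surfaces).
    """
    rows, cols = len(friction), len(friction[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = [(0.0, r, c) for r, c in targets]
    for _, r, c in heap:
        dist[r][c] = 0.0
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (friction[r][c] + friction[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Toy 3x3 friction surface with one target (a "city") at the top-left cell.
friction = [[1, 1, 5], [1, 9, 5], [1, 1, 1]]
travel = acc_cost(friction, targets=[(0, 0)])
```

    Note how the cheapest route to the bottom-right cell goes around the high-friction centre cell rather than through it, which is exactly the behaviour a friction surface is meant to produce.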

    Code The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

    Validation The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

    This process and results are included in the validation zip file.

    Usage Notes The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

    The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
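    The minimum-across-layers combination described above can be sketched as follows. The layer values and names are invented; the no-data value is the 65535 documented earlier, treated here as "no settlement of that size reachable":

```python
NODATA = 65535  # matches the documented no-data value

def combine_min(*layers):
    """Pixel-wise minimum across layers, propagating NODATA only when no
    layer has a valid value for that pixel."""
    out = []
    for pixels in zip(*layers):
        valid = [p for p in pixels if p != NODATA]
        out.append(min(valid) if valid else NODATA)
    return out

# Invented 4-pixel strips from three population-range layers.
t_5k_10k = [12, 40, NODATA, 7]
t_10k_20k = [30, 25, 55, NODATA]
t_20k_50k = [90, NODATA, 50, 60]

# Travel time to the nearest settlement of 5,000-50,000 people.
print(combine_min(t_5k_10k, t_10k_20k, t_20k_50k))  # [12, 25, 50, 7]
```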

    The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.

    The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.

  15. A dataset of 5 million city trees from 63 US cities: species, location,...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Aug 31, 2022
    Cite
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz (2022). A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more. [Dataset]. http://doi.org/10.5061/dryad.2jm63xsrf
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    Stanford University
    Harvard University
    The Biota of North America Program (BONAP)
    Worcester Polytechnic Institute
    Cornell University
    Authors
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States
    Description

    Sustainable cities depend on urban forests. City trees -- a pillar of urban forests -- improve our health, clean the air, store CO2, and cool local temperatures. Comparatively less is known about urban forests as ecosystems, particularly their spatial composition, nativity statuses, biodiversity, and tree health. Here, we assembled and standardized a new dataset of N=5,660,237 trees from 63 of the largest US cities. The data comes from tree inventories conducted at the level of cities and/or neighborhoods. Each data sheet includes detailed information on tree location, species, nativity status (whether a tree species is naturally occurring or introduced), health, size, whether it is in a park or urban area, and more (comprising 28 standardized columns per datasheet). This dataset could be analyzed in combination with citizen-science datasets on bird, insect, or plant biodiversity; social and demographic data; or data on the physical environment. Urban forests offer a rare opportunity to intentionally design biodiverse, heterogeneous, rich ecosystems. Methods See eLife manuscript for full details. Below, we provide a summary of how the dataset was collected and processed.

    Data Acquisition We limited our search to the 150 largest cities in the USA (by census population). To acquire raw data on street tree communities, we used a search protocol on both Google and Google Datasets Search (https://datasetsearch.research.google.com/). We first searched the city name plus each of the following: street trees, city trees, tree inventory, urban forest, and urban canopy (all combinations totaled 20 searches per city, 10 each in Google and Google Datasets Search). We then read the first page of Google results and the top 20 results from Google Datasets Search. If the same-named city in the wrong state appeared in the results, we redid the 20 searches adding the state name. If no data were found, we contacted a relevant state official via email or phone with an inquiry about their street tree inventory. Datasheets were received and transformed to .csv format (if they were not already in that format). We received data on street trees from 64 cities. One city, El Paso, had data only in summary format and was therefore excluded from analyses.

    Data Cleaning All code used is in the zipped folder Data S5 in the eLife publication. Before cleaning the data, we ensured that all reported trees for each city were located within the greater metropolitan area of the city (for certain inventories, many suburbs were reported - some within the greater metropolitan area, others not). First, we renamed all columns in the received .csv sheets, referring to the metadata and according to our standardized definitions (Table S4). To harmonize tree health and condition data across different cities, we inspected metadata from the tree inventories and converted all numeric scores to a descriptive scale including “excellent,” “good”, “fair”, “poor”, “dead”, and “dead/dying”. Some cities included only three points on this scale (e.g., “good”, “poor”, “dead/dying”) while others included five (e.g., “excellent,” “good”, “fair”, “poor”, “dead”). Second, we used pandas in Python (W. McKinney & Others, 2011) to correct typos, non-ASCII characters, variable spellings, date format, units used (we converted all units to metric), address issues, and common name format. In some cases, units were not specified for tree diameter at breast height (DBH) and tree height; we determined the units based on typical sizes for trees of a particular species. Wherever diameter was reported, we assumed it was DBH. We standardized health and condition data across cities, preserving the highest granularity available for each city. For our analysis, we converted this variable to a binary (see section Condition and Health). We created a column called “location_type” to label whether a given tree was growing in the built environment or in green space. All of the changes we made, and decision points, are preserved in Data S9. Third, we checked the scientific names reported using gnr_resolve in the R library taxize (Chamberlain & Szöcs, 2013), with the option Best_match_only set to TRUE (Data S9). 
Through an iterative process, we manually checked the results and corrected typos in the scientific names until all names were either a perfect match (n=1771 species) or partial match with threshold greater than 0.75 (n=453 species). BGS manually reviewed all partial matches to ensure that they were the correct species name, and then we programmatically corrected these partial matches (for example, Magnolia grandifolia-- which is not a species name of a known tree-- was corrected to Magnolia grandiflora, and Pheonix canariensus was corrected to its proper spelling of Phoenix canariensis). Because many of these tree inventories were crowd-sourced or generated in part through citizen science, such typos and misspellings are to be expected. Some tree inventories reported species by common names only. Therefore, our fourth step in data cleaning was to convert common names to scientific names. We generated a lookup table by summarizing all pairings of common and scientific names in the inventories for which both were reported. We manually reviewed the common to scientific name pairings, confirming that all were correct. Then we programmatically assigned scientific names to all common names (Data S9). Fifth, we assigned native status to each tree through reference to the Biota of North America Project (Kartesz, 2018), which has collected data on all native and non-native species occurrences throughout the US states. Specifically, we determined whether each tree species in a given city was native to that state, not native to that state, or that we did not have enough information to determine nativity (for cases where only the genus was known). Sixth, some cities reported only the street address but not latitude and longitude. For these cities, we used the OpenCageGeocoder (https://opencagedata.com/) to convert addresses to latitude and longitude coordinates (Data S9). 
OpenCageGeocoder leverages open data and is used by many academic institutions (see https://opencagedata.com/solutions/academia). Seventh, we trimmed each city dataset to include only the standardized columns we identified in Table S4. After each stage of data cleaning, we performed manual spot checking to identify any issues.
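    Step four above (converting common names to scientific names via a lookup table built from inventories that report both) can be sketched as follows; the records here are invented for illustration:

```python
# Build a common-name -> scientific-name lookup from inventories that report
# both columns (pairings assumed already manually reviewed, as in the text).
paired = [
    {"common": "southern magnolia", "scientific": "Magnolia grandiflora"},
    {"common": "canary island date palm", "scientific": "Phoenix canariensis"},
]
lookup = {r["common"].lower(): r["scientific"] for r in paired}

# Fill inventories that report only common names; unmatched entries get None
# so they can be flagged for manual review.
common_only = [{"common": "Southern Magnolia"}, {"common": "unknown shrub"}]
for rec in common_only:
    rec["scientific"] = lookup.get(rec["common"].lower())

print([r["scientific"] for r in common_only])  # ['Magnolia grandiflora', None]
```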

  16. Student oriented subset of the Open University Learning Analytics dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Sep 30, 2021
    Cite
    Gabriella Casalino; Gabriella Casalino; Giovanna Castellano; Giovanna Castellano; Gennaro Vessio; Gennaro Vessio (2021). Student oriented subset of the Open University Learning Analytics dataset [Dataset]. http://doi.org/10.5281/zenodo.4264397
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 30, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gabriella Casalino; Gabriella Casalino; Giovanna Castellano; Giovanna Castellano; Gennaro Vessio; Gennaro Vessio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Open University (OU) dataset is an open database containing student demographic data and click-stream interactions with the virtual learning platform. The available data are structured in different CSV files. You can find more information about the original dataset at the following link: https://analyse.kmi.open.ac.uk/open_dataset.

    We extracted a subset of the original dataset that focuses on student information. 25,819 records were collected, each referring to a specific student, course, and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.

    Two target classes were considered, namely Fail and Pass, combining the original four classes (Fail and Withdrawn and Pass and Distinction, respectively). The final_result attribute contains the target values.

    All features have been converted to numbers for automatic processing.

    Below is the mapping used to convert categorical values to numeric:

    • code_module: 'AAA'=0, 'BBB'=1, 'CCC'=2, 'DDD'=3, 'EEE'=4, 'FFF'=5, 'GGG'=6
    • code_presentation: '2013B'=0, '2013J'=1, '2014B'=2, '2014J'=3
    • gender: 'F'=0, 'M'=1
    • highest_education: 'No_Formal_quals'=0, 'Post_Graduate_Qualification'=1, 'HE_Qualification'=2, 'Lower_Than_A_Level'=3, 'A_level_or_Equivalent'=4
    • imd_band: 'unknown'=0, 'between_0_and_10_percent'=1, 'between_10_and_20_percent'=2, 'between_20_and_30_percent'=3, 'between_30_and_40_percent'=4, 'between_40_and_50_percent'=5, 'between_50_and_60_percent'=6, 'between_60_and_70_percent'=7, 'between_70_and_80_percent'=8, 'between_80_and_90_percent'=9, 'between_90_and_100_percent'=10
    • age_band: 'between_0_and_35'=0, 'between_35_and_55'=1, 'higher_than_55'=2
    • disability: 'N'=0, 'Y'=1
    • student's outcome: 'Fail'=0, 'Pass'=1
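    Applying mappings of this kind to a raw record is straightforward; the sketch below uses a few of the listed attributes (the mapping dicts mirror the ones above, restricted to a subset for brevity):

```python
# Subset of the categorical-to-numeric mappings listed above.
MAPS = {
    "code_module": {"AAA": 0, "BBB": 1, "CCC": 2, "DDD": 3, "EEE": 4, "FFF": 5, "GGG": 6},
    "gender": {"F": 0, "M": 1},
    "disability": {"N": 0, "Y": 1},
    "final_result": {"Fail": 0, "Pass": 1},
}

def encode(record):
    """Replace categorical values with their numeric codes; pass through
    attributes that have no mapping (e.g. counts and scores)."""
    return {k: MAPS[k].get(v, v) if k in MAPS else v for k, v in record.items()}

row = {"code_module": "FFF", "gender": "F", "disability": "N", "final_result": "Pass"}
print(encode(row))  # {'code_module': 5, 'gender': 0, 'disability': 0, 'final_result': 1}
```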

    For more detailed information, please refer to:


    Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1

  17. COVID-19 Pandemic: A Dataset from Khyber Pakhtunkhwa, Pakistan

    • narcis.nl
    • data.mendeley.com
    Updated Aug 30, 2020
    Cite
    Qureshi, W (via Mendeley Data) (2020). COVID-19 Pandemic: A Dataset from Khyber Pakhtunkhwa, Pakistan [Dataset]. http://doi.org/10.17632/nzcrfhgfh4.1
    Explore at:
    Dataset updated
    Aug 30, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Qureshi, W (via Mendeley Data)
    Area covered
    Pakistan, Khyber Pakhtunkhwa
    Description

    This dataset demonstrates the fear of Coronavirus (COVID-19) among the people of Khyber Pakhtunkhwa (Pakistan), their preventive behaviour, mental health condition, and level of anxiety during the pandemic. To gauge these constructs, a questionnaire was developed with the help of various scales – the Fear of COVID-19 Scale (FCV-19S), the Positive Mental Health Scale (PMHS), and the General Anxiety Disorder Scale (GAD). At the time of data collection, COVID-19 cases were emerging rapidly in the country, and the KPK province was under lockdown and other mobility restrictions to limit the spread of the viral infection. Given the prevailing emergency conditions, the research tool was built as a Google form and disseminated online for data collection. The informed consent of the respondents was obtained electronically, and they participated voluntarily in this survey research. Social media apps like Facebook, WhatsApp, and LinkedIn, as well as personal contacts, were used for speedy collection of data. All questions in the questionnaire were mandatory and respondents could not submit their responses while skipping any of them, so there are no missing values in the dataset. A total of 501 responses were received, of which 208 were from females. For the convenience of the participants, every question in the questionnaire was translated into the Urdu language. All responses were automatically saved online into an .xlsx spreadsheet, and later that data was digitized by developing a coding frame. There are two main sections in this dataset: the first covers the socio-demographic information of the participants (gender, age, marital status, employment status, area of residence, and education); the second covers fear, mental health, preventive behaviour, and anxiety, with responses rated on a Likert scale.
    This dataset could benefit researchers and policymakers, who can use it to gain further insight into a rapidly evolving situation.
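    Digitizing Likert-scale responses via a coding frame, as described above, amounts to a fixed label-to-number mapping. The label wording below is a hypothetical example, not the actual questionnaire text:

```python
# Hypothetical 5-point coding frame; the real coding frame may use
# different labels or point counts.
LIKERT = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3, "Agree": 4, "Strongly agree": 5}

responses = ["Agree", "Strongly agree", "Neutral"]
coded = [LIKERT[r] for r in responses]
print(coded)  # [4, 5, 3]
```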

  18. International Census Data

    • console.cloud.google.com
    Updated Nov 19, 2019
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&hl=sl&inv=1&invt=Ab4Bdw (2019). International Census Data [Dataset]. https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data?hl=sl
    Explore at:
    Dataset updated
    Nov 19, 2019
    Dataset provided by
    Googlehttp://google.com/
    Description

    The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates. Note: The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  19. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience base was men aged 18 to 24 years.

                  Facebook connects the world

                  Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

                  Facebook users

                  The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia all have more than 100 million Facebook users each.
    
  20. Additional resources for Kiva Crowdfunding

    • kaggle.com
    zip
    Updated Apr 12, 2018
    Cite
    Luke (2018). Additional resources for Kiva Crowdfunding [Dataset]. https://www.kaggle.com/forums/f/26443/additional-resources-for-kiva-crowdfunding/t/54374/dataset-suggestion
    Explore at:
    Available download formats: zip (104671314 bytes)
    Dataset updated
    Apr 12, 2018
    Authors
    Luke
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset places the locations found in the Kiva datasets within an administrative or geographical region. You can also find poverty data about each region, which facilitates answering some of the tough questions about a region's poverty.

    Content

    In the interest of preserving the original names and spelling of the locations/countries/regions, all the data is in Excel format and has no preview. (I think only the Kaggle-recommended file types have a preview - if anyone can show me how to do this for an xlsx file, it will be greatly appreciated.)

    The Tables datasets contain the most recent analysis of the MPI on countries and regions. These datasets are updated regularly. In unique regions_names_from_google_api you will find 3 levels of inclusion for every geocode provided in Kiva datasets (village/town; administrative region; sub-national region, which can be administrative or geographical). These are the results of the Google Geocoding API process.

    Files:

    • all_kiva_loans.csv

    Dropped multiple columns but kept all the rows from loans.csv, with names, tags, and descriptions, producing a csv file of 390 MB instead of 2.13 GB. It is basically a simplified version of loans.csv (originally included in the analysis by beluga).

    • country_stats.csv
    1. population source: https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)
    2. population_below_poverty_line: percentage of the population below the poverty line
    3. hdi: Human Development Index
    4. life_expectancy: Life expectancy at birth
    5. expected_years_of_schooling: Expected years of schooling
    6. mean_years_of_schooling: Mean years of schooling
    7. gni: Gross national income (GNI) per capita

    This dataset was originally created by beluga.
    • all_loan_theme_merged_with_geo_mpi_regions.xlsx

    This is loan_themes_by_region left-joined with Tables_5.3_Contribution_of_Deprivations (all the original entries from loan_themes, and only the entries that match from Tables_5; for regions that lack MPI data, you will find NaN).
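    The described left join (every loan-theme row kept, NaN where a region lacks MPI data) can be sketched in plain Python; the field names are simplified stand-ins for the real spreadsheet columns:

```python
NAN = float("nan")

# Invented loan-theme rows and an MPI lookup keyed by region name.
loan_themes = [
    {"region": "Boucle du Mouhoun", "amount": 10000},
    {"region": "Atlantis", "amount": 500},  # no MPI data for this one
]
mpi = {"Boucle du Mouhoun": {"region MPI": 0.447}}

# Left join: keep every loan-theme row, filling NaN when the region is absent.
joined = [{**row, **mpi.get(row["region"], {"region MPI": NAN})} for row in loan_themes]
```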

    These are the columns in the database:

    1. Partner ID
    2. Field Partner
    3. Name
    4. sector
    5. Loan Theme ID
    6. Loan Theme Type
    7. Country
    8. forkiva
    9. number
    10. amount
    11. geo
    12. rural_pct
    13. City
    14. Administrative region
    15. Sub-national region
    16. ISO
    17. World region
    18. Population Share of the Region (%)
    19. region MPI
    20. Education (%)
    21. Health (%)
    22. Living standards (%)
    23. Schooling (%)
    24. Child school attendance (%)
    25. Child Mortality (%)
    26. Nutrition (%)
    27. Electricity (%)
    28. Improved sanitation (%)
    29. Drinking water (%)
    30. Floor (%)
    31. Cooking fuel (%)
    32. Asset ownership (%)
    • mpi_on_regions.xlsx

    Matched the loans in loan_themes_by_region with the regions that have MPI information. This dataset brings together the amount invested in a region and the biggest problems that region has to deal with. It is a join between the loan_themes_by_region provided by Kiva and Tables 5.3 Contribution_of_Deprivations.

    It is a subset of the all_loan_theme_merged_with_geo_mpi_regions.xlsx, which contains only the entries that I could match with poverty decomposition data. It has the same columns.

    • Tables_5_SubNational_Decomposition_MPI_2017-18.xlsx

    Multidimensional poverty index decomposition for over 1000 regions across 79 countries.

    Table 5.3: Contribution of deprivations to the MPI, by sub-national regions
    This table shows which dimensions and indicators contribute most to a region's MPI, which is useful for understanding the major source(s) of deprivation in a sub-national region.

    Source: http://ophi.org.uk/multidimensional-poverty-index/global-mpi-2016/

    • Tables_7_MPI_estimations_country_levels.xlsx

    MPI decomposition for 120 countries.

    Table 7 All Published MPI Results since 2010
    The table presents an archive of all MPI estimations published over the past 5 years, together with MPI, H, A and censored headcount ratios. For comparisons over time please use Table 6, which is strictly harmonised. The full set of data tables for each year published (Column A) is found on the 'data tables' page under 'Archive'.

    The data in this file is shown in interactive plots on Oxford Poverty and Human Development Initiative website. http://www.dataforall.org/dashboard/ophi/index.php/

    • unique_regions_from_kiva_loan_themes.xlsx

    These are all the regions corresponding to the geocodes found in Kiva's loan_themes_by_region. There are 718 unique entries that can be joined with any Kiva dataset that has either a coordinates or a region column.
    Columns:

    • geo: (Lat, Lon) pair (from loan_themes_by_region)

    • City: name of the city (has the most NaNs)

    • Administrative region: first level of administrative inclusion for the city/location (the equivalent of a county in the US)

    • Sub-national region: second level of administrative inclusion for the geo pair (like a state in the US)

    • Country: name of the country
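    A join on the geo column can be sketched as follows. This is a hedged example with invented sample rows; it assumes the geo column is stored as an identical string in both tables, which is what makes an exact-match join possible.

    ```python
    import pandas as pd

    # Hypothetical slice of unique_regions_from_kiva_loan_themes.xlsx.
    regions = pd.DataFrame({
        "geo": ["(6.5244, 3.3792)", "(-1.2921, 36.8219)"],
        "City": ["Lagos", "Nairobi"],
        "Country": ["Nigeria", "Kenya"],
    })

    # Hypothetical Kiva table carrying the same geo column.
    loans = pd.DataFrame({
        "geo": ["(6.5244, 3.3792)", "(-1.2921, 36.8219)"],
        "loan_amount": [500, 750],
    })

    # Left join preserves every loan and attaches the reverse-geocoded
    # region columns where the geo pair matches.
    enriched = loans.merge(regions, on="geo", how="left")
    ```

    Joining on a region-name column instead would work the same way, but name spellings vary more than coordinate pairs, so some fuzzy matching or normalization may be needed first.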

    Acknowledgements

    Thanks to Shane Lynn for the batch geocoding and to Joseph Deferio for reverse geocoding:

    https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/

    https://github.com/jdeferio/Reverse_Geocode

    The MPI datasets can be found on the Oxford website (http://ophi.org.uk/) under Research.

    Citation: Alkire, S. and Kanagaratnam, U. (2018). "Multidimensional Poverty Index Winter 2017-18: Brief methodological note and results." Oxford Poverty and Human Development Initiative, University of Oxford, OPHI Methodological Notes 45.
