67 datasets found
  1. American Community Survey (ACS)

    • console.cloud.google.com
    Updated Jul 16, 2018
    Cite
    United States Census Bureau (2018). American Community Survey (ACS) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/acs
    Explore at:
    Dataset updated
    Jul 16, 2018
    Dataset provided by
    Google (http://google.com/)
    Description

    The American Community Survey (ACS) is an ongoing survey that provides vital information about our nation and its people on a yearly basis by contacting over 3.5 million households across the country. The resulting data provides detailed demographic information across the US, aggregated at various geographic levels, and helps determine how more than $675 billion in federal and state funding is distributed each year.

    Businesses use ACS data to inform strategic decision-making. ACS data can serve as a component of market research, indicate concentrations of potential employees with a specific education or occupation, and suggest which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide.

    This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
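    Since the dataset lives in BigQuery, the free-tier querying described above can be sketched with the BigQuery Python client. This is an illustrative sketch only: the table and column names used below (census_bureau_acs.county_2018_5yr, geo_id, total_pop) are assumptions and should be checked against the dataset's actual schema.

```python
# Illustrative sketch: querying the ACS public dataset in BigQuery.
# The table and column names here are assumptions, not verified schema.

def build_acs_query(limit: int = 10) -> str:
    """Return SQL for the most populous counties in an assumed ACS table."""
    return (
        "SELECT geo_id, total_pop\n"
        "FROM `bigquery-public-data.census_bureau_acs.county_2018_5yr`\n"
        "ORDER BY total_pop DESC\n"
        f"LIMIT {limit}"
    )

# Running it requires the google-cloud-bigquery package and GCP credentials:
# from google.cloud import bigquery
# rows = bigquery.Client().query(build_acs_query()).result()
```

    Each such query counts against the monthly free-tier quota, so previewing table schemas before running full scans is the economical approach.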

  2. Google Address Data, Google Address API, Google location API, Google Map...

    • datarade.ai
    Cite
    APISCRAPY, Google Address Data, Google Address API, Google location API, Google Map API, Business Location Data- 100 M Google Address Data Available [Dataset]. https://datarade.ai/data-products/google-address-data-google-address-api-google-location-api-apiscrapy
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Luxembourg, Andorra, Liechtenstein, Moldova (Republic of), China, Åland Islands, Spain, United Kingdom, Estonia, Monaco
    Description

    Welcome to Apiscrapy, your ultimate destination for comprehensive location-based intelligence. As an AI-driven web scraping and automation platform, Apiscrapy excels in converting raw web data into polished, ready-to-use data APIs. With a unique capability to collect Google Address Data, Google Address API, Google Location API, Google Map, and Google Location Data with 100% accuracy, we redefine possibilities in location intelligence.

    Key Features:

    Unparalleled Data Variety: Apiscrapy offers a diverse range of address-related datasets, including Google Address Data and Google Location Data. Whether you seek B2B address data or detailed insights for various industries, we cover it all.

    Integration with Google Address API: Seamlessly integrate our datasets with the powerful Google Address API. This collaboration ensures not just accessibility but a robust combination that amplifies the precision of your location-based insights.

    Business Location Precision: Experience a new level of precision in business decision-making with our address data. Apiscrapy delivers accurate and up-to-date business locations, enhancing your strategic planning and expansion efforts.

    Tailored B2B Marketing: Customize your B2B marketing strategies with precision using our detailed B2B address data. Target specific geographic areas, refine your approach, and maximize the impact of your marketing efforts.

    Use Cases:

    Location-Based Services: Companies use Google Address Data to provide location-based services such as navigation, local search, and location-aware advertisements.

    Logistics and Transportation: Logistics companies utilize Google Address Data for route optimization, fleet management, and delivery tracking.

    E-commerce: Online retailers integrate address autocomplete features powered by Google Address Data to simplify the checkout process and ensure accurate delivery addresses.

    Real Estate: Real estate agents and property websites leverage Google Address Data to provide accurate property listings, neighborhood information, and proximity to amenities.

    Urban Planning and Development: City planners and developers utilize Google Address Data to analyze population density, traffic patterns, and infrastructure needs for urban planning and development projects.

    Market Analysis: Businesses use Google Address Data for market analysis, including identifying target demographics, analyzing competitor locations, and selecting optimal locations for new stores or offices.

    Geographic Information Systems (GIS): GIS professionals use Google Address Data as a foundational layer for mapping and spatial analysis in fields such as environmental science, public health, and natural resource management.

    Government Services: Government agencies utilize Google Address Data for census enumeration, voter registration, tax assessment, and planning public infrastructure projects.

    Tourism and Hospitality: Travel agencies, hotels, and tourism websites incorporate Google Address Data to provide location-based recommendations, itinerary planning, and booking services for travelers.

    Discover the difference with Apiscrapy – where accuracy meets diversity in address-related datasets, including Google Address Data, Google Address API, Google Location API, and more. Redefine your approach to location intelligence and make data-driven decisions with confidence. Revolutionize your business strategies today!

  3. United States Census

    • kaggle.com
    zip
    Updated Apr 17, 2018
    Cite
    US Census Bureau (2018). United States Census [Dataset]. https://www.kaggle.com/census/census-bureau-usa
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 17, 2018
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    US Census Bureau
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
    Source: https://en.wikipedia.org/wiki/United_States_Census

    Content

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain: it is typically divided by region and must be processed and combined to provide information about the nation as a whole.

    The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age, and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa

    https://cloud.google.com/bigquery/public-data/us-census

    Dataset Source: United States Census Bureau

    Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by Steve Richey from Unsplash.

    Inspiration

    What are the ten most populous zip codes in the US in the 2010 census?

    What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?

    Census population map: https://cloud.google.com/bigquery/images/census-population-map.png
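    The second inspiration question can be sketched in pandas; a minimal version, assuming each census table carries zipcode and population columns as in the BigQuery tables:

```python
import pandas as pd

# Minimal sketch: top-n zip codes by population change between two censuses.
# Assumes each table has "zipcode" and "population" columns.
def top_population_change(pop_2000: pd.DataFrame,
                          pop_2010: pd.DataFrame,
                          n: int = 10) -> pd.DataFrame:
    merged = pop_2000.merge(pop_2010, on="zipcode", suffixes=("_2000", "_2010"))
    merged["change"] = merged["population_2010"] - merged["population_2000"]
    return merged.nlargest(n, "change")[["zipcode", "change"]]
```

    The same join pattern works directly in BigQuery SQL; pandas is shown here only because it is easy to test locally on extracted CSVs.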

  4. census-bureau-usa

    • kaggle.com
    zip
    Updated May 18, 2020
    + more versions
    Cite
    Google BigQuery (2020). census-bureau-usa [Dataset]. https://www.kaggle.com/datasets/bigquery/census-bureau-usa
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    May 18, 2020
    Dataset authored and provided by
    Google BigQuery
    Area covered
    United States
    Description

    Context :

    The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is offered only as summary tables and graphs. The raw data is often difficult to obtain: it is typically divided by region and must be processed and combined to provide information about the nation as a whole.

    Update frequency: Historic (none)

    Dataset source

    United States Census Bureau

    Sample Query

    SELECT zipcode, population
    FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
    WHERE gender = ''
    ORDER BY population DESC
    LIMIT 10

    Terms of use

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/us-census-data

  5. Datasheet1_Mobility data shows effectiveness of control strategies for...

    • frontiersin.figshare.com
    pdf
    Updated Mar 7, 2024
    Cite
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small (2024). Datasheet1_Mobility data shows effectiveness of control strategies for COVID-19 in remote, sparse and diffuse populations.pdf [Dataset]. http://doi.org/10.3389/fepid.2023.1201810.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Mar 7, 2024
    Dataset provided by
    Frontiers
    Authors
    Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data that is collected at the individual level from mobile phones is typically aggregated to the population level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage a specific dataset: the Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we first demonstrate how to overcome these challenges to approximate the absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental disease model that incorporates vaccination status, we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can therefore be used to analyze disease outbreaks in similarly sparse populations. We demonstrate that, used appropriately, this data can inform public health policies and have an impact on pandemic responses.
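    The core idea of coupling an aggregated mobility network to a compartmental disease model can be illustrated with a minimal discrete-time SIR step over a town-to-town flow matrix. This is not the authors' code; the structure and parameter values below are simplified assumptions for illustration.

```python
import numpy as np

# Minimal illustration (not the paper's model): one day of SIR dynamics
# where infection pressure travels between towns via a flow matrix.
def sir_step(S, I, R, flows, beta=0.3, gamma=0.1):
    """flows[i, j]: fraction of town i's population visiting town j per day."""
    N = S + I + R
    # Infectious pressure each town is exposed to through travel.
    pressure = flows @ (I / N)
    new_inf = beta * S * pressure
    new_rec = gamma * I
    return S - new_inf, I + new_inf - new_rec, R + new_rec
```

    Iterating this step while updating the flow matrix from the aggregated mobility data is the basic simulation loop such a study would run.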

  6. Google Map Data, Google Map Data Scraper, Business location Data- Scrape All...

    • datarade.ai
    Updated May 23, 2022
    + more versions
    Cite
    APISCRAPY (2022). Google Map Data, Google Map Data Scraper, Business location Data- Scrape All Publicly Available Data From Google Map & Other Platforms [Dataset]. https://datarade.ai/data-products/google-map-data-google-map-data-scraper-business-location-d-apiscrapy
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Albania, Serbia, Gibraltar, Svalbard and Jan Mayen, Denmark, Switzerland, Bulgaria, Japan, United States of America, Macedonia (the former Yugoslav Republic of)
    Description

    Welcome to APISCRAPY, your premier provider of Map Data solutions. Map Data encompasses various information related to geographic locations, including Google Map Data, Location Data, Address Data, and Business Location Data. Our advanced Google Map Data Scraper sets us apart by extracting comprehensive and accurate data from Google Maps and other platforms.

    What sets APISCRAPY's Map Data apart are its key benefits:

    1. Accuracy: Our scraping technology ensures the highest level of accuracy, providing reliable data for informed decision-making. We employ advanced algorithms to filter out irrelevant or outdated information, ensuring that you receive only the most relevant and up-to-date data.

    2. Accessibility: With our data readily available through APIs, integration into existing systems is seamless, saving time and resources. Our APIs are easy to use and well-documented, allowing for quick implementation into your workflows. Whether you're a developer building a custom application or a business analyst conducting market research, our APIs provide the flexibility and accessibility you need.

    3. Customization: We understand that every business has unique needs and requirements. That's why we offer tailored solutions to meet specific business needs. Whether you need data for a one-time project or ongoing monitoring, we can customize our services to suit your needs. Our team of experts is always available to provide support and guidance, ensuring that you get the most out of our Map Data solutions.

    Our Map Data solutions cater to various use cases:

    1. B2B Marketing: Gain insights into customer demographics and behavior for targeted advertising and personalized messaging. Identify potential customers based on their geographic location, interests, and purchasing behavior.

    2. Logistics Optimization: Utilize Location Data to optimize delivery routes and improve operational efficiency. Identify the most efficient routes based on factors such as traffic patterns, weather conditions, and delivery deadlines.

    3. Real Estate Development: Identify prime locations for new ventures using Business Location Data for market analysis. Analyze factors such as population density, income levels, and competition to identify opportunities for growth and expansion.

    4. Geospatial Analysis: Leverage Map Data for spatial analysis, urban planning, and environmental monitoring. Identify trends and patterns in geographic data to inform decision-making in areas such as land use planning, resource management, and disaster response.

    5. Retail Expansion: Determine optimal locations for new stores or franchises using Location Data and Address Data. Analyze factors such as foot traffic, proximity to competitors, and demographic characteristics to identify locations with the highest potential for success.

    6. Competitive Analysis: Analyze competitors' business locations and market presence for strategic planning. Identify areas of opportunity and potential threats to your business by analyzing competitors' geographic footprint, market share, and customer demographics.

    Experience the power of APISCRAPY's Map Data solutions today and unlock new opportunities for your business. With our accurate and accessible data, you can make informed decisions, drive growth, and stay ahead of the competition.

    [ Related tags: Map Data, Google Map Data, Google Map Data Scraper, B2B Marketing, Location Data, Map Data, Google Data, Location Data, Address Data, Business location data, map scraping data, Google map data extraction, Transport and Logistic Data, Mobile Location Data, Mobility Data, and IP Address Data, business listings APIs, map data, map datasets, map APIs, poi dataset, GPS, Location Intelligence, Retail Site Selection, Sentiment Analysis, Marketing Data Enrichment, Point of Interest (POI) Mapping]

  7. GPWv411: Population Density (Gridded Population of the World Version 4.11)

    • developers.google.com
    Updated Aug 11, 2019
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Population Density (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H49C6VHW
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains estimates of the number of persons per square kilometer consistent with national censuses and population registers. There is one image for each modeled year.

    General Documentation

    The Gridded Population of the World Version 4 (GPWv4), Revision 11 models the distribution of global human population for the years 2000, 2005, 2010, 2015, and 2020 on 30 arc-second (approximately 1 km) grid cells. Population is distributed to cells using proportional allocation of population from census and administrative units. Population input data are collected at the most detailed spatial resolution available from the results of the 2010 round of censuses, which occurred between 2005 and 2014. The input data are extrapolated to produce population estimates for each modeled year.
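    A much-simplified sketch of the two steps described above, proportional allocation of unit populations to grid cells followed by extrapolation to modeled years; GPW's actual methodology is more involved, so treat this as illustration only.

```python
# Simplified sketch of GPW-style processing; the real pipeline is more involved.

def allocate_population(unit_pop: float, cell_weights: list[float]) -> list[float]:
    """Spread a census unit's population over its grid cells in proportion
    to per-cell weights (e.g. each cell's area share of the unit)."""
    total = sum(cell_weights)
    return [unit_pop * w / total for w in cell_weights]

def extrapolate(pop_t0: float, pop_t1: float, t0: int, t1: int, t: int) -> float:
    """Extrapolate to a modeled year using the constant annual growth rate
    implied by two census counts."""
    rate = (pop_t1 / pop_t0) ** (1.0 / (t1 - t0))
    return pop_t0 * rate ** (t - t0)
```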

  8. Colorado Springs Google Maps User Psychology Dataset

    • caseysseo.com
    Updated Jul 1, 2025
    Cite
    Casey Miller (2025). Colorado Springs Google Maps User Psychology Dataset [Dataset]. https://caseysseo.com/the-psychology-behind-google-maps-user-behavior-in-colorado-springs/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Casey's SEO
    Authors
    Casey Miller
    Time period covered
    2025
    Area covered
    Colorado Springs
    Variables measured
    Decision-making speed, Cognitive load tolerance, Trust formation patterns, Visual attention sequences, Demographic psychology preferences
    Description

    Comprehensive dataset analyzing psychological patterns, cognitive triggers, and behavioral preferences of Google Maps users in Colorado Springs, including demographic psychology, seasonal patterns, and decision-making frameworks.

  9. GPWv411: Basic Demographic Characteristics (Gridded Population of the World...

    • developers.google.com
    Updated Aug 11, 2019
    Cite
    NASA SEDAC at the Center for International Earth Science Information Network (2019). GPWv411: Basic Demographic Characteristics (Gridded Population of the World Version 4.11) [Dataset]. http://doi.org/10.7927/H46M34XX
    Explore at:
    Dataset updated
    Aug 11, 2019
    Dataset provided by
    NASA SEDAC at the Center for International Earth Science Information Network
    Time period covered
    Jan 1, 2000 - Jan 1, 2020
    Area covered
    Earth
    Description

    This dataset contains population estimates, by age and sex, per 30 arc-second grid cell consistent with national censuses and population registers. There is one image for each modeled age and sex category based on the 2010 round of censuses.

    General Documentation

    The Gridded Population of the World Version 4 (GPWv4), Revision …

  10. Fostering cultures of open qualitative research: Dataset 2 – Interview...

    • orda.shef.ac.uk
    xlsx
    Updated Jun 28, 2023
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2023). Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts [Dataset]. http://doi.org/10.15131/shef.data.23567223.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    · Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:

    · 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Doc, or an open-source equivalent.

    All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.

    Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.

    For recruitment, 14 participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.

    · 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.

    This provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.

    The project was undertaken by two staff:

    Co-investigator: Dr. Itzel San Roman Pineda (ORCiD: 0000-0002-3785-8057; i.sanromanpineda@sheffield.ac.uk), Postdoctoral Research Assistant. Labelled as ‘Researcher 1’ throughout the dataset.

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard (ORCiD: 0000-0003-2460-8638; m.s.hanchard@sheffield.ac.uk), Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science. Labelled as ‘Researcher 2’ throughout the dataset.

  11. ‘Credit Risk Classification Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Credit Risk Classification Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-credit-risk-classification-dataset-a5f6/76e42b23/?iid=035-990&v=presentation
    Explore at:
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Credit Risk Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/praveengovi/credit-risk-classification-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This is customer transaction and demographic data; it labels customers as risky or not risky for specific banking products.

    Content

    The dataset is small, which makes it well suited for budding data scientists 👨‍🔬 👩‍🔬 and data analysts experimenting with machine learning and statistical modelling concepts.

    Data:

    payment_data.csv: customer’s card payment history.

    id: customer id
    OVD_t1: number of times overdue type 1
    OVD_t2: number of times overdue type 2
    OVD_t3: number of times overdue type 3
    OVD_sum: total overdue days
    pay_normal: number of times normal payment
    prod_code: credit product code
    prod_limit: credit limit of product
    update_date: account update date
    new_balance: current balance of product
    highest_balance: highest balance in history
    report_date: date of recent payment

    customer_data.csv:

    customer’s demographic data and category attributes, which have been encoded.

    Category features: fea_1, fea_3, fea_5, fea_6, fea_7, fea_9
    label = 1: the customer is high credit risk
    label = 0: the customer is low credit risk
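    As a starting point for experiments, the two files can be joined on the shared id column; a minimal pandas sketch using the column names from the listing above:

```python
import pandas as pd

# Sketch: join payment history to customer labels and compare overdue days
# by risk label. Column names (id, OVD_sum, label) follow the listing above.
def risk_summary(payments: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    merged = payments.merge(customers[["id", "label"]], on="id")
    return merged.groupby("label")["OVD_sum"].mean().reset_index()
```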

    Acknowledgements

    Thanks to Google Datasets search

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    This dataset helps to find out whether a customer is credit-risky or credit-worthy from a banking perspective.

    Q1 - What factors contribute to a customer being credit-risky?
    Q2 - How does a credit-worthy customer behave?

    --- Original source retains full ownership of the source dataset ---

  12. Sentinel2 RGB chips over BENELUX with JRC GHSL Population Density 2015 for...

    • data.niaid.nih.gov
    Updated May 18, 2023
    + more versions
    Cite
    Raúl Ramos-Pollan (2023). Sentinel2 RGB chips over BENELUX with JRC GHSL Population Density 2015 for Learning with Label Proportions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7939347
    Explore at:
    Dataset updated
    May 18, 2023
    Dataset provided by
    Fabio A. González
    Raúl Ramos-Pollan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Benelux
    Description

    The Region of Interest (ROI) comprises Belgium, the Netherlands, and Luxembourg.

    We use the communes administrative division, which is standardized across Europe by EUROSTAT at: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units This is roughly equivalent to the notion of municipalities in most countries.

    From the link above, commune definitions are taken from COMM_RG_01M_2016_4326.shp and country borders are taken from NUTS_RG_01M_2021_3035.shp.

    images: Sentinel2 RGB from 2020-01-01 to 2020-12-31, with cloudy pixels filtered out according to the QA60 band, following the example given on the GEE dataset info page: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED

      see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/sentinel2rgbmedian2020.py
    

    labels: Global Human Settlement Layers, Population Grid 2015

      labels range from 0 to 31, with the following meaning:

        label value    original value in GEE dataset
        0              0
        1              1-10
        2              11-20
        3              21-30
        ...
        31             >=291
    
    
      see https://developers.google.com/earth-engine/datasets/catalog/JRC_GHSL_P2016_POP_GPW_GLOBE_V1
    
    
      see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/humanpop2015.py
    

    _aschips.geojson: the image chip geometries along with their label proportions, for easy visualization with QGIS, GeoPandas, etc.

    _communes.geojson: the commune geometries with their label proportions, for easy visualization with QGIS, GeoPandas, etc.

    splits.csv: contains two splits of the image chips into train, test, and val - one using geographical bands at 45° angles in the nw-se direction, and the same split reorganized so that all chips within the same commune fall within the same split.

    data/: a pickle file for each image chip, containing a dict with: the 100x100 RGB Sentinel 2 chip image; the 100x100 chip-level labels; the label proportions of the chip; and the aggregated label proportions of the commune the chip belongs to.
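    A minimal sketch of reading one chip pickle and recomputing its label proportions; the dict key used here ("labels") is an assumption, so check the actual files for the real key names.

```python
import pickle
import numpy as np

# Sketch: load a chip pickle and recompute label proportions.
# The "labels" dict key is an assumption; verify against the real files.
def load_chip(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)

def label_proportions(chip_labels: np.ndarray, n_classes: int = 32) -> np.ndarray:
    """Fraction of chip pixels carrying each of the labels 0..31."""
    counts = np.bincount(chip_labels.ravel(), minlength=n_classes)
    return counts / counts.sum()
```

    Recomputing proportions this way is a useful sanity check against the values stored in the pickle and in _aschips.geojson.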

  13. Replication Data for: Field Evidence of the Effects of Pro-sociality and...

    • dataverse.harvard.edu
    Updated Feb 14, 2022
    Cite
    Samuel Dooley; John P Dickerson; Elissa Redmiles (2022). Replication Data for: Field Evidence of the Effects of Pro-sociality and Transparency on COVID-19 App Attractiveness [Dataset]. http://doi.org/10.7910/DVN/OT36PX
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 14, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Samuel Dooley; John P Dickerson; Elissa Redmiles
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These data and the associated R analysis file accompany the paper "Field Evidence of the Effects of Pro-sociality and Transparency on COVID-19 App Attractiveness". We ran 14 separate Google display ad campaigns from February 1 to 26. These were the only Google Display ads run for CovidDefense. Each campaign was targeted, via IP address, at people who reside in Louisiana. All campaigns used the same settings, ad destination, and ad image from the state of Louisiana's CovidDefense marketing materials. The 14 ads varied only in their text, in alignment with the 14 conditions summarized in ads.csv. There are two primary datasets: one (data_demo.csv) with all 7,010,271 impressions and demographic data, and another (data_geo.csv) with just the impressions that have associated geographic information. The former includes columns for Google-estimated demographics such as age and gender, with many impressions having the value "Unknown". Both tables contain one row per impression, with columns for whether that impression resulted in a click; the age and gender or geography of the impression; and indicator variables for the presence or absence of ad information (appeals; privacy transparency -- broad privacy reassurance, non-technical control, and technical control -- and data transparency). An associated R file includes functions to reproduce each model and associated statistics.
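    As a sketch of how impression-level rows with indicator variables can be analysed, the snippet below computes click-through rate per condition flag. The column names ("clicked", "prosocial", "data_transparency") are illustrative stand-ins, not the actual CSV headers:

```python
from collections import defaultdict

# Invented impression rows; each row is one ad impression with a click
# indicator and per-condition indicator variables (names assumed).
impressions = [
    {"clicked": 1, "prosocial": 1, "data_transparency": 0},
    {"clicked": 0, "prosocial": 1, "data_transparency": 0},
    {"clicked": 0, "prosocial": 0, "data_transparency": 1},
    {"clicked": 1, "prosocial": 0, "data_transparency": 1},
    {"clicked": 0, "prosocial": 0, "data_transparency": 1},
]

def ctr_by(rows, flag):
    """Click-through rate grouped by the value of one indicator column."""
    clicks, shown = defaultdict(int), defaultdict(int)
    for r in rows:
        shown[r[flag]] += 1
        clicks[r[flag]] += r["clicked"]
    return {v: clicks[v] / shown[v] for v in shown}

print(ctr_by(impressions, "prosocial"))  # CTR with vs. without the pro-social appeal
```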

  14. Travel time to cities and ports in the year 2015

    • figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Andy Nelson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

    If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

    The following text is a summary of the information in the above Data Descriptor.

    The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated) land-based travel time to the nearest city and nearest port for a range of city and port sizes.

    The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

    These maps represent a unique global representation of physical access to essential services offered by cities and ports.

    travel_time_to_cities_x.tif (where x ranges from 1 to 12): the value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

    travel_time_to_ports_x (where x ranges from 1 to 5): the value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

    Format: Raster dataset, GeoTIFF, LZW-compressed

    Unit: minutes

    Data type: 16-bit unsigned integer

    No-data value: 65535

    Flags: none

    Spatial resolution: 30 arc seconds

    Spatial extent: upper left -180, 85; lower left -180, -60; upper right 180, 85; lower right 180, -60

    Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)

    Temporal resolution: 2015

    Temporal extent: updates may follow for future years, but these depend on the availability of updated inputs on travel times and city locations and populations.

    Methodology Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

    The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.

    Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.

    The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).
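    The accumulated-cost idea can be sketched with a tiny Dijkstra-style search over a made-up friction grid. The real analysis used the R gdistance package (accCost) on a global 1 km surface; the grid, friction values, and 4-neighbour moves below are illustrative assumptions only:

```python
import heapq

def acc_cost(friction, targets):
    """Least accumulated cost from every cell to the nearest target cell.

    Moves are 4-neighbour; the cost of a step is the mean friction of the
    two cells crossed (a common convention for cost surfaces).
    """
    rows, cols = len(friction), len(friction[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = [(0.0, r, c) for r, c in targets]
    for _, r, c in heap:
        dist[r][c] = 0.0
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (friction[r][c] + friction[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Toy 3x3 friction surface with one target (a "city") at the top-left cell.
friction = [[1, 1, 5], [1, 9, 5], [1, 1, 1]]
travel = acc_cost(friction, targets=[(0, 0)])
```

    Note how the cheapest route to the bottom-right cell goes around the high-friction centre cell rather than through it, which is exactly the behaviour a friction surface is meant to produce.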

    Code The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

    Validation The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

    This process and results are included in the validation zip file.

    Usage Notes The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

    The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
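    The minimum-across-layers combination described above can be sketched as follows. The layer values and names are invented; the no-data value is the 65535 documented earlier, treated here as "no settlement of that size reachable":

```python
NODATA = 65535  # matches the documented no-data value

def combine_min(*layers):
    """Pixel-wise minimum across layers, propagating NODATA only when no
    layer has a valid value for that pixel."""
    out = []
    for pixels in zip(*layers):
        valid = [p for p in pixels if p != NODATA]
        out.append(min(valid) if valid else NODATA)
    return out

# Invented 4-pixel strips from three population-range layers.
t_5k_10k = [12, 40, NODATA, 7]
t_10k_20k = [30, 25, 55, NODATA]
t_20k_50k = [90, NODATA, 50, 60]

# Travel time to the nearest settlement of 5,000-50,000 people.
print(combine_min(t_5k_10k, t_10k_20k, t_20k_50k))  # [12, 25, 50, 7]
```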

    The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.

    The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.

  15. A dataset of 5 million city trees from 63 US cities: species, location,...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Aug 31, 2022
    Cite
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz (2022). A dataset of 5 million city trees from 63 US cities: species, location, nativity status, health, and more. [Dataset]. http://doi.org/10.5061/dryad.2jm63xsrf
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    Stanford University
    Harvard University
    The Biota of North America Program (BONAP)
    Worcester Polytechnic Institute
    Cornell University
    Authors
    Dakota McCoy; Benjamin Goulet-Scott; Weilin Meng; Bulent Atahan; Hana Kiros; Misako Nishino; John Kartesz
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States
    Description

    Sustainable cities depend on urban forests. City trees -- a pillar of urban forests -- improve our health, clean the air, store CO2, and cool local temperatures. Comparatively less is known about urban forests as ecosystems, particularly their spatial composition, nativity statuses, biodiversity, and tree health. Here, we assembled and standardized a new dataset of N=5,660,237 trees from 63 of the largest US cities. The data comes from tree inventories conducted at the level of cities and/or neighborhoods. Each data sheet includes detailed information on tree location, species, nativity status (whether a tree species is naturally occurring or introduced), health, size, whether it is in a park or urban area, and more (comprising 28 standardized columns per datasheet). This dataset could be analyzed in combination with citizen-science datasets on bird, insect, or plant biodiversity; social and demographic data; or data on the physical environment. Urban forests offer a rare opportunity to intentionally design biodiverse, heterogeneous, rich ecosystems. Methods See eLife manuscript for full details. Below, we provide a summary of how the dataset was collected and processed.

    Data Acquisition We limited our search to the 150 largest cities in the USA (by census population). To acquire raw data on street tree communities, we used a search protocol on both Google and Google Datasets Search (https://datasetsearch.research.google.com/). We first searched the city name plus each of the following: street trees, city trees, tree inventory, urban forest, and urban canopy (all combinations totaled 20 searches per city, 10 each in Google and Google Datasets Search). We then read the first page of Google results and the top 20 results from Google Datasets Search. If the same-named city in the wrong state appeared in the results, we redid the 20 searches adding the state name. If no data were found, we contacted a relevant state official via email or phone with an inquiry about their street tree inventory. Datasheets were received and transformed to .csv format (if they were not already in that format). We received data on street trees from 64 cities. One city, El Paso, had data only in summary format and was therefore excluded from analyses.

    Data Cleaning All code used is in the zipped folder Data S5 in the eLife publication. Before cleaning the data, we ensured that all reported trees for each city were located within the greater metropolitan area of the city (for certain inventories, many suburbs were reported - some within the greater metropolitan area, others not). First, we renamed all columns in the received .csv sheets, referring to the metadata and according to our standardized definitions (Table S4). To harmonize tree health and condition data across different cities, we inspected metadata from the tree inventories and converted all numeric scores to a descriptive scale including “excellent,” “good”, “fair”, “poor”, “dead”, and “dead/dying”. Some cities included only three points on this scale (e.g., “good”, “poor”, “dead/dying”) while others included five (e.g., “excellent,” “good”, “fair”, “poor”, “dead”). Second, we used pandas in Python (W. McKinney & Others, 2011) to correct typos, non-ASCII characters, variable spellings, date format, units used (we converted all units to metric), address issues, and common name format. In some cases, units were not specified for tree diameter at breast height (DBH) and tree height; we determined the units based on typical sizes for trees of a particular species. Wherever diameter was reported, we assumed it was DBH. We standardized health and condition data across cities, preserving the highest granularity available for each city. For our analysis, we converted this variable to a binary (see section Condition and Health). We created a column called “location_type” to label whether a given tree was growing in the built environment or in green space. All of the changes we made, and decision points, are preserved in Data S9. Third, we checked the scientific names reported using gnr_resolve in the R library taxize (Chamberlain & Szöcs, 2013), with the option Best_match_only set to TRUE (Data S9). 
Through an iterative process, we manually checked the results and corrected typos in the scientific names until all names were either a perfect match (n=1771 species) or partial match with threshold greater than 0.75 (n=453 species). BGS manually reviewed all partial matches to ensure that they were the correct species name, and then we programmatically corrected these partial matches (for example, Magnolia grandifolia-- which is not a species name of a known tree-- was corrected to Magnolia grandiflora, and Pheonix canariensus was corrected to its proper spelling of Phoenix canariensis). Because many of these tree inventories were crowd-sourced or generated in part through citizen science, such typos and misspellings are to be expected. Some tree inventories reported species by common names only. Therefore, our fourth step in data cleaning was to convert common names to scientific names. We generated a lookup table by summarizing all pairings of common and scientific names in the inventories for which both were reported. We manually reviewed the common to scientific name pairings, confirming that all were correct. Then we programmatically assigned scientific names to all common names (Data S9). Fifth, we assigned native status to each tree through reference to the Biota of North America Project (Kartesz, 2018), which has collected data on all native and non-native species occurrences throughout the US states. Specifically, we determined whether each tree species in a given city was native to that state, not native to that state, or that we did not have enough information to determine nativity (for cases where only the genus was known). Sixth, some cities reported only the street address but not latitude and longitude. For these cities, we used the OpenCageGeocoder (https://opencagedata.com/) to convert addresses to latitude and longitude coordinates (Data S9). 
OpenCageGeocoder leverages open data and is used by many academic institutions (see https://opencagedata.com/solutions/academia). Seventh, we trimmed each city dataset to include only the standardized columns we identified in Table S4. After each stage of data cleaning, we performed manual spot checking to identify any issues.
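    Step four above (converting common names to scientific names via a lookup table built from inventories that report both) can be sketched as follows; the records here are invented for illustration:

```python
# Build a common-name -> scientific-name lookup from inventories that report
# both columns (pairings assumed already manually reviewed, as in the text).
paired = [
    {"common": "southern magnolia", "scientific": "Magnolia grandiflora"},
    {"common": "canary island date palm", "scientific": "Phoenix canariensis"},
]
lookup = {r["common"].lower(): r["scientific"] for r in paired}

# Fill inventories that report only common names; unmatched entries get None
# so they can be flagged for manual review.
common_only = [{"common": "Southern Magnolia"}, {"common": "unknown shrub"}]
for rec in common_only:
    rec["scientific"] = lookup.get(rec["common"].lower())

print([r["scientific"] for r in common_only])  # ['Magnolia grandiflora', None]
```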

  16. Student oriented subset of the Open University Learning Analytics dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Sep 30, 2021
    Cite
    Gabriella Casalino; Gabriella Casalino; Giovanna Castellano; Giovanna Castellano; Gennaro Vessio; Gennaro Vessio (2021). Student oriented subset of the Open University Learning Analytics dataset [Dataset]. http://doi.org/10.5281/zenodo.4264397
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 30, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gabriella Casalino; Gabriella Casalino; Giovanna Castellano; Giovanna Castellano; Gennaro Vessio; Gennaro Vessio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Open University (OU) dataset is an open database containing student demographic data and click-stream interactions with the virtual learning platform. The available data are structured in different CSV files. You can find more information about the original dataset at the following link: https://analyse.kmi.open.ac.uk/open_dataset.

    We extracted a subset of the original dataset that focuses on student information. 25,819 records were collected, each referring to a specific student, course, and semester. Each record is described by the following 20 attributes: code_module, code_presentation, gender, highest_education, imd_band, age_band, num_of_prev_attempts, studies_credits, disability, resource, homepage, forum, glossary, outcontent, subpage, url, outcollaborate, quiz, AvgScore, count.

    Two target classes were considered, namely Fail and Pass, combining the original four classes (Fail and Withdrawn and Pass and Distinction, respectively). The final_result attribute contains the target values.

    All features have been converted to numbers for automatic processing.

    Below is the mapping used to convert categorical values to numeric:

    • code_module: 'AAA'=0, 'BBB'=1, 'CCC'=2, 'DDD'=3, 'EEE'=4, 'FFF'=5, 'GGG'=6
    • code_presentation: '2013B'=0, '2013J'=1, '2014B'=2, '2014J'=3
    • gender: 'F'=0, 'M'=1
    • highest_education: 'No_Formal_quals'=0, 'Post_Graduate_Qualification'=1, 'HE_Qualification'=2, 'Lower_Than_A_Level'=3, 'A_level_or_Equivalent'=4
    • imd_band: 'unknown'=0, 'between_0_and_10_percent'=1, 'between_10_and_20_percent'=2, 'between_20_and_30_percent'=3, 'between_30_and_40_percent'=4, 'between_40_and_50_percent'=5, 'between_50_and_60_percent'=6, 'between_60_and_70_percent'=7, 'between_70_and_80_percent'=8, 'between_80_and_90_percent'=9, 'between_90_and_100_percent'=10
    • age_band: 'between_0_and_35'=0, 'between_35_and_55'=1, 'higher_than_55'=2
    • disability: 'N'=0, 'Y'=1
    • student's outcome: 'Fail'=0, 'Pass'=1
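    Applying mappings of this kind to a raw record is straightforward; the sketch below uses a few of the listed attributes (the mapping dicts mirror the ones above, restricted to a subset for brevity):

```python
# Subset of the categorical-to-numeric mappings listed above.
MAPS = {
    "code_module": {"AAA": 0, "BBB": 1, "CCC": 2, "DDD": 3, "EEE": 4, "FFF": 5, "GGG": 6},
    "gender": {"F": 0, "M": 1},
    "disability": {"N": 0, "Y": 1},
    "final_result": {"Fail": 0, "Pass": 1},
}

def encode(record):
    """Replace categorical values with their numeric codes; pass through
    attributes that have no mapping (e.g. counts and scores)."""
    return {k: MAPS[k].get(v, v) if k in MAPS else v for k, v in record.items()}

row = {"code_module": "FFF", "gender": "F", "disability": "N", "final_result": "Pass"}
print(encode(row))  # {'code_module': 5, 'gender': 0, 'disability': 0, 'final_result': 1}
```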

    For more detailed information, please refer to:


    Casalino G., Castellano G., Vessio G. (2021) Exploiting Time in Adaptive Learning from Educational Data. In: Agrati L.S. et al. (eds) Bridges and Mediation in Higher Distance Education. HELMeTO 2020. Communications in Computer and Information Science, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_1

  17. COVID-19 Pandemic: A Dataset from Khyber Pakhtunkhwa, Pakistan

    • narcis.nl
    • data.mendeley.com
    Updated Aug 30, 2020
    Cite
    Qureshi, W (via Mendeley Data) (2020). COVID-19 Pandemic: A Dataset from Khyber Pakhtunkhwa, Pakistan [Dataset]. http://doi.org/10.17632/nzcrfhgfh4.1
    Explore at:
    Dataset updated
    Aug 30, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Qureshi, W (via Mendeley Data)
    Area covered
    Pakistan, Khyber Pakhtunkhwa
    Description

    This dataset demonstrates the fear of Coronavirus (COVID-19) among the people of Khyber Pakhtunkhwa (Pakistan), their preventive behaviour, mental health condition, and level of anxiety during the pandemic. To gauge these constructs, a questionnaire was developed with the help of various scales – the Fear of COVID-19 Scale (FCV-19S), the Positive Mental Health Scale (PMHS), and the General Anxiety Disorder Scale (GAD). At the time of data collection, COVID-19 cases were emerging rapidly in the country, and the KPK province was under lockdown and other mobility restrictions to limit the spread of the viral infection. Given the prevailing emergency conditions, the research tool was built as a Google form and disseminated online for data collection. The informed consent of the respondents was obtained electronically, and they participated voluntarily in this survey research. Social media apps like Facebook, WhatsApp, and LinkedIn, as well as personal contacts, were used for speedy collection of data. All questions in the questionnaire were mandatory and respondents could not submit their responses while skipping any of them, so there are no missing values in the dataset. A total of 501 responses were received, of which 208 were from females. For the convenience of the participants, every question in the questionnaire was translated into the Urdu language. All responses were automatically saved online into an .xlsx spreadsheet, and later that data was digitized by developing a coding frame. There are two main sections in this dataset: the first covers the socio-demographic information of the participants (gender, age, marital status, employment status, area of residence, and education); the second covers fear, mental health, preventive behaviour, and anxiety, with responses rated on a Likert scale.
    This dataset could benefit researchers and policymakers, who can use it to gain further insight into a rapidly evolving situation.
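    Digitizing Likert-scale responses via a coding frame, as described above, amounts to a fixed label-to-number mapping. The label wording below is a hypothetical example, not the actual questionnaire text:

```python
# Hypothetical 5-point coding frame; the real coding frame may use
# different labels or point counts.
LIKERT = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3, "Agree": 4, "Strongly agree": 5}

responses = ["Agree", "Strongly agree", "Neutral"]
coded = [LIKERT[r] for r in responses]
print(coded)  # [4, 5, 3]
```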

  18. International Census Data

    • console.cloud.google.com
    Updated Nov 19, 2019
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&hl=sl&inv=1&invt=Ab4Bdw (2019). International Census Data [Dataset]. https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data?hl=sl
    Explore at:
    Dataset updated
    Nov 19, 2019
    Dataset provided by
    Googlehttp://google.com/
    Description

    The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates. Note: The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .

  19. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience base was men aged 18 to 24 years.

                  Facebook connects the world

                  Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

                  Facebook users

                  The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia all have more than 100 million Facebook users each.
    
  20. Additional resources for Kiva Crowdfunding

    • kaggle.com
    zip
    Updated Apr 12, 2018
    Cite
    Luke (2018). Additional resources for Kiva Crowdfunding [Dataset]. https://www.kaggle.com/forums/f/26443/additional-resources-for-kiva-crowdfunding/t/54374/dataset-suggestion
    Explore at:
    Available download formats: zip (104671314 bytes)
    Dataset updated
    Apr 12, 2018
    Authors
    Luke
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset places the locations found in the Kiva datasets within an administrative or geographical region. You can also find poverty data about each region, which facilitates answering some of the tough questions about a region's poverty.

    Content

    In the interest of preserving the original names and spelling of the locations/countries/regions, all the data is in Excel format and has no preview. (I think only the Kaggle-recommended file types have a preview - if anyone can show me how to do this for an xlsx file, it will be greatly appreciated.)

    The Tables datasets contain the most recent analysis of the MPI on countries and regions. These datasets are updated regularly. In unique regions_names_from_google_api you will find 3 levels of inclusion for every geocode provided in Kiva datasets (village/town; administrative region; sub-national region, which can be administrative or geographical). These are the results of the Google Geocoding API process.

    Files:

    • all_kiva_loans.csv

    Dropped multiple columns but kept all the rows from loans.csv, with names, tags, and descriptions, producing a csv file of 390 MB instead of 2.13 GB. It is basically a simplified version of loans.csv (originally included in the analysis by beluga).

    • country_stats.csv
    1. population source: https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)
    2. population_below_poverty_line: percentage of the population below the poverty line
    3. hdi: Human Development Index
    4. life_expectancy: Life expectancy at birth
    5. expected_years_of_schooling: Expected years of schooling
    6. mean_years_of_schooling: Mean years of schooling
    7. gni: Gross national income (GNI) per capita

    This dataset was originally created by beluga.
    • all_loan_theme_merged_with_geo_mpi_regions.xlsx

    This is loan_themes_by_region left-joined with Tables_5.3_Contribution_of_Deprivations (all the original entries from loan_themes, and only the entries that match from Tables_5; for regions that lack MPI data, you will find NaN).
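    The described left join (every loan-theme row kept, NaN where a region lacks MPI data) can be sketched in plain Python; the field names are simplified stand-ins for the real spreadsheet columns:

```python
NAN = float("nan")

# Invented loan-theme rows and an MPI lookup keyed by region name.
loan_themes = [
    {"region": "Boucle du Mouhoun", "amount": 10000},
    {"region": "Atlantis", "amount": 500},  # no MPI data for this one
]
mpi = {"Boucle du Mouhoun": {"region MPI": 0.447}}

# Left join: keep every loan-theme row, filling NaN when the region is absent.
joined = [{**row, **mpi.get(row["region"], {"region MPI": NAN})} for row in loan_themes]
```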

    These are the columns in the database:

    1. Partner ID
    2. Field Partner
    3. Name
    4. sector
    5. Loan Theme ID
    6. Loan Theme Type
    7. Country
    8. forkiva
    9. number
    10. amount
    11. geo
    12. rural_pct
    13. City
    14. Administrative region
    15. Sub-national region
    16. ISO
    17. World region
    18. Population Share of the Region (%)
    19. region MPI
    20. Education (%)
    21. Health (%)
    22. Living standards (%)
    23. Schooling (%)
    24. Child school attendance (%)
    25. Child Mortality (%)
    26. Nutrition (%)
    27. Electricity (%)
    28. Improved sanitation (%)
    29. Drinking water (%)
    30. Floor (%)
    31. Cooking fuel (%)
    32. Asset ownership (%)
    • mpi_on_regions.xlsx

    Matched the loans in loan_themes_by_region with the regions that have MPI information. This dataset brings together the amount invested in a region and the biggest problems that region has to deal with. It is a join between the loan_themes_by_region provided by Kiva and Tables 5.3 Contribution_of_Deprivations.

    It is a subset of the all_loan_theme_merged_with_geo_mpi_regions.xlsx, which contains only the entries that I could match with poverty decomposition data. It has the same columns.

    • Tables_5_SubNational_Decomposition_MPI_2017-18.xlsx

    Multidimensional poverty index decomposition for over 1000 regions across 79 countries.

    Table 5.3: Contribution of deprivations to the MPI, by sub-national regions
    This table shows which dimensions and indicators contribute most to a region's MPI, which is useful for understanding the major source(s) of deprivation in a sub-national region.

    Source: http://ophi.org.uk/multidimensional-poverty-index/global-mpi-2016/

    • Tables_7_MPI_estimations_country_levels.xlsx

    MPI decomposition for 120 countries.

    Table 7 All Published MPI Results since 2010
    The table presents an archive of all MPI estimations published over the past 5 years, together with MPI, H, A and censored headcount ratios. For comparisons over time please use Table 6, which is strictly harmonised. The full set of data tables for each year published (Column A) is found on the 'data tables' page under 'Archive'.

    The data in this file is shown in interactive plots on Oxford Poverty and Human Development Initiative website. http://www.dataforall.org/dashboard/ophi/index.php/

    • unique_regions_from_kiva_loan_themes.xlsx

    These are all the regions corresponding to the geocodes found in Kiva's loan_themes_by_region. There are 718 unique entries that can be joined with any Kiva dataset that has either a coordinates or a region column.
    Columns:

    • geo: (Lat, Lon) pair (from loan_themes_by_region)

    • City: name of the city (has the most NaNs)

    • Administrative region: first level of administrative inclusion for the city/location (the equivalent of a county in the US)

    • Sub-national region: second level of administrative inclusion for the geo pair (like a state in the US)

    • Country: name of the country
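    A join on the geo column can be sketched as follows. This is a hedged example with invented sample rows; it assumes the geo column is stored as an identical string in both tables, which is what makes an exact-match join possible.

    ```python
    import pandas as pd

    # Hypothetical slice of unique_regions_from_kiva_loan_themes.xlsx.
    regions = pd.DataFrame({
        "geo": ["(6.5244, 3.3792)", "(-1.2921, 36.8219)"],
        "City": ["Lagos", "Nairobi"],
        "Country": ["Nigeria", "Kenya"],
    })

    # Hypothetical Kiva table carrying the same geo column.
    loans = pd.DataFrame({
        "geo": ["(6.5244, 3.3792)", "(-1.2921, 36.8219)"],
        "loan_amount": [500, 750],
    })

    # Left join preserves every loan and attaches the reverse-geocoded
    # region columns where the geo pair matches.
    enriched = loans.merge(regions, on="geo", how="left")
    ```

    Joining on a region-name column instead would work the same way, but name spellings vary more than coordinate pairs, so some fuzzy matching or normalization may be needed first.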

    Acknowledgements

    Thanks to Shane Lynn for the batch geocoding and to Joseph Deferio for reverse geocoding:

    https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/

    https://github.com/jdeferio/Reverse_Geocode

    The MPI datasets can be found on the Oxford website (http://ophi.org.uk/) under Research.

    Citation: Alkire, S. and Kanagaratnam, U. (2018). "Multidimensional Poverty Index Winter 2017-18: Brief methodological note and results." Oxford Poverty and Human Development Initiative, University of Oxford, OPHI Methodological Notes 45.
