19 datasets found
  1. Geospatial Data Pack for Visualization

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Vega Datasets (2025). Geospatial Data Pack for Visualization [Dataset]. https://www.kaggle.com/datasets/vega-datasets/geospatial-data-pack
    Available download formats: zip (1,422,109 bytes)
    Dataset updated
    Oct 21, 2025
    Dataset authored and provided by
    Vega Datasets
    Description

    Geospatial Data Pack for Visualization 🗺️

    Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets

    Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.

    Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.

    Why Use This Dataset? 🤔

    • Comprehensive Geospatial Types: Explore a variety of core geospatial data models:
      • Vector Data: Includes points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json).
      • Raster-like Data: Work with gridded datasets (like windvectors.csv, annual-precip.json).
    • Diverse Formats: Gain experience with standard and efficient geospatial formats like GeoJSON (see Tables 1, 2, and 4), compressed TopoJSON (see Table 1), and plain CSV/TSV (see Tables 2, 3, and 4) for point data and attribute tables ready for joining.
    • Multi-Scale Coverage: Practice visualization across different geographic scales, from global and national (Tables 1 and 4) down to the city level (Table 1).
    • Rich Thematic Mapping: Includes multiple datasets (Table 3) specifically designed for joining attributes to geographic boundaries (like states or counties from Table 1) to create insightful choropleth maps.
    • Ready-to-Use & Example-Driven: Cleaned datasets tightly integrated with 31+ official examples (see Appendix) from Altair, Vega-Lite, and Vega, allowing you to immediately practice techniques like projections, point maps, network maps, and interactive displays.
    • Python Friendly: Works seamlessly with essential Python libraries like Altair (which can directly read TopoJSON/GeoJSON), Pandas, and GeoPandas, fitting perfectly into the Kaggle notebook environment.
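    Part of what makes the TopoJSON files compact is quantization: when a topology has a top-level transform, arc positions are stored as integer deltas. The sketch below decodes arcs back to absolute coordinates following the TopoJSON specification's rules (accumulate the deltas, then apply scale and translate). It is only for inspecting files by hand, since Vega-based tools decode TopoJSON themselves, and the toy topology is hypothetical rather than taken from us-10m.json.

```python
def decode_arc(arc, transform=None):
    """Return absolute [x, y] positions for one TopoJSON arc."""
    if transform is None:
        # Unquantized topology: positions are already absolute coordinates.
        return [[p[0], p[1]] for p in arc]
    sx, sy = transform["scale"]
    tx, ty = transform["translate"]
    x = y = 0
    positions = []
    for dx, dy, *_ in arc:
        # Quantized arcs are delta-encoded: each position is relative to the previous one.
        x += dx
        y += dy
        positions.append([x * sx + tx, y * sy + ty])
    return positions


def decode_arcs(topology):
    """Decode every arc in a parsed TopoJSON topology (a dict from json.load)."""
    transform = topology.get("transform")
    return [decode_arc(arc, transform) for arc in topology["arcs"]]


# Hypothetical toy topology with a single three-point arc.
topo = {
    "type": "Topology",
    "transform": {"scale": [0.5, 0.25], "translate": [-100.0, 30.0]},
    "objects": {},
    "arcs": [[[4, 8], [2, -4], [-1, 0]]],
}
print(decode_arcs(topo))  # [[[-98.0, 32.0], [-97.0, 31.0], [-97.5, 31.0]]]
```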


    Dataset Inventory 🗂️

    This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.

    1. BASE MAP BOUNDARIES (Topological Data)

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries
    World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains a countries object. Suitable for world-scale viz. | id property on geometries
    London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name)
    London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name
    London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color
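    The "Key Fields / Join Info" column is what makes choropleths possible: each county geometry in us-10m.json carries an id (a FIPS code) that can be matched against an attribute table. A minimal stdlib sketch of that lookup join follows; the CSV columns id and rate and the geometry stubs are hypothetical. In Altair the same join is usually expressed with transform_lookup and alt.LookupData.

```python
import csv
import io


def join_by_id(geometries, rows, key="id", value="rate"):
    """Attach `value` from attribute rows to geometries that share the same id."""
    # Normalize ids to int so a zero-padded "01001" matches a numeric 1001.
    lookup = {int(r[key]): float(r[value]) for r in rows}
    joined = []
    for g in geometries:
        props = dict(g.get("properties", {}))
        props[value] = lookup.get(int(g["id"]))  # None when there is no match
        joined.append({**g, "properties": props})
    return joined


# Hypothetical attribute table and geometry stubs (arcs omitted).
table = io.StringIO("id,rate\n01001,0.051\n01003,0.049\n")
rows = list(csv.DictReader(table))
geoms = [
    {"type": "Polygon", "id": 1001, "arcs": []},
    {"type": "Polygon", "id": 1005, "arcs": []},
]
for g in join_by_id(geoms, rows):
    print(g["id"], g["properties"]["rate"])  # 1001 0.051, then 1005 None
```

    Unmatched geometries keep a None value instead of being dropped, so missing data stays visible on the map.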

    2. GEOGRAPHIC REFERENCE POINTS (Point Data) 📍

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l...
  2. Data from: A concentration-based approach to data classification for...

    • tandf.figshare.com
    • figshare.com
    txt
    Updated May 31, 2023
    Cite
    Robert G. Cromley; Shuowei Zhang; Natalia Vorotyntseva (2023). A concentration-based approach to data classification for choropleth mapping [Dataset]. http://doi.org/10.6084/m9.figshare.1456086.v2
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Robert G. Cromley; Shuowei Zhang; Natalia Vorotyntseva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The choropleth map is a device used for the display of socioeconomic data associated with an areal partition of geographic space. Cartographers emphasize the need to standardize any raw count data by an area-based total before displaying the data in a choropleth map. The standardization process converts the raw data from an absolute measure into a relative measure. However, there is recognition that the standardizing process does not enable the map reader to distinguish between low–low and high–high numerator/denominator differences. This research uses concentration-based classification schemes using Lorenz curves to address some of these issues. A test data set of nonwhite birth rate by county in North Carolina is used to demonstrate how this approach differs from traditional mean–variance-based systems such as the Jenks’ optimal classification scheme.
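    For readers new to Lorenz curves in this setting: one common construction sorts areas by their standardized rate and plots the cumulative share of the numerator against the cumulative share of the denominator. The sketch below builds that curve and a Gini summary of concentration. It is not the authors' classification scheme, whose class-break rule is not described here; the three-county input is hypothetical and all denominators are assumed to be positive.

```python
def lorenz_points(numerators, denominators):
    """Concentration (Lorenz) curve for areal count data.

    Areas are sorted by their standardized rate (numerator / denominator).
    x is the cumulative share of the denominator, y the cumulative share of
    the numerator; the curve lies on the diagonal when every area has the
    same rate.
    """
    pairs = sorted(zip(numerators, denominators), key=lambda p: p[0] / p[1])
    total_n = sum(numerators)
    total_d = sum(denominators)
    xs, ys = [0.0], [0.0]
    cum_n = cum_d = 0.0
    for n, d in pairs:
        cum_n += n
        cum_d += d
        xs.append(cum_d / total_d)
        ys.append(cum_n / total_n)
    return xs, ys


def gini(xs, ys):
    """Gini coefficient: 1 - 2 * area under the curve (trapezoid rule)."""
    area = sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs)))
    return 1 - 2 * area


# Hypothetical counties: nonwhite births (numerator) and all births (denominator).
xs, ys = lorenz_points([10, 30, 60], [100, 100, 100])
print([round(v, 3) for v in xs])  # [0.0, 0.333, 0.667, 1.0]
print([round(v, 3) for v in ys])  # [0.0, 0.1, 0.4, 1.0]
print(round(gini(xs, ys), 3))     # 0.333
```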

  3. Natural Earth 1:110m Countries

    • kaggle.com
    zip
    Updated Mar 14, 2020
    Cite
    Anton Poznyakovskiy (2020). Natural Earth 1:110m Countries [Dataset]. https://www.kaggle.com/datasets/poznyakovskiy/natural-earth-1110m-countries
    Available download formats: zip (197,544 bytes)
    Dataset updated
    Mar 14, 2020
    Authors
    Anton Poznyakovskiy
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains geometry data for the countries of the world together with their names and country codes in various formats. The primary use case is choropleths (color-coded maps). The data can be read with geopandas into a GeoDataFrame (a subclass of the pandas DataFrame) and plotted with matplotlib. See the starter notebook for an example of how to do this.

    The data was created by Natural Earth. It is in the public domain and free to use for any purpose at the time of this writing; you might want to check their Terms of Use.


  4. Data from: Cartogram and Choropleth communicative effectiveness participant...

    • data-search.nerc.ac.uk
    • ckan.publishing.service.gov.uk
    • +2 more
    Updated Nov 4, 2025
    Cite
    (2025). Cartogram and Choropleth communicative effectiveness participant test results 2015 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/resources/registries/vocabularies/BGS%2520Thesaurus%2520of%2520Geosciences/concepts/Maps
    Dataset updated
    Nov 4, 2025
    Description

    These are the results obtained from an empirical test comparing the communicative effectiveness of two types of two-dimensional (2D) map formats (choropleth maps and cartograms) of the Greater London area of the United Kingdom. Participants were interviewed and observed individually during the procedure. The results contain the recorded measurements of spatial accuracy and the time each participant took to answer three test questions. Each participant's post-test qualitative preference between the two map types is also recorded, along with their gender, age, visual impediments, and self-assessed map-reading ability.

  5. Pakistan Cities— 1,513 locations with lat/lon/pop

    • kaggle.com
    zip
    Updated Aug 17, 2025
    Cite
    Ikram Ul Hassan (2025). Pakistan Cities— 1,513 locations with lat/lon/pop [Dataset]. https://www.kaggle.com/datasets/ikramshah512/pakistan-cities-wikidata-linked-1513-locations
    Available download formats: zip (42,829 bytes)
    Dataset updated
    Aug 17, 2025
    Authors
    Ikram Ul Hassan
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    Pakistan
    Description

    A comprehensive dataset of 1,513 Pakistani cities, towns, tehsils, districts and places with latitude/longitude, administrative region, population (when available) and Wikidata IDs — ideal for mapping, geospatial analysis, enrichment, and location-based ML.

    Why this dataset is valuable:

    • Full geocoordinates for every entry (100% coverage) — ready for mapping and spatial joins.
    • Wide geographic coverage across all 7 major regions of Pakistan (provinces / administrative regions).
    • Wikidata IDs included for reliable cross-referencing and automatic enrichment from external knowledge bases.
    • Useful for data scientists, GIS engineers, civic tech projects, academic research, and startups building Pakistan-focused location services.

    Highlights (fetched from the data):

    • Total rows: 1,513
    • Unique places (city field): 1,497
    • Rows with population > 0: 526 (≈34.8%)
    • Coordinate coverage: 1513 / 1513 (100%) — directly usable with mapping libraries.

    Column definitions (short):

    • id — Internal numeric row id (unique integer).
    • wikiDataId — Wikidata QID (e.g., Q####) for the place; use to fetch rich metadata.
    • type — Administrative/place type (e.g., ADM1, ADM2, city, district, tehsil).
    • city — Common/local city/place name (short label).
    • name — Full name / official name of the place (may include “District”, “Tehsil”, etc.).
    • country — Country name (Pakistan).
    • countryCode — ISO country code (e.g., PK).
    • region — Primary administrative region / province (e.g., Punjab, Sindh).
    • regionCode — Short code for region (e.g., PB, KP depending on your encoding).
    • regionWdId — Wikidata QID for the region.
    • latitude — Latitude in decimal degrees (float).
    • longitude — Longitude in decimal degrees (float).
    • population — Integer population (0 or NA where unknown).
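    Because population is 0 or NA where unknown, it helps to map both to a missing value before computing statistics such as the "rows with population > 0" figure above. A minimal sketch with csv.DictReader using the column names listed above; the file name inside the zip is not given here, so the two sample rows are hypothetical.

```python
import csv
import io


def parse_population(raw):
    """Return the population as an int, or None where it is unknown (0, NA or blank)."""
    raw = (raw or "").strip()
    if raw.upper() in ("", "NA", "NAN"):
        return None
    value = int(float(raw))  # tolerate values exported as "12345.0"
    return value if value > 0 else None


def load_places(f):
    """Read rows into plain dicts, keeping a subset of the documented columns."""
    places = []
    for row in csv.DictReader(f):
        places.append({
            "city": row["city"],
            "region": row["region"],
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "population": parse_population(row["population"]),
        })
    return places


# Two hypothetical rows; the real file has all 13 documented columns.
sample = io.StringIO(
    "id,city,region,latitude,longitude,population\n"
    "1,Place A,Punjab,31.5,74.3,NA\n"
    "2,Place B,Sindh,24.9,67.0,1000\n"
)
places = load_places(sample)
known = [p for p in places if p["population"] is not None]
print(len(places), len(known))  # 2 1
```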

    Typical & high-value use cases:

    • Mapping & visualization: choropleth maps, point overlays, heatmaps of population or density.
    • Geospatial analysis: distance calculations, nearest-neighbor queries, clustering of urban centers.
    • Data enrichment: join with other datasets (OpenStreetMap, Wikidata, census data) using wikiDataId and coordinates.
    • Machine learning & NLP: training geolocation models, geoparsing, toponym resolution, place name disambiguation.
    • Urban planning & research: analyze distribution of population-ready places vs administrative units.
    • Mobile / location-based apps: lookup & reverse geocoding fallback, seeding POI databases for Pakistan.
    • Humanitarian & disaster response: baseline location lists for logistics and situational awareness.
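    Distance calculations and nearest-neighbor queries on the latitude/longitude columns are usually done with great-circle (haversine) distance. A minimal sketch, assuming a spherical Earth of radius 6,371 km and hypothetical rows; for many repeated queries, a spatial index such as scikit-learn's BallTree with its haversine metric avoids the brute-force scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(lat, lon, places):
    """Brute-force nearest neighbour: O(n) per query, adequate for ~1,500 rows."""
    return min(places, key=lambda p: haversine_km(lat, lon, p["latitude"], p["longitude"]))


# Hypothetical rows using the latitude/longitude column names listed above.
places = [
    {"city": "Place A", "latitude": 31.5, "longitude": 74.3},
    {"city": "Place B", "latitude": 24.9, "longitude": 67.0},
]
print(nearest(30.0, 73.0, places)["city"])         # Place A
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```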
  6. FOLIUM_INDIA

    • kaggle.com
    zip
    Updated Jun 15, 2020
    Cite
    KD007 (2020). FOLIUM_INDIA [Dataset]. https://www.kaggle.com/krishcross/india-shape-map
    Available download formats: zip (16,183,750 bytes)
    Dataset updated
    Jun 15, 2020
    Authors
    KD007
    Area covered
    India
    Description

    Folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations and the passing of rich vector/raster/HTML visualizations as markers on the map. These files can be used to draw the state boundaries on a map of India with the folium library, and the accompanying CSV contains the state data for use in notebooks. I have used it in one of my kernels, which can be viewed.

    The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports image, video, GeoJSON, and TopoJSON overlays. Because of its extensibility, I find folium the best map-plotting library in Python. Do give it a try in your kernels.

  7. Data from: CrimeMapTutorial Workbooks and Sample Data for ArcView and...

    • catalog.data.gov
    • icpsr.umich.edu
    • +1 more
    Updated Nov 14, 2025
    Cite
    National Institute of Justice (2025). CrimeMapTutorial Workbooks and Sample Data for ArcView and MapInfo, 2000 [Dataset]. https://catalog.data.gov/dataset/crimemaptutorial-workbooks-and-sample-data-for-arcview-and-mapinfo-2000-3c9be
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    National Institute of Justice
    Description

    CrimeMapTutorial is a step-by-step tutorial for learning crime mapping using ArcView GIS or MapInfo Professional GIS. It was designed to give users a thorough introduction to most of the knowledge and skills needed to produce daily maps and spatial data queries that uniformed officers and detectives find valuable for crime prevention and enforcement. The tutorials can be used either for self-learning or in a laboratory setting. The geographic information system (GIS) and police data were supplied by the Rochester, New York, Police Department. For each mapping software package, there are three PDF tutorial workbooks and one WinZip archive containing sample data and maps. Workbook 1 was designed for GIS users who want to learn how to use a crime-mapping GIS and how to generate maps and data queries. Workbook 2 was created to assist data preparers in processing police data for use in a GIS. This includes address-matching of police incidents to place them on pin maps and aggregating crime counts by areas (like car beats) to produce area or choropleth maps. Workbook 3 was designed for map makers who want to learn how to construct useful crime maps, given police data that have already been address-matched and preprocessed by data preparers. It is estimated that the three tutorials take approximately six hours to complete in total, including exercises.
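    Aggregating address-matched incidents into area counts (the step behind the area or choropleth maps described in Workbook 2) amounts to assigning each point to the polygon that contains it and then counting. A minimal stdlib sketch using the even-odd (ray casting) point-in-polygon test; the square "beats" and incident coordinates are hypothetical, and a point on a shared boundary goes to the first matching area. GIS packages provide this as a built-in spatial join.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def counts_by_area(points, areas):
    """Count points per area; points outside every area are tallied under None."""
    counts = Counter()
    for x, y in points:
        label = next(
            (name for name, ring in areas.items() if point_in_polygon(x, y, ring)),
            None,
        )
        counts[label] += 1
    return counts


# Hypothetical beats (unit squares) and geocoded incidents.
beats = {
    "beat_1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "beat_2": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
incidents = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3.0, 3.0)]
print(counts_by_area(incidents, beats))
```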

  8. Foreign-born Population in Boston Area Communities, 1870–2010

    • dataverse.harvard.edu
    Updated May 11, 2018
    Cite
    Marilynn S. Johnson (2018). Foreign-born Population in Boston Area Communities, 1870–2010 [Dataset]. http://doi.org/10.7910/DVN/IC42Z8
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 11, 2018
    Dataset provided by
    Harvard Dataverse
    Authors
    Marilynn S. Johnson
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    1870 - 2010
    Area covered
    Boston Metropolitan Area
    Description

    This dataset underlies a choropleth map of Boston area communities in which areas are shaded according to the percentage of the population that was foreign-born during each decade. The data was drawn from the US Census of Population, as well as the American Community Survey.

  9. Synthetic population for JOR

    • zenodo.org
    bin, csv, pdf, zip
    Updated Jul 16, 2024
    Cite
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie (2024). Synthetic population for JOR [Dataset]. http://doi.org/10.5281/zenodo.6503398
    Available download formats: pdf, zip, csv, bin
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic populations for regions of the World (SPW) | Jordan

    Dataset information

    A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place, and thus where interactions among population members happen (e.g., spread of epidemics).

    License

    CC-BY-4.0

    Acknowledgment

    This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).

    Contact information

    Henning.Mortveit@virginia.edu

    Identifiers

    Region name: Jordan
    Region ID: jor
    Model: coarse
    Version: 0_9_0

    Statistics

    Name | Value
    Population | 5,723,567
    Average age | 23.5
    Households | 1,235,755
    Average household size | 4.6
    Residence locations | 1,235,755
    Activity locations | 131,978
    Average number of activities | 6.4
    Average travel distance | 44.5

    Sources

    Description | Name | Version | URL
    Activity template data | World Bank | 2021 | https://data.worldbank.org
    Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap
    Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551, https://www.openstreetmap.org/
    Household data | DHS | | https://dhsprogram.com
    Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11

    Files description

    Base data files (jor_data_v_0_9.zip)

    Filename | Description
    jor_person_v_0_9.csv | Data for each person, including attributes such as age, gender, and household ID.
    jor_household_v_0_9.csv | Data at household level.
    jor_residence_locations_v_0_9.csv | Data about residence locations.
    jor_activity_locations_v_0_9.csv | Data about activity locations, including which activity types are supported at these locations.
    jor_activity_location_assignment_v_0_9.csv | For each person and each of their activities, the location where the activity takes place.
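    Because the person file carries age and household ID, it is enough to recompute headline statistics like those in the Statistics table. A minimal sketch; the column names hid and age are hypothetical (the actual header of jor_person_v_0_9.csv is not shown here), and the five-person sample is made up for illustration.

```python
import csv
import io
from collections import Counter


def household_stats(person_file, hh_col="hid", age_col="age"):
    """Population, household count, mean household size and mean age from a person CSV.

    hh_col and age_col are hypothetical column names; check the real header first.
    """
    household_sizes = Counter()
    total_age = 0.0
    population = 0
    for row in csv.DictReader(person_file):
        household_sizes[row[hh_col]] += 1
        total_age += float(row[age_col])
        population += 1
    households = len(household_sizes)
    return {
        "population": population,
        "households": households,
        "avg_household_size": population / households,
        "avg_age": total_age / population,
    }


# Hypothetical five-person, two-household sample.
sample = io.StringIO("pid,hid,age\n1,h1,40\n2,h1,38\n3,h1,10\n4,h2,70\n5,h2,22\n")
print(household_stats(sample))
```

    As a sanity check on the published figures, 5,723,567 persons / 1,235,755 households ≈ 4.63, consistent with the reported average household size of 4.6.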

    Derived data files

    Filename | Description
    jor_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model.
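    The description does not spell out how the matrix is built. As a rough illustration of what a POLYMOD-type (age group by age group) matrix contains, the sketch below assumes the crudest possible within-location contact model: any two people assigned to the same location count as one contact. The model actually used for this file is not described here and is likely more detailed; the person IDs, locations, and age groups are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def contact_matrix(assignments, age_group, groups):
    """Mean contacts per person, by own age group (row) and contact's age group (column).

    assignments: iterable of (person_id, location_id) pairs.
    age_group:   dict person_id -> age-group label (must cover every person).
    Hypothetical contact model: two people who share at least one location
    count as exactly one contact.
    """
    people_at = defaultdict(set)
    for pid, loc in assignments:
        people_at[loc].add(pid)
    # Distinct unordered pairs, so sharing two locations still counts once.
    pairs = set()
    for members in people_at.values():
        pairs.update(combinations(sorted(members), 2))
    counts = {g: {h: 0 for h in groups} for g in groups}
    for a, b in pairs:
        counts[age_group[a]][age_group[b]] += 1
        counts[age_group[b]][age_group[a]] += 1
    size = {g: 0 for g in groups}
    for group in age_group.values():
        size[group] += 1
    return {
        g: {h: (counts[g][h] / size[g] if size[g] else 0.0) for h in groups}
        for g in groups
    }


ages = {1: "child", 2: "child", 3: "adult", 4: "adult", 5: "adult"}
assignments = [
    (1, "school"), (2, "school"),
    (1, "home1"), (3, "home1"),
    (2, "home2"), (4, "home2"),
    (3, "work"), (4, "work"), (5, "work"),
]
print(contact_matrix(assignments, ages, ["child", "adult"]))
```

    Counting each pair in both directions makes the result reciprocal (m[g][h] times the size of g equals m[h][g] times the size of h), a property that empirical contact matrices are often adjusted to satisfy.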

    Validation and measures files

    Filename | Description
    jor_household_grouping_validation_v_0_9.pdf | Validation plots for household construction
    jor_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data
    jor_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by the time of day with survey data
    jor_location_construction_0_9.pdf | Validation plots for location construction
    jor_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots
    jor_jor_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance
    jor_jor_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution
    jor_jor_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations
    jor_jor_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age
    jor_jor_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density
    jor_jor_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size

  10. Multiple Hazard Index for United States Counties

    • hub.arcgis.com
    • gis-fema.hub.arcgis.com
    Updated Jul 29, 2016
    Cite
    jjs2154_columbia (2016). Multiple Hazard Index for United States Counties [Dataset]. https://hub.arcgis.com/items/800f684ebadf423bae4c669cb0a1d7da
    Dataset updated
    Jul 29, 2016
    Dataset authored and provided by
    jjs2154_columbia
    Area covered
    Description

    Overview

    The multiple hazard index for the United States Counties was designed to map natural hazard relating to exposure to multiple natural disasters. The index was created to provide communities and public health officials with an overview of the risks that are prominent in their county, and to facilitate the comparison of hazard level between counties. Most existing hazard maps focus on a single disaster type. By creating a measure that aggregates the hazard from individual disasters, the increased hazard that results from exposure to multiple natural disasters can be better understood. The multiple hazard index represents the aggregate of hazard from eleven individual disasters. Layers displaying the hazard from each individual disaster are also included.

    The hazard index is displayed visually as a choropleth map, with the color blue representing areas with less hazard and red representing areas with higher hazard. Users can click on each county to view its hazard index value, and the level of hazard for each individual disaster. Layers describing the relative level of hazard from each individual disaster are also available as choropleth maps with red areas representing high, orange representing medium, and yellow representing low levels of hazard.

    Methodology and Data Citations

    Multiple Hazard Index

    The multiple hazard index was created by coding the individual hazard classifications and summing the coded values for each United States county. Each individual hazard is weighted equally in the multiple hazard index. Alaska and Hawaii were excluded from the analysis because one third of the individual hazard datasets only describe the conterminous United States.
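    The index construction described above (code each class, then sum with equal weights) can be sketched as follows. The numeric codes are an assumption: the text says the classes were coded but does not list the values used. The example county uses four hazards for brevity, whereas the actual index sums eleven.

```python
# Hypothetical numeric codes; the text does not list the values actually used.
CLASS_CODES = {"None": 0, "Low": 1, "Medium": 2, "High": 3}


def multiple_hazard_index(hazard_classes):
    """Equal-weight sum of coded hazard classes for one county.

    hazard_classes maps hazard name -> class label, e.g. {"flood": "Low"}.
    """
    return sum(CLASS_CODES[label] for label in hazard_classes.values())


county = {"avalanche": "None", "earthquake": "Low", "flood": "Medium", "tornado": "High"}
print(multiple_hazard_index(county))  # 6
```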

    Avalanche Hazard

    University of South Carolina Hazards and Vulnerability Research Institute. “Spatial Hazard Events and Losses Database”. United States Counties. “Avalanches United States 2001-2009”. <http://hvri.geog.sc.edu/SHELDUS/>. Downloaded 06/2016.

    Classification

    Avalanche hazard was classified by dividing counties based upon the number of avalanches they experienced over the nine-year period in the dataset. Avalanche hazard was not normalized by total county area because doing so over-emphasized small counties, and because avalanches are a highly local hazard.

    • None = 0 avalanches
    • Low = 1 avalanche
    • Medium = 2-5 avalanches
    • High = 6-10 avalanches
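    Class tables like the avalanche breaks above (and the earthquake, heat wave, drought, and snowfall tables in this section) are lookups on sorted upper bounds. A minimal sketch using bisect; it treats each upper bound as inclusive and puts values above the last bound in the top class. Where published ranges overlap at a boundary (47.5 appears in both the Medium and High earthquake ranges), this sketch assigns the boundary value to the lower class, which is an assumption.

```python
from bisect import bisect_left


def classify(value, upper_bounds, labels):
    """Return the label of the first class whose inclusive upper bound is >= value.

    upper_bounds must be sorted ascending and labels must have one more entry
    than upper_bounds; values above the last bound fall into the last class.
    """
    if len(labels) != len(upper_bounds) + 1:
        raise ValueError("labels must have exactly one more entry than upper_bounds")
    return labels[bisect_left(upper_bounds, value)]


# Avalanche classes from the text: 0 / 1 / 2-5 / 6-10 avalanches.
AVALANCHE_BOUNDS = [0, 1, 5]
AVALANCHE_LABELS = ["None", "Low", "Medium", "High"]

# Earthquake classes (% gravity peak ground acceleration).
EARTHQUAKE_BOUNDS = [14.25, 47.5]
EARTHQUAKE_LABELS = ["Low", "Medium", "High"]

print([classify(n, AVALANCHE_BOUNDS, AVALANCHE_LABELS) for n in (0, 1, 3, 6)])
# ['None', 'Low', 'Medium', 'High']
```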

    Earthquake Hazard

    United States Geological Survey. “Earthquake Hazard Maps”. 1:2,000,000. “Peak Ground Acceleration 2% in 50 Years”. <http://earthquake.usgs.gov/hazards/products/conterminous/>. Downloaded 07/2016.

    Classification

    Peak ground acceleration (% gravity) with a 2% likelihood in 50 years was averaged by United States County, and the earthquake hazard of counties was classified based upon this average.

    • Low = 0-14.25 % gravity peak ground acceleration
    • Medium = 14.26-47.5 % gravity peak ground acceleration
    • High = 47.5+ % gravity peak ground acceleration

    Flood Hazard

    United States Federal Emergency Management Agency. “National Flood Hazard Layer”. 1:10,000. “0.2 Percent Annual Flood Area”. <https://data.femadata.com/FIMA/Risk_MAP/NFHL/>. Downloaded 07/2016.

    Classification

    The National Flood Hazard Layer 0.2 Percent Annual Flood Area was spatially intersected with the United States Counties layer, splitting flood areas by county and adding county information to flood areas. Flood area was aggregated by county, expressed as a fraction of the total county land area, and flood hazard was classified based upon percentage of land that is susceptible to flooding. National Flood Hazard Layer does not cover the entire United States; coverage is focused on populated areas. Areas not included in National Flood Hazard Layer were assigned flood risk of Low in order to include these areas in further analysis.

    • Low = 0-0.001% area susceptible
    • Medium = 0.00101%-0.005% area susceptible
    • High = 0.00501+% area susceptible

    Heat Wave Hazard

    United States Centers for Disease Control and Prevention. “National Climate Assessment”. Contiguous United States Counties. “Extreme Heat Events: Heat Wave Days in May - September for years 1981-2010”. Downloaded 06/2016.

    Classification

    Heat wave hazard was classified by dividing counties based upon the number of heat wave days they experienced over the 30-year period described in the dataset.

    • Low = 126-171 heat wave days
    • Medium = 172-187 heat wave days
    • High = 188-255 heat wave days

    Hurricane Hazard

    National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Atlantic Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. <https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download>. Downloaded 06/2016.

    National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Pacific Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. <https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download>. Downloaded 06/2016.

    Classification

    Atlantic and Pacific datasets were merged. Tropical storm and disturbance tracks were filtered out leaving hurricane tracks. Each hurricane track was assigned the value of the category number that describes that event. Weighting each event by intensity ensures that areas with higher intensity events are characterized as being more hazardous. Values describing each hurricane event were aggregated by United States County, normalized by total county area, and the hurricane hazard of counties was classified based upon the normalized value.

    Landslide Hazard

    United States Geological Survey. “Landslide Overview Map of the United States”. 1:4,000,000. “Landslide Incidence and Susceptibility in the Conterminous United States”. <https://catalog.data.gov/dataset/landslide-incidence-and-susceptibility-in-the-conterminous-united-states-direct-download>. Downloaded 07/2016.

    Classification

    The classifications of High, Moderate, and Low landslide susceptibility and incidence from the study were numerically coded, the average value was computed for each county, and the landslide hazard was classified based upon the average value.

    Long-Term Drought Hazard

    United States Drought Monitor, Drought Mitigation Center, United States Department of Agriculture, National Oceanic and Atmospheric Administration. “Drought Monitor Summary Map”. “Long-Term Drought Impact”. <http://droughtmonitor.unl.edu/MapsAndData/GISData.aspx>. Downloaded 06/2016.

    Classification

    Short-term drought areas were filtered from the data, leaving only long-term drought areas. United States counties were assigned the average U.S. Drought Monitor Drought Severity Classification value that characterizes the county area. County long-term drought hazard was classified based upon this average Drought Severity Classification value.

    • Low = 1-1.75 average Drought Severity Classification value
    • Medium = 1.76-3.0 average Drought Severity Classification value
    • High = 3.0+ average Drought Severity Classification value

    Snowfall Hazard

    United States National Oceanic and Atmospheric Administration. “1981-2010 U.S. Climate Normals”. 1:2,000,000. “Annual Snow Normal”. <http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/products/precipitation/>. Downloaded 08/2016.

    Classification

    Average yearly snowfall was joined with the point locations of weather measurement stations, and stations without valid snowfall measurements were filtered out (leaving 6,233 stations). Snowfall was interpolated using least squared distance interpolation to create a 0.05-degree raster estimating yearly snowfall for the United States. The average yearly snowfall raster was aggregated by county to yield the average yearly snowfall per United States county. The snowfall risk of counties was classified by average snowfall.

    • None = 0 inches
    • Low = 0.01-10 inches
    • Medium = 10.01-50 inches
    • High = 50.01+ inches
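    The text does not define "least squared distance interpolation". A common way to turn station values into a grid of this kind is inverse-distance weighting with power 2 (inverse squared distance), so the sketch below shows that technique; it may differ from what was actually used. It treats longitude/latitude degrees as planar coordinates (a simplification), and the two stations and the grid are hypothetical.

```python
def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: iterable of (sx, sy, value). With power=2 each weight is
    1 / squared distance. Coordinates are treated as planar, so raw
    longitude/latitude degrees are only an approximation.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # the point coincides with a station
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den


def idw_grid(xmin, ymin, nx, ny, cell, stations, power=2.0):
    """Estimate values at the centres of an ny-by-nx grid of square cells of size `cell`."""
    return [
        [idw(xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell, stations, power) for i in range(nx)]
        for j in range(ny)
    ]


# Two hypothetical stations: (lon, lat, annual snowfall in inches).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
print([round(v, 6) for v in idw_grid(0.0, -0.5, 2, 1, 1.0, stations)[0]])  # [12.0, 28.0]
```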

    Tornado Hazard

    United States National Oceanic and Atmospheric Administration Storm Prediction Center. “Severe Thunderstorm Database and Storm Data Publication”. 1:2,000,000. “United States Tornado Touchdown Points 1950-2004”. <https://catalog.data.gov/dataset/united-states-tornado-touchdown-points-1950-2004-direct-download>. Downloaded 07/2016.

    Classification

    Each tornado touchdown point was assigned the value of the Fujita Scale that describes that event. Weighting each event by intensity ensures that areas with higher intensity events are characterized as more hazardous. Values describing each tornado event were aggregated by United States County, normalized by total county area, and the tornado hazard of counties was classified based upon the normalized value.
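    The hurricane and tornado procedures share one step: weight each event by its intensity (category or Fujita value), sum per county, and divide by county area. A minimal sketch, assuming events have already been assigned to counties (for hurricane tracks that assignment is itself a line-polygon overlay); the county IDs, areas, and area units are hypothetical.

```python
from collections import defaultdict


def weighted_density(events, county_area):
    """Intensity-weighted event density per county.

    events:      iterable of (county_id, intensity) pairs, e.g. a Fujita value per
                 tornado touchdown or a category per hurricane track.
    county_area: dict county_id -> area; counties without events get 0.0.
    """
    totals = defaultdict(float)
    for county, intensity in events:
        totals[county] += intensity
    return {county: totals[county] / area for county, area in county_area.items()}


# Hypothetical counties A-C with areas in square kilometres.
events = [("A", 1), ("A", 3), ("B", 2)]
areas = {"A": 100.0, "B": 50.0, "C": 80.0}
print(weighted_density(events, areas))  # {'A': 0.04, 'B': 0.04, 'C': 0.0}
```

    Setting every intensity to 1 turns the same function into a plain count density, as in the volcano measure. With raw Fujita values an F0 touchdown contributes zero weight; the text does not say how F0 events were handled.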

    Volcano Hazard

    Smithsonian Institution Global Volcanism Program. “Volcanoes of the World”. “Holocene Volcanoes”. <http://volcano.si.edu/search_volcano.cfm>. Downloaded 07/2016.

    Classification

    Volcano coordinate locations from the spreadsheet were mapped and aggregated by United States county. Volcano count was normalized by county area, and the volcano hazard of counties was classified based upon the number of volcanoes present per unit area.

    • None = 0 volcanoes/100 kilometers
    • Low = 0.000915-0.007611 volcanoes/100 kilometers
    • Medium = 0.007612-0.018376 volcanoes/100 kilometers
    • High = 0.018377-0.150538 volcanoes/100 kilometers

    Wildfire Hazard

    United States Department of Agriculture, Forest Service, Fire, Fuel, and Smoke Science Program. “Classified 2014 Wildfire Hazard Potential”. 270 meters. <http://www.firelab.org/document/classified-2014-whp-gis-data-and-maps>. Downloaded 06/2016.

    Classification

    The classifications of Very High, High, Moderate, Low, Very Low, and Non-Burnable/Water wildfire hazard from the study were numerically coded, the average value was computed for each county, and the wildfire hazard was classified based upon the average value.

  11. Synthetic population for IND_DELHI

    • zenodo.org
    bin, pdf, zip
    Updated Jul 16, 2024
    Cite
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie (2024). Synthetic population for IND_DELHI [Dataset]. http://doi.org/10.5281/zenodo.6505994
    Explore at:
    pdf, zip, binAvailable download formats
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie; Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Delhi, India
    Description

    Synthetic populations for regions of the World (SPW) | Delhi

    Dataset information

    A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where those activities take place, and thus where interactions among population members happen (e.g., relevant to the spread of epidemics).

    License

    CC-BY-4.0

    Acknowledgment

    This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).

    Contact information

    Henning.Mortveit@virginia.edu

    Identifiers

    Region name: Delhi
    Region ID: ind_140001944
    Model: coarse
    Version: 0_9_0

    Statistics

    | Name | Value |
    | --- | --- |
    | Population | 15951510 |
    | Average age | 28.2 |
    | Households | 3625935 |
    | Average household size | 4.4 |
    | Residence locations | 3625935 |
    | Activity locations | 1309377 |
    | Average number of activities | 5.5 |
    | Average travel distance | 26.6 |
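    The headline statistics can be recomputed from the person-level file. This is a sketch assuming each person record carries a household ID and an age under the hypothetical keys "hid" and "age" (the actual column names of ind_140001944_person_v_0_9.csv are not listed here). As an arithmetic cross-check, 15951510 people / 3625935 households ≈ 4.40, which matches the published average household size of 4.4.

```python
def summarize_persons(persons):
    """Compute population-level statistics from person records.

    persons: list of dicts with hypothetical keys 'hid' (household ID) and 'age'.
    """
    population = len(persons)
    households = len({p["hid"] for p in persons})
    return {
        "population": population,
        "households": households,
        "average_age": sum(p["age"] for p in persons) / population,
        "average_household_size": population / households,
    }


if __name__ == "__main__":
    # Cross-check the published Delhi figures: population / households.
    print(round(15951510 / 3625935, 1))
    sample = [{"hid": 1, "age": 30}, {"hid": 1, "age": 10}, {"hid": 2, "age": 50}]
    print(summarize_persons(sample))
```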

    Sources

    | Description | Name | Version | URL |
    | --- | --- | --- | --- |
    | Activity template data | World Bank | 2021 | https://data.worldbank.org |
    | Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
    | Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551, https://www.openstreetmap.org/ |
    | Household data | DHS | | https://dhsprogram.com |
    | Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |

    Files description

    Base data files (ind_140001944_data_v_0_9.zip)

    | Filename | Description |
    | --- | --- |
    | ind_140001944_person_v_0_9.csv | Data for each person, including attributes such as age, gender, and household ID |
    | ind_140001944_household_v_0_9.csv | Data at household level |
    | ind_140001944_residence_locations_v_0_9.csv | Data about residence locations |
    | ind_140001944_activity_locations_v_0_9.csv | Data about activity locations, including which activity types are supported at these locations |
    | ind_140001944_activity_location_assignment_v_0_9.csv | For each person and each of their activities, the location where the activity takes place |

    Derived data files

    | Filename | Description |
    | --- | --- |
    | ind_140001944_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model |
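    To illustrate what a POLYMOD-type matrix holds (mean contacts per person in age group i with people in age group j), here is a deliberately simplified sketch. The within-location contact model used by the authors is not described here, so this version assumes that everyone assigned to the same location contacts everyone else there and ignores activity timing. The input shapes, age bands and function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def age_group(age, bounds):
    """Index of the age band; bounds are lower edges, e.g. [0, 18, 65]."""
    return bisect_right(bounds, age) - 1


def contact_matrix(assignments, ages, bounds):
    """Mean contacts per person of group i with group j.

    assignments: iterable of (person_id, location_id) pairs.
    ages: dict person_id -> age. Assumes all co-located people are in contact.
    """
    # Distinct people seen at each location (repeat visits counted once).
    people_at = defaultdict(set)
    for person, location in assignments:
        people_at[location].add(person)

    n = len(bounds)
    contacts = [[0.0] * n for _ in range(n)]
    for people in people_at.values():
        # Quadratic in location size; a real model would sample contacts.
        for a in people:
            for b in people:
                if a != b:
                    contacts[age_group(ages[a], bounds)][age_group(ages[b], bounds)] += 1

    # Normalize each row by the number of people in that age group.
    sizes = [0] * n
    for age in ages.values():
        sizes[age_group(age, bounds)] += 1
    return [
        [contacts[i][j] / sizes[i] if sizes[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    visits = [("p1", "L1"), ("p2", "L1"), ("p2", "L2"), ("p3", "L2")]
    print(contact_matrix(visits, {"p1": 10, "p2": 40, "p3": 45}, [0, 18, 65]))
```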

    Validation and measures files

    | Filename | Description |
    | --- | --- |
    | ind_140001944_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
    | ind_140001944_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
    | ind_140001944_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by time of day with survey data |
    | ind_140001944_location_construction_0_9.pdf | Validation plots for location construction |
    | ind_140001944_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
    | ind_140001944_ind_140001944_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
    | ind_140001944_ind_140001944_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
    | ind_140001944_ind_140001944_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
    | ind_140001944_ind_140001944_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
    | ind_140001944_ind_140001944_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
    | ind_140001944_ind_140001944_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |

  12. PostGIS data for London and Greater London ward boundaries as of 2018

    • splitgraph.com
    Updated Aug 19, 2020
    splitgraph (2020). PostGIS data for London and Greater London ward boundaries as of 2018 [Dataset]. https://www.splitgraph.com/splitgraph/london_wards/
    Available download formats: application/openapi+json, application/vnd.splitgraph.image, json
    Dataset updated
    Aug 19, 2020
    Authors
    splitgraph
    Area covered
    Greater London, London
    Description

    PostGIS data for London and Greater London ward boundaries as of 2018.

    This dataset is used in the london_votes sample Splitfile, in which the 2017 General Election results and the London ward geodata are joined through the ONS UK Ward-Constituency lookup table to build a dataset of London constituencies and the Conservative/Labour votes in each, ready for plotting as a choropleth map.

    https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london

    Contains National Statistics data © Crown copyright and database right 2012

    Contains Ordnance Survey data © Crown copyright and database right 2012
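    The join described for the london_votes Splitfile (ward geodata → ward-constituency lookup → election results) can be sketched in plain Python. This is not the Splitfile itself, which runs inside Splitgraph. All dictionary keys and function names below are hypothetical stand-ins for the real ONS codes and result columns.

```python
def constituency_votes_by_ward(ward_codes, ward_to_constituency, results):
    """Attach constituency-level vote counts to each London ward.

    ward_codes: iterable of ward codes present in the boundary data.
    ward_to_constituency: dict ward code -> constituency code (the lookup table).
    results: dict constituency code -> {"con": int, "lab": int} (hypothetical keys).
    Wards missing from the lookup or from the results are skipped.
    """
    joined = {}
    for ward in ward_codes:
        constituency = ward_to_constituency.get(ward)
        if constituency is None or constituency not in results:
            continue
        joined[ward] = {"constituency": constituency, **results[constituency]}
    return joined


def london_constituencies(joined):
    """Distinct constituencies reached through London wards."""
    return sorted({row["constituency"] for row in joined.values()})


if __name__ == "__main__":
    wards = ["W1", "W2", "W3"]
    lookup = {"W1": "C1", "W2": "C1", "W3": "C2"}
    votes = {"C1": {"con": 10, "lab": 20}}
    j = constituency_votes_by_ward(wards, lookup, votes)
    print(j, london_constituencies(j))
```

    Keying the output by ward keeps it aligned with the ward boundary geometries, so each polygon can be shaded by its constituency's result.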

  13. Synthetic population for IND_MANIPUR

    • zenodo.org
    bin, pdf, zip
    Updated Jul 16, 2024
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie (2024). Synthetic population for IND_MANIPUR [Dataset]. http://doi.org/10.5281/zenodo.6506020
    Available download formats: pdf, bin, zip
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abhijin Adiga; Hannah Baek; Stephen Eubank; Przemyslaw Porebski; Madhav Marathe; Henning Mortveit; Samarth Swarup; Mandy Wilson; Dawen Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Manipur, India
    Description

    Synthetic populations for regions of the World (SPW) | Manipur

    Dataset information

    A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where those activities take place, and thus where interactions among population members happen (e.g., relevant to the spread of epidemics).

    License

    CC-BY-4.0

    Acknowledgment

    This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).

    Contact information

    Henning.Mortveit@virginia.edu

    Identifiers

    Region name: Manipur
    Region ID: ind_140001942
    Model: coarse
    Version: 0_9_0

    Statistics

    | Name | Value |
    | --- | --- |
    | Population | 2796700 |
    | Average age | 27.5 |
    | Households | 635806 |
    | Average household size | 4.4 |
    | Residence locations | 635806 |
    | Activity locations | 192709 |
    | Average number of activities | 5.5 |
    | Average travel distance | 78.3 |

    Sources

    | Description | Name | Version | URL |
    | --- | --- | --- | --- |
    | Activity template data | World Bank | 2021 | https://data.worldbank.org |
    | Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
    | Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551, https://www.openstreetmap.org/ |
    | Household data | DHS | | https://dhsprogram.com |
    | Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |

    Files description

    Base data files (ind_140001942_data_v_0_9.zip)

    | Filename | Description |
    | --- | --- |
    | ind_140001942_person_v_0_9.csv | Data for each person, including attributes such as age, gender, and household ID |
    | ind_140001942_household_v_0_9.csv | Data at household level |
    | ind_140001942_residence_locations_v_0_9.csv | Data about residence locations |
    | ind_140001942_activity_locations_v_0_9.csv | Data about activity locations, including which activity types are supported at these locations |
    | ind_140001942_activity_location_assignment_v_0_9.csv | For each person and each of their activities, the location where the activity takes place |
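    An average travel distance like the one in the statistics above could be estimated by joining each activity assignment to its residence and activity coordinates. The description does not say how travel distance is defined or in what units. This sketch assumes great-circle (haversine) distance in kilometres from residence to activity location, and the trip tuple layout is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_travel_distance(trips):
    """trips: list of ((home_lat, home_lon), (act_lat, act_lon)) pairs."""
    distances = [haversine_km(h[0], h[1], a[0], a[1]) for h, a in trips]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # One degree of longitude at the equator is roughly 111.2 km.
    print(haversine_km(0.0, 0.0, 0.0, 1.0))
```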

    Derived data files

    | Filename | Description |
    | --- | --- |
    | ind_140001942_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model |

    Validation and measures files

    | Filename | Description |
    | --- | --- |
    | ind_140001942_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
    | ind_140001942_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
    | ind_140001942_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by time of day with survey data |
    | ind_140001942_location_construction_0_9.pdf | Validation plots for location construction |
    | ind_140001942_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
    | ind_140001942_ind_140001942_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
    | ind_140001942_ind_140001942_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
    | ind_140001942_ind_140001942_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
    | ind_140001942_ind_140001942_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
    | ind_140001942_ind_140001942_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
    | ind_140001942_ind_140001942_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |

  14. Great Britain Local Authority Boundaries GeoJSON

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Ireneusz Imiolek (2023). Great Britain Local Authority Boundaries GeoJSON [Dataset]. https://www.kaggle.com/datasets/ireneuszimiolek/great-britain-local-authority-boundaries-geojson
    Available download formats: zip (1785780 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    Ireneusz Imiolek
    Area covered
    Great Britain
    Description

    Dataset info

    The dataset contains the Local Authority boundaries for Great Britain (England, Scotland and Wales) as of December 2021, with 363 Local Authority objects in total. It was created for future use in folium choropleth maps, in combination with other datasets that contain the matching Local Authority codes. For convenience, subsets were also created holding the boundaries of the local authorities in England and Wales together, and in each individual country (England, Scotland and Wales) on its own.

    Methodology

    The original dataset was downloaded from the ONS. Because its level of detail made it too large for most use cases (129.4 MB), it was simplified with https://mapshaper.org/ using the default method (Visvalingam / weighted area) with 'prevent shape removal' enabled. The simplification was set to 1.4%, followed by intersection repair and export back to GeoJSON. The shape coordinates were originally in the British National Grid (BNG) coordinate system and had to be converted to WGS84 (latitude and longitude). Finally, the coordinates were rounded to 6 decimal places, resulting in a file of 2.2 MB uncompressed with a sensible level of detail. The individual country files were extracted based on the LAD21CD property.
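    The last two steps above, rounding coordinates to 6 decimal places and splitting features by country, can be sketched in plain Python on a parsed GeoJSON dict. The author's exact extraction logic is not shown. This sketch assumes the country is read from the first letter of LAD21CD (E, S or W, following ONS GSS code prefixes) and that every geometry has a "coordinates" member (so no GeometryCollections). The BNG-to-WGS84 reprojection is not reproduced here; a library such as pyproj would normally handle it.

```python
COUNTRY_PREFIX = {"E": "England", "S": "Scotland", "W": "Wales"}


def round_coords(coords, ndigits=6):
    """Recursively round nested GeoJSON coordinate arrays."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coords(c, ndigits) for c in coords]


def split_by_country(feature_collection):
    """Return one FeatureCollection per country, with rounded coordinates.

    The input dict is not modified; features are shallow-copied.
    """
    out = {
        name: {"type": "FeatureCollection", "features": []}
        for name in COUNTRY_PREFIX.values()
    }
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        rounded = dict(
            feature,
            geometry=dict(geometry, coordinates=round_coords(geometry["coordinates"])),
        )
        country = COUNTRY_PREFIX.get(feature["properties"]["LAD21CD"][:1])
        if country:
            out[country]["features"].append(rounded)
    return out
```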

    Licence

    https://www.ons.gov.uk/methodology/geography/licences

    Digital boundary products and reference maps are supplied under the Open Government Licence. You must use the following copyright statements when you reproduce or use this material:

    • Source: Office for National Statistics licensed under the Open Government Licence v.3.0
    • Contains OS data © Crown copyright and database right 2023

  15. Continent Boundaries as GPKG files

    • kaggle.com
    zip
    Updated Mar 3, 2024
    Eric Narro (2024). Continent Boundaries as GPKG files [Dataset]. https://www.kaggle.com/ericnarro/continent-boundaries-as-gpkg-files
    Available download formats: zip (4955392 bytes)
    Dataset updated
    Mar 3, 2024
    Authors
    Eric Narro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are two files containing geographic vector layers of continent boundaries. You can use this data to create choropleth maps or any map visualization that requires vectors at the continent level.

    Citation of the data source:

    Shepherd, Stephanie (2020). Continent Polygons. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12555170.v3

    I lightly transformed the data to produce GPKG files and added them to Kaggle.

    The original data is similar to the file continent_boundaries_8.gpkg, which includes the following continents:

    • Africa
    • Antarctica
    • Asia
    • Australia
    • Europe
    • North America
    • Oceania
    • South America

    I also created a file called continent_boundaries_7.gpkg that merges Australia and Oceania as a single continent.

    You can find the transformations in the following Kaggle Notebook: https://www.kaggle.com/code/ericnarro/create-continents-geodataframe-and-file/notebook
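    A GPKG file is an SQLite database under the OGC GeoPackage standard, which requires a gpkg_contents table listing each content table with its table_name and data_type. This makes it possible to inspect which layers continent_boundaries_8.gpkg or continent_boundaries_7.gpkg contain using only Python's built-in sqlite3 module, as in the sketch below. The function name is hypothetical, and the layer names inside these particular files are not stated in the description.

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(path):
    """Return (table_name, data_type) rows from a GeoPackage's gpkg_contents table."""
    with closing(sqlite3.connect(path)) as conn:
        return conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()


if __name__ == "__main__":
    print(list_gpkg_layers("continent_boundaries_8.gpkg"))
```

    Reading the geometries themselves is better left to a GIS library, since GeoPackage stores them as binary blobs with a GeoPackage-specific header.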

  16. Population of Cities in Ecuador 2022

    • kaggle.com
    zip
    Updated Nov 13, 2022
    Kei (2022). Population of Cities in Ecuador 2022 [Dataset]. https://www.kaggle.com/datasets/kokitashiro/population-of-cities-in-ecuador-2022
    Available download formats: zip (1384 bytes)
    Dataset updated
    Nov 13, 2022
    Authors
    Kei
    Area covered
    Ecuador
    Description

    This dataset contains the population of Ecuadorian cities in 2022. The data was downloaded from this website. I use it to make choropleth maps for analyzing the "Store Sales - Time Series Forecasting" data, and you are welcome to use it freely for similar purposes. (Many thanks to "World Population Review"!)

  17. Malaysia COVID-19 Data - Apr 2021

    • kaggle.com
    zip
    Updated May 5, 2021
    Ansonnn (2021). Malaysia COVID-19 Data - Apr 2021 [Dataset]. https://www.kaggle.com/ansonnn/malaysia-covid19-data-apr-2021
    Available download formats: zip (1141528 bytes)
    Dataset updated
    May 5, 2021
    Authors
    Ansonnn
    License

    GNU Lesser General Public License v3.0: http://www.gnu.org/licenses/lgpl-3.0.html

    Area covered
    Malaysia
    Description

    Context

    The dataset consists of COVID-19 cases in Malaysia from 27 March 2020 to 15 April 2021. It was collected to create better visualizations of COVID-19 cases in Malaysia. All of the data was web-scraped from https://kpkesihatan.com/ using the BeautifulSoup library.

    The data is also available on GitHub, along with the scripts used to scrape it. There is also a web application that shows the visualizations.

    Originally I planned to update the data daily, but doing this alone without automated scripts or schedulers turned out to be too tedious. If anyone knows how to automate this efficiently, please reach out to me by email or LinkedIn message; the links can be found on my GitHub. Thank you very much.

    Content

    There are three CSV files and one GeoJSON file:
    • all_2020-03-27_2021-04-15.csv: all daily cases, excluding state data
    • state_all.csv: all daily cases for each state
    • state_cumu.csv: all daily cumulative cases for each state
    • malaysia_state_province_boundary.geojson: Malaysia's GeoJSON map file

    The columns consist of:
    1. Date
    2. Recovered
    3. Cumulative Recovered
    4. Imported Case (many NaN values until the end of 2020)
    5. Local Case (many NaN values)
    6. Active Case (many NaN values, but can be inferred)
    7. New Case
    8. Cumulative Case
    9. ICU - number of patients admitted to the Intensive Care Unit
    10. Ventilator - number of ICU patients who need a ventilator
    11. Death
    12. Cumulative Death
    13. URL - link to the original webpage
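    One way to infer the missing "Active Case" values from the columns listed above is the usual identity active = cumulative cases - cumulative recovered - cumulative deaths. This sketch assumes that identity holds for this data, which may not match officially reported figures if reporting definitions differ. Rows are plain dicts keyed by the column names above, with None or NaN marking a missing value.

```python
import math


def _is_missing(value):
    """True for None or a float NaN (as pandas would produce)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_active_cases(rows):
    """Fill missing 'Active Case' values from cumulative totals.

    Assumes active = Cumulative Case - Cumulative Recovered - Cumulative Death.
    Returns new row dicts; the input rows are left unchanged.
    """
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if _is_missing(row.get("Active Case")):
            row["Active Case"] = (
                row["Cumulative Case"]
                - row["Cumulative Recovered"]
                - row["Cumulative Death"]
            )
        filled.append(row)
    return filled
```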

    Acknowledgements

    Thanks to Info GIS MAP.com, which provides the Malaysia GeoJSON file used to create choropleth maps.

    Inspiration

    Hopefully, there will be people utilizing the scripts or the data to create better visualizations.

  18. Punjab Stubble Burning Crop Fire Data

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Himanshu (2024). Punjab Stubble Burning Crop Fire Data [Dataset]. https://www.kaggle.com/datasets/waitasecant/punjab-stubble-burning-crop-fire-data
    Available download formats: zip (194956 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    Himanshu
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Punjab
    Description

    How was the data collected? The dataset was collected from the Crop Residue Burning Information and Management System (CRBIMS) of the Punjab Remote Sensing Centre (PRSC). It covers all the districts of Punjab from April 2016 to June 2024.

    What can be done with this data?

    Visualization: This data can be used for visualization purposes. Choropleth maps can be made using the coordinates available.

    Figure: trends (https://github.com/waitasecant/CRBIMS/blob/main/trends.png?raw=true)

    Source: http://gis-prsc.punjab.gov.in/residue/Index.aspx
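    Turning the fire coordinates into a district-level choropleth means assigning each point to a district polygon and counting. The sketch below uses the standard ray-casting point-in-polygon test. It assumes fires are (lon, lat) pairs and district boundaries come from a separate file as a dict of name → outer ring (holes ignored); these inputs are hypothetical, since the dataset's columns are not listed here. A GIS library would normally do this spatial join on real data.

```python
from collections import Counter


def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices (closing vertex optional)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_fires_by_district(fires, districts):
    """fires: (lon, lat) points; districts: dict name -> outer ring. Holes ignored."""
    counts = Counter()
    for lon, lat in fires:
        for name, ring in districts.items():
            if point_in_polygon(lon, lat, ring):
                counts[name] += 1
                break  # each fire is assigned to the first matching district
    return counts
```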

  19. COVID-19 in Tokyo

    • kaggle.com
    zip
    Updated Feb 3, 2021
    Kaito (2021). COVID-19 in Tokyo [Dataset]. https://www.kaggle.com/japandata509/covid19-in-tokyo-japan
    Available download formats: zip (465280 bytes)
    Dataset updated
    Feb 3, 2021
    Authors
    Kaito
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Tokyo
    Description

    About Datasets

    Tokyo is the most populous prefecture in Japan and has the largest number of COVID-19 cases in the country. The total number of confirmed cases in Tokyo was about 73,000 as of January 9, 2021. This dataset contains data about COVID-19 in Tokyo. If you download it, please consider upvoting.

    Data Source

    Data was collected from Tokyo Metropolitan Government Open Data Catalog Site and Updates on COVID-19 in Tokyo.

    Columns

    The tokyo_covid19_patients.csv file in this dataset has 7 columns:

    | Column | Description |
    | --- | --- |
    | Number | |
    | Date | Published date |
    | Date (Onset) | Date of onset of symptoms |
    | Region | Region where the patient lives |
    | Age | Patient's age |
    | Gender | Patient's gender |
    | Situation | Whether the patient was discharged (including death) or not |

    The tokyo_cases_byarea.csv file has 4 columns:

    | Column | Description |
    | --- | --- |
    | Area | The area to which the municipality belongs |
    | Municipality | Municipality name |
    | Positive Cases | The total number of cases |
    | Code | Code required to draw a choropleth map |
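    The Code column is the join key between tokyo_cases_byarea.csv and a municipality boundary file. The sketch below assumes the CSV headers match the column names listed above. It also assumes the boundary GeoJSON stores the same code in a feature property, whose name ("code") is a hypothetical placeholder to replace with whatever key the boundary file actually uses.

```python
import csv
import io


def load_cases(csv_text):
    """Map municipality Code -> Positive Cases from tokyo_cases_byarea.csv content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Code"].strip(): int(row["Positive Cases"]) for row in reader}


def attach_cases(geojson, cases_by_code, code_property="code"):
    """Copy case counts onto GeoJSON features matched by municipality code.

    code_property is a hypothetical property name. Features without a
    matching code get None. The GeoJSON dict is updated in place and returned.
    """
    for feature in geojson["features"]:
        code = str(feature["properties"].get(code_property))
        feature["properties"]["positive_cases"] = cases_by_code.get(code)
    return geojson
```

    Comparing codes as strings avoids mismatches when one source stores them as integers and the other as text.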

Vega Datasets (2025). Geospatial Data Pack for Visualization [Dataset]. https://www.kaggle.com/datasets/vega-datasets/geospatial-data-pack

Geospatial Data Pack for Visualization

Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets

Available download formats: zip (1422109 bytes)
Dataset updated
Oct 21, 2025
Dataset authored and provided by
Vega Datasets
Description

Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.

Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.

Why Use This Dataset? 🤔

  • Comprehensive Geospatial Types: Explore a variety of core geospatial data models:
    • Vector Data: Includes points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json).
    • Raster-like Data: Work with gridded datasets (like windvectors.csv, annual-precip.json).
  • Diverse Formats: Gain experience with standard and efficient geospatial formats like GeoJSON (see Tables 1, 2 and 4), compressed TopoJSON (see Table 1), and plain CSV/TSV (see Tables 2, 3 and 4) for point data and attribute tables ready for joining.
  • Multi-Scale Coverage: Practice visualization across different geographic scales, from global and national (Tables 1 and 4) down to the city level (Table 1).
  • Rich Thematic Mapping: Includes multiple datasets (Table 3) specifically designed for joining attributes to geographic boundaries (like states or counties from Table 1) to create insightful choropleth maps.
  • Ready-to-Use & Example-Driven: Cleaned datasets tightly integrated with 31+ official examples (see Appendix) from Altair, Vega-Lite, and Vega, allowing you to immediately practice techniques like projections, point maps, network maps, and interactive displays.
  • Python Friendly: Works seamlessly with essential Python libraries like Altair (which can directly read TopoJSON/GeoJSON), Pandas, and GeoPandas, fitting perfectly into the Kaggle notebook environment.

Dataset Inventory 🗂️

This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.

1. BASE MAP BOUNDARIES (Topological Data)

| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
| --- | --- | --- | --- | --- | --- | --- |
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
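The TopoJSON files in this table store shared boundaries once as "arcs", which is what makes them compact. Under the TopoJSON specification, when a topology has a "transform", arc positions are quantized and delta-encoded, and a negative arc index ~i inside a geometry means arc i traversed in reverse. The pure-Python sketch below decodes arcs and stitches them into one ring under those rules. In practice Altair's alt.topo_feature (or a TopoJSON client library) does this for you. The function names are hypothetical, and the sketch ignores per-geometry details such as holes and multi-polygons.

```python
def decode_arc(arc, transform=None):
    """Convert one TopoJSON arc to absolute [x, y] coordinates.

    With a transform, positions are quantized and delta-encoded: each position
    is an offset from the previous one, then scaled and translated.
    """
    if transform is None:
        return [[p[0], p[1]] for p in arc]
    sx, sy = transform["scale"]
    tx, ty = transform["translate"]
    x = y = 0
    out = []
    for position in arc:
        x += position[0]
        y += position[1]
        out.append([x * sx + tx, y * sy + ty])
    return out


def stitch_ring(arc_indexes, topology):
    """Join arcs into one ring; a negative index ~i means arc i reversed."""
    transform = topology.get("transform")
    ring = []
    for index in arc_indexes:
        coords = decode_arc(topology["arcs"][index if index >= 0 else ~index], transform)
        if index < 0:
            coords.reverse()
        # Consecutive arcs share an endpoint, so drop the duplicate first point.
        ring.extend(coords if not ring else coords[1:])
    return ring
```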

2. GEOGRAPHIC REFERENCE POINTS (Point Data) 📍

| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
| --- | --- | --- | --- | --- | --- | --- |
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l...