Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json).
windvectors.csv, annual-precip.json).
This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
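For a choropleth, statistics keyed by county FIPS code are joined to the `id` of each geometry in the `counties` object of us-10m.json. The sketch below does that join in plain Python before rendering. The function names, the `rate` property, and the miniature topology are hypothetical stand-ins, and it assumes ids may arrive either as integers or as zero-padded strings.

```python
import json


def normalize_fips(value):
    # Treat 1001, "1001" and "01001" as the same county code.
    return int(value)


def attach_values(topology, object_name, values, key="rate"):
    """Copy values[fips] into properties[key] of each geometry in a TopoJSON object."""
    lookup = {normalize_fips(k): v for k, v in values.items()}
    for geom in topology["objects"][object_name]["geometries"]:
        if geom.get("id") is None:
            continue  # geometries without an id cannot be joined
        props = geom.setdefault("properties", {})
        props[key] = lookup.get(normalize_fips(geom["id"]))  # None when unmatched
    return topology


# Hypothetical miniature topology; arcs and transform are omitted for brevity.
topology = {
    "type": "Topology",
    "objects": {"counties": {"type": "GeometryCollection", "geometries": [
        {"type": "Polygon", "id": 1001},
        {"type": "Polygon", "id": 1003},
    ]}},
}
attach_values(topology, "counties", {"01001": 0.051, "01003": 0.049})
print(json.dumps([g["properties"] for g in topology["objects"]["counties"]["geometries"]]))
# prints [{"rate": 0.051}, {"rate": 0.049}]
```

In Vega-Lite and Altair the same join is usually expressed at render time with a lookup transform; doing it in Python is mainly useful when preparing files for other tools.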
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The choropleth map is a device used for the display of socioeconomic data associated with an areal partition of geographic space. Cartographers emphasize the need to standardize any raw count data by an area-based total before displaying the data in a choropleth map. The standardization process converts the raw data from an absolute measure into a relative measure. However, there is recognition that the standardizing process does not enable the map reader to distinguish between low–low and high–high numerator/denominator differences. This research uses concentration-based classification schemes using Lorenz curves to address some of these issues. A test data set of nonwhite birth rate by county in North Carolina is used to demonstrate how this approach differs from traditional mean–variance-based systems such as the Jenks’ optimal classification scheme.
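The Lorenz-curve building block can be sketched as follows: order the areas by rate (numerator divided by denominator), then accumulate the denominator share against the numerator share. This is a minimal illustration under our own assumptions, not the article's concentration-based classification scheme; the function names, the Gini-style summary, and the toy counts are additions for illustration only.

```python
def lorenz_curve(numerators, denominators):
    """Return cumulative (denominator share, numerator share) points.

    Areas are sorted by rate (numerator / denominator) ascending; every
    denominator must be positive.
    """
    pairs = sorted(zip(numerators, denominators), key=lambda p: p[0] / p[1])
    total_num = sum(numerators)
    total_den = sum(denominators)
    points = [(0.0, 0.0)]
    cum_num = cum_den = 0.0
    for num, den in pairs:
        cum_num += num
        cum_den += den
        points.append((cum_den / total_den, cum_num / total_num))
    return points


def concentration_index(points):
    """Gini coefficient: 1 - 2 * (area under the Lorenz curve, by trapezoids)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


# Toy data: births (numerator) and total births (denominator) for three counties.
points = lorenz_curve([10, 20, 30], [100, 100, 100])
print(round(concentration_index(points), 4))  # 0.2222
```

Equal rates everywhere produce the diagonal and an index of 0; the more the numerator concentrates in a few areas, the closer the index moves toward 1.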
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains geometry data for the countries of the world, together with their names and country codes in various formats. The primary use case is choropleths (color-coded maps). The data can be read with geopandas as a GeoDataFrame (a pandas DataFrame subclass) and plotted with matplotlib. See the starter notebook for an example of how to do this.
The data was created by Natural Earth. At the time of this writing it is in the public domain and free to use for any purpose; you may want to check their Terms of Use.
Photo by KOBU Agency on Unsplash
These are the results of an empirical test of the communicative effectiveness of two types of two-dimensional (2D) map formats (choropleth maps and cartograms) of the Greater London area of the United Kingdom. Participants were interviewed and observed individually during the procedure. The results contain the recorded measurements of spatial accuracy and the time each participant took to answer three test questions. Each participant's post-test qualitative preference between the two map types is also recorded, along with their gender, age, visual impediments, and self-assessed map-reading ability.
https://creativecommons.org/publicdomain/zero/1.0/
A comprehensive dataset of 1,513 Pakistani cities, towns, tehsils, districts and places with latitude/longitude, administrative region, population (when available) and Wikidata IDs — ideal for mapping, geospatial analysis, enrichment, and location-based ML.
Why this dataset is valuable:
Highlights (fetched from the data):
Column definitions (short):
Typical & high-value use cases:
Folium makes it easy to visualize data that has been manipulated in Python on an interactive Leaflet map. It enables both binding data to a map for choropleth visualizations and passing rich vector/raster/HTML visualizations as markers on the map. These files can be used to mark state boundaries on a map of India with the folium library, and the CSV also contains the state data; I have used them in one of my kernels, which shows how to use them in a notebook.
The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or CloudMade API keys. folium supports image, video, GeoJSON, and TopoJSON overlays. Because of its extensible functionality, I find folium the best map-plotting library in Python. Do give it a try and use it in your kernels.
CrimeMapTutorial is a step-by-step tutorial for learning crime mapping using ArcView GIS or MapInfo Professional GIS. It was designed to give users a thorough introduction to most of the knowledge and skills needed to produce daily maps and spatial data queries that uniformed officers and detectives find valuable for crime prevention and enforcement. The tutorials can be used either for self-learning or in a laboratory setting. The geographic information system (GIS) and police data were supplied by the Rochester, New York, Police Department. For each mapping software package, there are three PDF tutorial workbooks and one WinZip archive containing sample data and maps. Workbook 1 was designed for GIS users who want to learn how to use a crime-mapping GIS and how to generate maps and data queries. Workbook 2 was created to assist data preparers in processing police data for use in a GIS. This includes address-matching police incidents to place them on pin maps and aggregating crime counts by areas (such as car beats) to produce area or choropleth maps. Workbook 3 was designed for map makers who want to learn how to construct useful crime maps, given police data that have already been address-matched and preprocessed by data preparers. The three tutorials are estimated to take approximately six hours to complete in total, including exercises.
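The aggregation step from Workbook 2 (assigning address-matched incidents to areas such as car beats and counting them) can be sketched with a ray-casting point-in-polygon test. This is a hypothetical, standard-library illustration rather than the workbooks' ArcView or MapInfo procedure; the beat names and coordinates are made up, and each beat is assumed to be a simple ring in projected coordinates.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a horizontal ray from (x, y) to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_incidents_by_beat(points, beats):
    """Count geocoded incidents per beat; points outside every beat are dropped."""
    counts = Counter({name: 0 for name in beats})
    for x, y in points:
        for name, polygon in beats.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical beats: two adjacent unit squares.
beats = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)], "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
incidents = [(0.5, 0.5), (0.2, 0.9), (1.5, 0.5), (3, 3)]
print(dict(count_incidents_by_beat(incidents, beats)))  # {'A': 2, 'B': 1}
```

The resulting counts per beat are what an area or choropleth map would then shade.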
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset underlies a choropleth map of Boston area communities in which areas are shaded according to the percentage of the population that was foreign-born during each decade. The data was drawn from the US Census of Population, as well as the American Community Survey.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic populations for regions of the World (SPW) | Jordan
Dataset information
A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place and thus where interactions among population members happen (e.g., spread of epidemics).
License
Acknowledgment
This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).
Contact information
Henning.Mortveit@virginia.edu
Identifiers
| Region name | Jordan |
| Region ID | jor |
| Model | coarse |
| Version | 0_9_0 |
Statistics
| Name | Value |
|---|---|
| Population | 5723567 |
| Average age | 23.5 |
| Households | 1235755 |
| Average household size | 4.6 |
| Residence locations | 1235755 |
| Activity locations | 131978 |
| Average number of activities | 6.4 |
| Average travel distance | 44.5 |
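The average household size in the table appears consistent with population divided by households (5,723,567 / 1,235,755 ≈ 4.63, shown rounded as 4.6). A similar check can be run against the person file. The sketch below assumes a household-ID column named `hid`, which is a hypothetical name; check the actual header of jor_person_v_0_9.csv before using it.

```python
import csv
import io
from collections import Counter


def household_stats(csv_text, household_col="hid"):
    """Return (people, households, average household size) from a person-level CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sizes = Counter(row[household_col] for row in reader)  # members per household ID
    people = sum(sizes.values())
    return people, len(sizes), people / len(sizes)


# Toy person file: two households with 3 and 2 members.
sample = "pid,hid,age\n1,h1,34\n2,h1,31\n3,h1,6\n4,h2,70\n5,h2,68\n"
print(household_stats(sample))       # (5, 2, 2.5)
print(round(5723567 / 1235755, 1))   # 4.6, matching the statistics table
```

For the real file, read the text with `open(path, newline="")` and pass its contents in, or adapt the function to take the file object directly.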
Sources
| Description | Name | Version | Url |
|---|---|---|---|
| Activity template data | World Bank | 2021 | https://data.worldbank.org |
| Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
| Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551 https://www.openstreetmap.org/ |
| Household data | DHS | | https://dhsprogram.com |
| Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |
Files description
Base data files (jor_data_v_0_9.zip)
| Filename | Description |
|---|---|
| jor_person_v_0_9.csv | Data for each person including attributes such as age, gender, and household ID. |
| jor_household_v_0_9.csv | Data at household level. |
| jor_residence_locations_v_0_9.csv | Data about residence locations |
| jor_activity_locations_v_0_9.csv | Data about activity locations, including what activity types are supported at these locations |
| jor_activity_location_assignment_v_0_9.csv | For each person and for each of their activities, this file specifies the location where the activity takes place |
Derived data files
| Filename | Description |
|---|---|
| jor_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model. |
Validation and measures files
| Filename | Description |
|---|---|
| jor_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
| jor_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
| jor_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by the time of day with survey data |
| jor_location_construction_0_9.pdf | Validation plots for location construction |
| jor_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
| jor_jor_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
| jor_jor_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
| jor_jor_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
| jor_jor_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
| jor_jor_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
| jor_jor_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |
Overview
The multiple hazard index for United States counties was designed to map natural hazard related to exposure to multiple natural disasters. The index was created to provide communities and public health officials with an overview of the risks that are prominent in their county, and to facilitate the comparison of hazard level between counties. Most existing hazard maps focus on a single disaster type. By creating a measure that aggregates the hazard from individual disasters, the increased hazard that results from exposure to multiple natural disasters can be better understood. The multiple hazard index represents the aggregate of hazard from eleven individual disasters. Layers displaying the hazard from each individual disaster are also included.
The hazard index is displayed visually as a choropleth map, with blue representing areas with less hazard and red representing areas with higher hazard. Users can click on each county to view its hazard index value and the level of hazard for each individual disaster. Layers describing the relative level of hazard from each individual disaster are also available as choropleth maps, with red areas representing high, orange representing medium, and yellow representing low levels of hazard.
Methodology and Data Citations
Multiple Hazard Index
The multiple hazard index was created by coding the individual hazard classifications and summing the coded values for each United States county. Each individual hazard is weighted equally in the multiple hazard index. Alaska and Hawaii were excluded from the analysis because one third of the individual hazard datasets describe only the conterminous United States.
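The coding-and-summing step can be sketched as below. The numeric codes are an assumption: the methodology says the classes were coded and weighted equally, but does not list the values used, and the hazard names in the example are illustrative.

```python
# Assumed coding; the source does not state the numeric values per class.
CODES = {"None": 0, "Low": 1, "Medium": 2, "High": 3}


def multiple_hazard_index(county_classes):
    """Sum the coded class of every individual hazard, each weighted equally."""
    return sum(CODES[level] for level in county_classes.values())


county = {"avalanche": "None", "earthquake": "Low", "flood": "High", "tornado": "Medium"}
print(multiple_hazard_index(county))  # 0 + 1 + 3 + 2 = 6
```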
Avalanche Hazard
University of South Carolina Hazards and Vulnerability Research Institute. “Spatial Hazard Events and Losses Database”. United States Counties. “Avalanches United States 2001-2009”. <http://hvri.geog.sc.edu/SHELDUS/>. Downloaded 06/2016.
Classification
Avalanche hazard was classified by dividing counties based on the number of avalanches they experienced over the nine-year period in the dataset. Avalanche hazard was not normalized by total county area because doing so overemphasized small counties, and because avalanches are a highly local hazard.
- None = 0 avalanches
- Low = 1 avalanche
- Medium = 2-5 avalanches
- High = 6-10 avalanches
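The avalanche classes above translate directly into a small classification function. This is an illustrative sketch; the published High range stops at 10, so treating larger counts as High is an assumption.

```python
def classify_avalanche(count):
    """Map a county's 2001-2009 avalanche count to the classes listed above."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return "None"
    if count == 1:
        return "Low"
    if count <= 5:
        return "Medium"
    # Published High range is 6-10; counts above 10 are also treated as High (assumption).
    return "High"


print([classify_avalanche(n) for n in (0, 1, 2, 5, 6, 10)])
# ['None', 'Low', 'Medium', 'Medium', 'High', 'High']
```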
Earthquake Hazard
United States Geological Survey. “Earthquake Hazard Maps”. 1:2,000,000. “Peak Ground Acceleration 2% in 50 Years”. <http://earthquake.usgs.gov/hazards/products/conterminous/>. Downloaded 07/2016.
Classification
Peak ground acceleration (% gravity) with a 2% likelihood in 50 years was averaged by United States County, and the earthquake hazard of counties was classified based upon this average.
- Low = 0-14.25% gravity peak ground acceleration
- Medium = 14.26-47.5% gravity peak ground acceleration
- High = 47.5+% gravity peak ground acceleration
Flood Hazard
United States Federal Emergency Management Agency. “National Flood Hazard Layer”. 1:10,000. “0.2 Percent Annual Flood Area”. <https://data.femadata.com/FIMA/Risk_MAP/NFHL/>. Downloaded 07/2016.
Classification
The National Flood Hazard Layer 0.2 Percent Annual Flood Area was spatially intersected with the United States Counties layer, splitting flood areas by county and adding county information to flood areas. Flood area was aggregated by county and expressed as a fraction of the total county land area, and flood hazard was classified based on the percentage of land that is susceptible to flooding. The National Flood Hazard Layer does not cover the entire United States; coverage is focused on populated areas. Areas not covered by the National Flood Hazard Layer were assigned a flood hazard of Low so that they could be included in further analysis.
- Low = 0-0.001% area susceptible
- Medium = 0.00101%-0.005% area susceptible
- High = 0.00501+% area susceptible
Heat Wave Hazard
United States Centers for Disease Control and Prevention. “National Climate Assessment”. Contiguous United States Counties. “Extreme Heat Events: Heat Wave Days in May - September for years 1981-2010”. Downloaded 06/2016.
Classification
Heat wave hazard was classified by dividing counties based on the number of heat wave days they experienced over the 30-year period described in the dataset.
- Low = 126-171 heat wave days
- Medium = 172-187 heat wave days
- High = 188-255 heat wave days
Hurricane Hazard
National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Atlantic Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. <https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download>. Downloaded 06/2016.
National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Pacific Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. <https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download>. Downloaded 06/2016.
Classification
Atlantic and Pacific datasets were merged. Tropical storm and disturbance tracks were filtered out leaving hurricane tracks. Each hurricane track was assigned the value of the category number that describes that event. Weighting each event by intensity ensures that areas with higher intensity events are characterized as being more hazardous. Values describing each hurricane event were aggregated by United States County, normalized by total county area, and the hurricane hazard of counties was classified based upon the normalized value.
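The intensity weighting and area normalization described above can be sketched as a per-county sum divided by county area. The same shape would apply to the tornado layer (Fujita scale as the weight) and, with a weight of 1, to the volcano counts. This is an illustrative sketch: the county ids and areas are hypothetical, and the spatial step of assigning each track or point to a county is assumed to have been done already.

```python
from collections import defaultdict


def weighted_density(events, county_area):
    """Sum event weights per county and divide by county area.

    events: iterable of (county_id, weight), e.g. (fips, hurricane category).
    county_area: {county_id: area}; counties with no events get 0.0.
    Events for counties missing from county_area are ignored.
    """
    totals = defaultdict(float)
    for county, weight in events:
        totals[county] += weight
    return {county: totals[county] / area for county, area in county_area.items()}


events = [("A", 3), ("A", 1), ("B", 5)]        # hypothetical (county, category) pairs
areas = {"A": 2.0, "B": 10.0, "C": 4.0}        # hypothetical county areas
print(weighted_density(events, areas))  # {'A': 2.0, 'B': 0.5, 'C': 0.0}
```

The normalized values would then be binned into Low, Medium, and High classes; the source does not list those thresholds for hurricanes.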
Landslide Hazard
United States Geological Survey. “Landslide Overview Map of the United States”. 1:4,000,000. “Landslide Incidence and Susceptibility in the Conterminous United States”. <https://catalog.data.gov/dataset/landslide-incidence-and-susceptibility-in-the-conterminous-united-states-direct-download>. Downloaded 07/2016.
Classification
The classifications of High, Moderate, and Low landslide susceptibility and incidence from the study were numerically coded, the average value was computed for each county, and the landslide hazard was classified based upon the average value.
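A county "average value" of coded classes can be sketched as an area-weighted mean. Both the numeric codes and the area weighting are assumptions, since the source only says the classes were numerically coded and averaged per county; the same pattern would apply to the wildfire classes.

```python
# Hypothetical coding; the source does not list the numeric values.
LANDSLIDE_CODES = {"Low": 1, "Moderate": 2, "High": 3}


def county_average(pieces, codes=LANDSLIDE_CODES):
    """Area-weighted mean code for one county.

    pieces: list of (class_name, area) for the parts of the susceptibility
    map that fall inside the county.
    """
    total_area = sum(area for _, area in pieces)
    if total_area == 0:
        raise ValueError("county has no mapped area")
    return sum(codes[name] * area for name, area in pieces) / total_area


print(county_average([("Low", 1.0), ("High", 3.0)]))  # (1*1 + 3*3) / 4 = 2.5
```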
Long-Term Drought Hazard
United States Drought Monitor, Drought Mitigation Center, United States Department of Agriculture, National Oceanic and Atmospheric Administration. “Drought Monitor Summary Map”. “Long-Term Drought Impact”. <http://droughtmonitor.unl.edu/MapsAndData/GISData.aspx>. Downloaded 06/2016.
Classification
Short-term drought areas were filtered out of the data, leaving only long-term drought areas. United States counties were assigned the average U.S. Drought Monitor Drought Severity Classification value that characterizes the county area. County long-term drought hazard was classified based on the average Drought Severity Classification value.
- Low = 1-1.75 average Drought Severity Classification value
- Medium = 1.76-3.0 average Drought Severity Classification value
- High = 3.0+ average Drought Severity Classification value
Snowfall Hazard
United States National Oceanic and Atmospheric Administration. “1981-2010 U.S. Climate Normals”. 1:2,000,000. “Annual Snow Normal”. <http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/products/precipitation/>. Downloaded 08/2016.
Classification
Average yearly snowfall was joined with the point locations of weather measurement stations, and stations without valid snowfall measurements were filtered out (leaving 6,233 stations). Snowfall was interpolated using least squared distance interpolation to create a 0.05-degree raster describing an estimate of yearly snowfall for the United States. The average yearly snowfall raster was aggregated by county to yield the average yearly snowfall per United States county. The snowfall risk of counties was classified by average snowfall.
- None = 0 inches
- Low = 0.01-10 inches
- Medium = 10.01-50 inches
- High = 50.01+ inches
Tornado Hazard
United States National Oceanic and Atmospheric Administration Storm Prediction Center. “Severe Thunderstorm Database and Storm Data Publication”. 1:2,000,000. “United States Tornado Touchdown Points 1950-2004”. <https://catalog.data.gov/dataset/united-states-tornado-touchdown-points-1950-2004-direct-download>. Downloaded 07/2016.
Classification
Each tornado touchdown point was assigned the value of the Fujita Scale that describes that event. Weighting each event by intensity ensures that areas with higher intensity events are characterized as more hazardous. Values describing each tornado event were aggregated by United States County, normalized by total county area, and the tornado hazard of counties was classified based upon the normalized value.
Volcano Hazard
Smithsonian Institution National Volcanism Program. “Volcanoes of the World”. “Holocene Volcanoes”. <http://volcano.si.edu/search_volcano.cfm>. Downloaded 07/2016.
Classification
Volcano coordinate locations from the spreadsheet were mapped and aggregated by United States county. Volcano count was normalized by county area, and the volcano hazard of counties was classified based on the number of volcanoes present per unit area.
- None = 0 volcanoes / 100 kilometers
- Low = 0.000915-0.007611 volcanoes / 100 kilometers
- Medium = 0.007612-0.018376 volcanoes / 100 kilometers
- High = 0.018377-0.150538 volcanoes / 100 kilometers
Wildfire Hazard
United States Department of Agriculture, Forest Service, Fire, Fuel, and Smoke Science Program. “Classified 2014 Wildfire Hazard Potential”. 270 meters. <http://www.firelab.org/document/classified-2014-whp-gis-data-and-maps>. Downloaded 06/2016.
Classification
The classifications of Very High, High, Moderate, Low, Very Low, and Non-Burnable/Water wildfire hazard from the study were numerically coded, the average value was computed for each county, and the wildfire hazard was classified based upon the average value.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic populations for regions of the World (SPW) | Delhi
Dataset information
A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place and thus where interactions among population members happen (e.g., spread of epidemics).
License
Acknowledgment
This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).
Contact information
Henning.Mortveit@virginia.edu
Identifiers
| Region name | Delhi |
| Region ID | ind_140001944 |
| Model | coarse |
| Version | 0_9_0 |
Statistics
| Name | Value |
|---|---|
| Population | 15951510 |
| Average age | 28.2 |
| Households | 3625935 |
| Average household size | 4.4 |
| Residence locations | 3625935 |
| Activity locations | 1309377 |
| Average number of activities | 5.5 |
| Average travel distance | 26.6 |
Sources
| Description | Name | Version | Url |
|---|---|---|---|
| Activity template data | World Bank | 2021 | https://data.worldbank.org |
| Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
| Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551 https://www.openstreetmap.org/ |
| Household data | DHS | | https://dhsprogram.com |
| Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |
Files description
Base data files (ind_140001944_data_v_0_9.zip)
| Filename | Description |
|---|---|
| ind_140001944_person_v_0_9.csv | Data for each person including attributes such as age, gender, and household ID. |
| ind_140001944_household_v_0_9.csv | Data at household level. |
| ind_140001944_residence_locations_v_0_9.csv | Data about residence locations |
| ind_140001944_activity_locations_v_0_9.csv | Data about activity locations, including what activity types are supported at these locations |
| ind_140001944_activity_location_assignment_v_0_9.csv | For each person and for each of their activities, this file specifies the location where the activity takes place |
Derived data files
| Filename | Description |
|---|---|
| ind_140001944_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model. |
Validation and measures files
| Filename | Description |
|---|---|
| ind_140001944_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
| ind_140001944_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
| ind_140001944_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by the time of day with survey data |
| ind_140001944_location_construction_0_9.pdf | Validation plots for location construction |
| ind_140001944_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
| ind_140001944_ind_140001944_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
| ind_140001944_ind_140001944_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
| ind_140001944_ind_140001944_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
| ind_140001944_ind_140001944_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
| ind_140001944_ind_140001944_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
| ind_140001944_ind_140001944_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |
PostGIS data for London and Greater London ward boundaries as of 2018.
This dataset is used in the london_votes sample Splitfile in which the 2017 General Election results and London Ward geodata are joined through the ONS UK Ward-Constituency lookup table to build a dataset of London constituencies and Conservative/Labour votes in each, ready for plotting as a Choropleth map.
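The join described above can be sketched as a two-step lookup: ward code to constituency code through the lookup table, then constituency code to vote counts. The field names (`ward_code`, `conservative`, `labour`) and the codes are hypothetical stand-ins, not the Splitfile's actual column names, and geometries are omitted.

```python
def attach_votes(wards, ward_to_constituency, results):
    """Attach constituency-level vote counts to each ward record.

    wards: list of dicts with a "ward_code" key (geometry omitted here).
    ward_to_constituency: {ward_code: constituency_code} lookup table.
    results: {constituency_code: {"conservative": int, "labour": int}}.
    Wards missing from either table are skipped.
    """
    joined = []
    for ward in wards:
        constituency = ward_to_constituency.get(ward["ward_code"])
        votes = results.get(constituency)
        if votes is None:
            continue
        joined.append({**ward, "constituency": constituency, **votes})
    return joined


wards = [{"ward_code": "W1"}, {"ward_code": "W2"}, {"ward_code": "W9"}]
lookup = {"W1": "C1", "W2": "C1"}
results = {"C1": {"conservative": 100, "labour": 200}}
print(attach_votes(wards, lookup, results))
```

Dissolving the joined ward geometries by constituency would then give one shape per constituency for the choropleth.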
https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london
Contains National Statistics data © Crown copyright and database right 2012
Contains Ordnance Survey data © Crown copyright and database right 2012
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic populations for regions of the World (SPW) | Manipur
Dataset information
A synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where the activities take place and thus where interactions among population members happen (e.g., spread of epidemics).
License
Acknowledgment
This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).
Contact information
Henning.Mortveit@virginia.edu
Identifiers
| Region name | Manipur |
| Region ID | ind_140001942 |
| Model | coarse |
| Version | 0_9_0 |
Statistics
| Name | Value |
|---|---|
| Population | 2796700 |
| Average age | 27.5 |
| Households | 635806 |
| Average household size | 4.4 |
| Residence locations | 635806 |
| Activity locations | 192709 |
| Average number of activities | 5.5 |
| Average travel distance | 78.3 |
Sources
| Description | Name | Version | Url |
|---|---|---|---|
| Activity template data | World Bank | 2021 | https://data.worldbank.org |
| Administrative boundaries | ADCW | 7.6 | https://www.adci.com/adc-worldmap |
| Curated POIs based on OSM | SLIPO/OSM POIs | | http://slipo.eu/?p=1551 https://www.openstreetmap.org/ |
| Household data | DHS | | https://dhsprogram.com |
| Population count with demographic attributes | GPW | v4.11 | https://sedac.ciesin.columbia.edu/data/set/gpw-v4-admin-unit-center-points-population-estimates-rev11 |
Files description
Base data files (ind_140001942_data_v_0_9.zip)
| Filename | Description |
|---|---|
| ind_140001942_person_v_0_9.csv | Data for each person including attributes such as age, gender, and household ID. |
| ind_140001942_household_v_0_9.csv | Data at household level. |
| ind_140001942_residence_locations_v_0_9.csv | Data about residence locations |
| ind_140001942_activity_locations_v_0_9.csv | Data about activity locations, including what activity types are supported at these locations |
| ind_140001942_activity_location_assignment_v_0_9.csv | For each person and for each of their activities, this file specifies the location where the activity takes place |
Derived data files
| Filename | Description |
|---|---|
| ind_140001942_contact_matrix_v_0_9.csv | A POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model. |
Validation and measures files
| Filename | Description |
|---|---|
| ind_140001942_household_grouping_validation_v_0_9.pdf | Validation plots for household construction |
| ind_140001942_activity_durations_{adult,child}_v_0_9.pdf | Comparison of time spent on generated activities with survey data |
| ind_140001942_activity_patterns_{adult,child}_v_0_9.pdf | Comparison of generated activity patterns by the time of day with survey data |
| ind_140001942_location_construction_0_9.pdf | Validation plots for location construction |
| ind_140001942_location_assignement_0_9.pdf | Validation plots for location assignment, including travel distribution plots |
| ind_140001942_ind_140001942_ver_0_9_0_avg_travel_distance.pdf | Choropleth map visualizing average travel distance |
| ind_140001942_ind_140001942_ver_0_9_0_travel_distr_combined.pdf | Travel distance distribution |
| ind_140001942_ind_140001942_ver_0_9_0_num_activity_loc.pdf | Choropleth map visualizing number of activity locations |
| ind_140001942_ind_140001942_ver_0_9_0_avg_age.pdf | Choropleth map visualizing average age |
| ind_140001942_ind_140001942_ver_0_9_0_pop_density_per_sqkm.pdf | Choropleth map visualizing population density |
| ind_140001942_ind_140001942_ver_0_9_0_pop_size.pdf | Choropleth map visualizing population size |
The dataset contains Local Authority Boundaries for Great Britain (England, Scotland and Wales) as of December 2021. A total of 363 Local Authority objects are included. It was created for future use in folium choropleth maps when combined with other datasets that contain the matching Local Authority Codes. Additionally, subsets were created for convenience holding the boundaries of local authorities in England and Wales together, and in each individual country, i.e., England, Scotland and Wales on their own.
The original dataset was downloaded from ONS. Because the dataset was too large for most use cases (129.4 MB) due to its level of detail, it was simplified with https://mapshaper.org/ using the default method (Visvalingam / weighted area) with 'prevent shape removal' enabled. The simplification was set to 1.4%, followed by intersection repair and export back to GeoJSON. The shape coordinates were originally in British National Grid (BNG) format and had to be converted to WGS84 (latitude and longitude). Finally, the coordinates were rounded to 6 decimal places, resulting in a file containing 2.2 MB of uncompressed data with a sensible level of detail. The individual country data were extracted, based on the LAD21CD property, to create the additional files.
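The last two steps (rounding coordinates to six decimal places and splitting by LAD21CD) can be sketched in plain Python. This assumes the GSS convention that English, Welsh and Scottish local authority codes begin with E, W and S respectively; the function names are illustrative, and the BNG reprojection and mapshaper simplification are not reproduced here.

```python
COUNTRY_BY_PREFIX = {"E": "England", "W": "Wales", "S": "Scotland"}


def round_coords(coords, ndigits=6):
    """Recursively round a GeoJSON coordinates array of any nesting depth."""
    if coords and isinstance(coords[0], (int, float)):
        return [round(c, ndigits) for c in coords]  # a single [lon, lat] position
    return [round_coords(c, ndigits) for c in coords]


def split_by_country(features):
    """Group GeoJSON features by the first letter of properties["LAD21CD"]."""
    groups = {name: [] for name in COUNTRY_BY_PREFIX.values()}
    for feature in features:
        code = feature["properties"]["LAD21CD"]
        groups[COUNTRY_BY_PREFIX[code[0]]].append(feature)
    return groups


print(round_coords([[-0.1234567, 51.50000049]]))  # [[-0.123457, 51.5]]
features = [{"properties": {"LAD21CD": c}} for c in ("E06000001", "W06000001", "S12000033")]
groups = split_by_country(features)
print({name: len(items) for name, items in groups.items()})
# {'England': 1, 'Wales': 1, 'Scotland': 1}
```

An England-and-Wales subset is then simply `groups["England"] + groups["Wales"]`.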
https://www.ons.gov.uk/methodology/geography/licences
Digital boundary products and reference maps are supplied under the Open Government Licence. You must use the following copyright statements when you reproduce or use this material:
- Source: Office for National Statistics licensed under the Open Government Licence v.3.0
- Contains OS data © Crown copyright and database right 2023
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are two files containing geographic vector layers of continent boundaries. You can use this data to create choropleth maps or for any map visualization that requires vectors at the continent level.
Citation of the data source:
Shepherd, Stephanie (2020). Continent Polygons. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12555170.v3
I transformed the data lightly to make a GPKG file and to add the documents to Kaggle.
The original data is similar to the file continent_boundaries_8.gpkg, which includes the following continents:
I also created a file called continent_boundaries_7.gpkg that merges Australia and Oceania as a single continent.
You can find the transformations in the following Kaggle Notebook: https://www.kaggle.com/code/ericnarro/create-continents-geodataframe-and-file/notebook
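Merging Australia and Oceania into one continent can be illustrated with a plain-Python sketch. This is not the transformation used in the notebook above: it assumes the layer has been exported to a GeoJSON-like dict and that each feature stores its name in a property called `CONTINENT` (a hypothetical field name). It simply concatenates the polygons into a single MultiPolygon; a true dissolve (for example geopandas `GeoDataFrame.dissolve`) would also remove shared interior edges.

```python
# Minimal sketch: assumes a GeoJSON-like dict whose features name their
# continent in a "CONTINENT" property (hypothetical field name).


def polygons_of(geometry):
    """Return the list of polygon coordinate arrays in a (Multi)Polygon."""
    if geometry["type"] == "Polygon":
        return [geometry["coordinates"]]
    if geometry["type"] == "MultiPolygon":
        return list(geometry["coordinates"])
    raise ValueError(f"unsupported geometry type: {geometry['type']}")


def merge_continents(fc, names, merged_name, field="CONTINENT"):
    """Replace the features named in `names` with a single MultiPolygon.

    Polygons are concatenated, not dissolved, so shared edges are kept.
    """
    kept, polygons = [], []
    for feat in fc["features"]:
        if feat["properties"][field] in names:
            polygons.extend(polygons_of(feat["geometry"]))
        else:
            kept.append(feat)
    merged = {
        "type": "Feature",
        "properties": {field: merged_name},
        "geometry": {"type": "MultiPolygon", "coordinates": polygons},
    }
    # Only add the merged feature if something actually matched.
    return {"type": "FeatureCollection",
            "features": kept + ([merged] if polygons else [])}
```

For example, `merge_continents(fc, {"Australia", "Oceania"}, "Australia/Oceania")` turns an 8-feature layer into a 7-feature one; the merged label here is illustrative only.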
This dataset contains the population of Ecuadorian cities in 2022. The data was downloaded from the World Population Review website. I use it to make choropleth maps for analyzing the "Store Sales - Time Series Forecasting" data; please feel free to use it for similar purposes. (Many thanks to "World Population Review"!)
License: GNU LGPL 3.0, http://www.gnu.org/licenses/lgpl-3.0.html
The dataset contains COVID-19 cases in Malaysia from 27 March 2020 to 15 April 2021. It was collected to create better visualizations of COVID-19 cases in Malaysia. All of the data was web-scraped from https://kpkesihatan.com/ using the BeautifulSoup library.
The data is also available on GitHub, together with the scripts used to scrape it. A web application showing the visualizations is also available.
I originally planned to update the data daily, but doing this alone without automated scripts or schedulers turned out to be too tedious. If you know how to automate this efficiently, please reach out to me by email or on LinkedIn; the links can be found on my GitHub. Thank you very much.
There are three CSV files and one GeoJSON file:
- all_2020-03-27_2021-04-15.csv: all daily cases excluding state data
- state_all.csv: all daily cases for each state
- state_cumu.csv: all daily cumulative cases for each state
- malaysia_state_province_boundary.geojson: Malaysia's GeoJSON map file
The columns are:
1. Date
2. Recovered
3. Cumulative Recovered
4. Imported Case (many NaN values until the end of 2020)
5. Local Case (many NaN values)
6. Active Case (many NaN values, but can be inferred)
7. New Case
8. Cumulative Case
9. ICU: number of patients admitted to the Intensive Care Unit
10. Ventilator: number of ICU patients who need a ventilator
11. Death
12. Cumulative Death
13. URL: link to the original webpage
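The missing Active Case values can be inferred under the usual definition: active cases = cumulative cases minus cumulative recoveries minus cumulative deaths. The sketch below applies that rule with the standard `csv` module. It is an illustration, not part of the dataset's scripts, and it assumes the CSV headers match the column names listed above exactly (`Cumulative Case`, `Cumulative Recovered`, `Cumulative Death`, `Active Case`); check them against the actual file first.

```python
import csv

# Minimal sketch. Header names are assumed to match the column list exactly.
MISSING = {"", "nan", "NaN"}


def to_int(value):
    """Parse counts that may be written as "12" or "12.0"."""
    return int(float(value))


def fill_active_cases(lines):
    """Fill empty Active Case values with
    cumulative cases - cumulative recovered - cumulative deaths."""
    rows = list(csv.DictReader(lines))
    for row in rows:
        if (row.get("Active Case") or "").strip() in MISSING:
            active = (to_int(row["Cumulative Case"])
                      - to_int(row["Cumulative Recovered"])
                      - to_int(row["Cumulative Death"]))
            row["Active Case"] = str(active)
    return rows
```

Usage: `with open("all_2020-03-27_2021-04-15.csv", newline="") as f: rows = fill_active_cases(f)`. Existing non-empty Active Case values are left untouched, so only the gaps are filled.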
Thanks to Info GIS MAP.com for providing the Malaysia GeoJSON file used to create choropleth maps.
I hope people will use the scripts or the data to create better visualizations.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
How was the data collected? The dataset was obtained from the Crop Residue Burning Information and Management System (CRBIMS), Punjab Remote Sensing Centre (PRSC). It covers all districts of Punjab from April 2016 to June 2024.
What can be done with this data?
Visualization: This data can be used for visualization purposes. Choropleth maps can be made using the coordinates available.
Trends chart: https://github.com/waitasecant/CRBIMS/blob/main/trends.png?raw=true
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Tokyo is the most populous prefecture in Japan and has the largest number of COVID-19 cases in the country. The total number of confirmed cases in Tokyo is about 73,000 (as of January 9, 2021). This dataset contains data about COVID-19 in Tokyo. If you download it, please consider upvoting.
The data was collected from the Tokyo Metropolitan Government Open Data Catalog Site and Updates on COVID-19 in Tokyo.
tokyo_covid19_patients.csv has 7 columns:

| Column | Description |
| --- | --- |
| Number | |
| Date | Published date |
| Date (Onset) | Date of symptom onset |
| Region | Region where the patient lives |
| Age | Patient's age |
| Gender | Patient's gender |
| Situation | Whether the patient has been discharged (including deaths) or not |

tokyo_cases_byarea.csv has 4 columns:

| Column | Description |
| --- | --- |
| Area | Area to which the municipality belongs |
| Municipality | Municipality name |
| Positive Cases | Total number of positive cases |
| Code | Code required to draw a choropleth map |
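The Code column is the join key for a choropleth: each municipality's case count is attached to the boundary feature that carries the same code. The sketch below shows that join with the standard library. It assumes a separate municipality GeoJSON (not part of this dataset) whose features store the code in a property named `code`, and it writes the count into a new `positive_cases` property; both property names are hypothetical.

```python
import csv

# Minimal sketch. The GeoJSON property names "code" and "positive_cases"
# are hypothetical; adjust them to the boundary file you actually use.


def cases_by_code(lines):
    """Map each municipality Code to its Positive Cases count."""
    return {row["Code"].strip(): int(row["Positive Cases"])
            for row in csv.DictReader(lines)}


def attach_cases(geojson, lookup, code_property="code"):
    """Copy case counts onto matching features (None when no match)."""
    features = []
    for feat in geojson["features"]:
        code = str(feat["properties"].get(code_property, "")).strip()
        props = {**feat["properties"], "positive_cases": lookup.get(code)}
        features.append({**feat, "properties": props})
    return {**geojson, "features": features}
```

Codes are compared as strings, so check that both files use the same format (for example, the same handling of leading zeros); features that end up with `None` point to mismatched codes.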