12 datasets found
  1. Geospatial Data Pack for Visualization

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Cite
    Vega Datasets (2025). Geospatial Data Pack for Visualization [Dataset]. https://www.kaggle.com/datasets/vega-datasets/geospatial-data-pack
    Available download formats: zip (1422109 bytes)
    Dataset authored and provided by
    Vega Datasets
    Description

    Geospatial Data Pack for Visualization 🗺

    Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets

    Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.

    Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.

    Why Use This Dataset? 🤔

    • Comprehensive Geospatial Types: Explore a variety of core geospatial data models:
      • Vector Data: Includes points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json).
      • Raster-like Data: Work with gridded datasets (like windvectors.csv, annual-precip.json).
    • Diverse Formats: Gain experience with standard and efficient geospatial formats like GeoJSON (see Table 1, 2, 4), compressed TopoJSON (see Table 1), and plain CSV/TSV (see Table 2, 3, 4) for point data and attribute tables ready for joining.
    • Multi-Scale Coverage: Practice visualization across different geographic scales, from global and national (Table 1, 4) down to the city level (Table 1).
    • Rich Thematic Mapping: Includes multiple datasets (Table 3) specifically designed for joining attributes to geographic boundaries (like states or counties from Table 1) to create insightful choropleth maps.
    • Ready-to-Use & Example-Driven: Cleaned datasets tightly integrated with 31+ official examples (see Appendix) from Altair, Vega-Lite, and Vega, allowing you to immediately practice techniques like projections, point maps, network maps, and interactive displays.
    • Python Friendly: Works seamlessly with essential Python libraries like Altair (which can directly read TopoJSON/GeoJSON), Pandas, and GeoPandas, fitting perfectly into the Kaggle notebook environment.
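The attribute-to-boundary join mentioned under Rich Thematic Mapping can be sketched with the standard library alone. The features and the rate table below are tiny hypothetical stand-ins, not contents of the pack; only the idea of matching a table's key column to each geometry's id comes from the description.

```python
import csv
import io

# Hypothetical boundary features keyed by an `id` (for example a FIPS code).
features = [
    {"type": "Feature", "id": 1, "properties": {}, "geometry": None},
    {"type": "Feature", "id": 2, "properties": {}, "geometry": None},
    {"type": "Feature", "id": 4, "properties": {}, "geometry": None},
]

# Hypothetical attribute table with a matching `id` column.
attribute_csv = "id,rate\n1,0.051\n2,0.070\n"


def join_attributes(features, csv_text, key="id"):
    """Copy each CSV row into the properties of the feature with the same key.

    Returns the ids of features that had no matching row.
    """
    rows = {int(r[key]): r for r in csv.DictReader(io.StringIO(csv_text))}
    unmatched = []
    for feature in features:
        row = rows.get(feature["id"])
        if row is None:
            unmatched.append(feature["id"])  # boundary without an attribute row
            continue
        for column, value in row.items():
            if column != key:
                feature["properties"][column] = float(value)
    return unmatched


missing = join_attributes(features, attribute_csv)
```

Reporting the unmatched ids is a design choice of this sketch: a choropleth silently leaves unjoined areas blank, so it helps to know which ones they are.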

    Dataset Inventory 🗂️

    This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.

    1. BASE MAP BOUNDARIES (Topological Data)

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries
    World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries
    London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name)
    London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name
    London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color

    2. GEOGRAPHIC REFERENCE POINTS (Point Data) 📍

    Dataset | File | Size | Format | License | Description | Key Fields / Join Info
    US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l...
  2. Pakistan Cities— 1,513 locations with lat/lon/pop

    • kaggle.com
    zip
    Updated Aug 17, 2025
    Cite
    Ikram Ul Hassan (2025). Pakistan Cities— 1,513 locations with lat/lon/pop [Dataset]. https://www.kaggle.com/datasets/ikramshah512/pakistan-cities-wikidata-linked-1513-locations
    Available download formats: zip (42829 bytes)
    Authors
    Ikram Ul Hassan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Pakistan
    Description

    A comprehensive dataset of 1,513 Pakistani cities, towns, tehsils, districts and places with latitude/longitude, administrative region, population (when available) and Wikidata IDs — ideal for mapping, geospatial analysis, enrichment, and location-based ML.

    Why this dataset is valuable:

    • Full geocoordinates for every entry (100% coverage) — ready for mapping and spatial joins.
    • Wide geographic coverage across all 7 major regions of Pakistan (provinces / administrative regions).
    • Wikidata IDs included for reliable cross-referencing and automatic enrichment from external knowledge bases.
    • Useful for data scientists, GIS engineers, civic tech projects, academic research, and startups building Pakistan-focused location services.

    Highlights (fetched from the data):

    • Total rows: 1,513
    • Unique places (city field): 1,497
    • Rows with population > 0: 526 (≈34.8%)
    • Coordinate coverage: 1513 / 1513 (100%) — directly usable with mapping libraries.

    Column definitions (short):

    • id — Internal numeric row id (unique integer).
    • wikiDataId — Wikidata QID (e.g., Q####) for the place; use to fetch rich metadata.
    • type — Administrative/place type (e.g., ADM1, ADM2, city, district, tehsil).
    • city — Common/local city/place name (short label).
    • name — Full name / official name of the place (may include “District”, “Tehsil”, etc.).
    • country — Country name (Pakistan).
    • countryCode — ISO country code (e.g., PK).
    • region — Primary administrative region / province (e.g., Punjab, Sindh).
    • regionCode — Short code for region (e.g., PB, KP depending on your encoding).
    • regionWdId — Wikidata QID for the region.
    • latitude — Latitude in decimal degrees (float).
    • longitude — Longitude in decimal degrees (float).
    • population — Integer population (0 or NA where unknown).

    Typical & high-value use cases:

    • Mapping & visualization: choropleth maps, point overlays, heatmaps of population or density.
    • Geospatial analysis: distance calculations, nearest-neighbor queries, clustering of urban centers.
    • Data enrichment: join with other datasets (OpenStreetMap, Wikidata, census data) using wikiDataId and coordinates.
    • Machine learning & NLP: training geolocation models, geoparsing, toponym resolution, place name disambiguation.
    • Urban planning & research: analyze distribution of population-ready places vs administrative units.
    • Mobile / location-based apps: lookup & reverse geocoding fallback, seeding POI databases for Pakistan.
    • Humanitarian & disaster response: baseline location lists for logistics and situational awareness.
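The distance and nearest-neighbor use cases listed above can be sketched as follows. This is a minimal illustration that assumes rows are dictionaries with the `latitude` and `longitude` columns defined earlier; the three places and the Earth-radius constant are assumptions of the sketch, not values from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(places, lat, lon):
    """Return the row closest to (lat, lon) by brute-force search."""
    return min(places, key=lambda p: haversine_km(lat, lon, p["latitude"], p["longitude"]))


# Hypothetical rows that reuse the dataset's column names.
places = [
    {"city": "A", "latitude": 0.0, "longitude": 0.0},
    {"city": "B", "latitude": 0.0, "longitude": 1.0},
    {"city": "C", "latitude": 10.0, "longitude": 10.0},
]

closest = nearest(places, 0.0, 0.8)
```

A linear scan is adequate for about 1,500 rows; a spatial index would only matter for much larger tables.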
  3. How to select appropriate hue ranges for sequential color schemes on...

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Apr 3, 2025
    Cite
    Tai sheng Chen; Xi Lv; Kun Hu; Meng lin Chen; Lu Cheng; Wei xing Jiang (2025). How to select appropriate hue ranges for sequential color schemes on choropleth maps? A quantitative evaluation using map reading experiments [Dataset]. http://doi.org/10.5061/dryad.c59zw3rdt
    Dataset provided by
    Dryad Digital Repository
    Authors
    Tai sheng Chen; Xi Lv; Kun Hu; Meng lin Chen; Lu Cheng; Wei xing Jiang
    Time period covered
    Jan 1, 2023
    Description

    We propose map reading experiments to quantitatively evaluate the selection of hue ranges for sequential color schemes on choropleth maps. In these experiments, 60 sequential color schemes with six base hues and ten hue ranges were employed as experimental color schemes, and a total of 414 college students were invited to complete identification, comparison, and ranking tasks. Both controlled and real-map experiments were performed, each involving a web-based survey and an eye-tracking experiment. In the controlled experiments, the shapes of the map objects were relatively regular, and attribute data were randomized. In contrast, the shapes were complex in real-map experiments, and real data were employed. Our findings show that widely used color schemes with a hue range of 0° yield poor performance in all tasks; 15° hue ranges yield good performance in the comparison and ranking tasks but poor performance in the identification task. For large hue ranges of 120-360°, participants showed...
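To make the idea of a "hue range" concrete, here is one possible way to build a sequential scheme from a base hue and a hue range. It is not the authors' construction procedure, which the description does not give; the lightness endpoints, saturation, and class count are assumptions of this sketch.

```python
import colorsys


def sequential_scheme(base_hue_deg, hue_range_deg, n_classes=5,
                      light=0.90, dark=0.25, saturation=0.65):
    """Return hex colors running from light to dark while the hue drifts
    linearly from base_hue_deg to base_hue_deg + hue_range_deg."""
    colors = []
    for i in range(n_classes):
        t = i / (n_classes - 1) if n_classes > 1 else 0.0
        hue = (base_hue_deg + t * hue_range_deg) % 360
        lightness = light + t * (dark - light)
        r, g, b = colorsys.hls_to_rgb(hue / 360, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


single_hue = sequential_scheme(210, 0)   # hue range of 0 degrees
shifted = sequential_scheme(210, 15)     # hue range of 15 degrees
```

With a range of 0 the classes differ only in lightness; with a nonzero range the darkest class also differs in hue, which is the property the experiments vary.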

  4. Mapping 2021 Census Data using the Living Atlas

    • lecture-with-gis-esriukeducation.hub.arcgis.com
    • teachwithgis.co.uk
    Updated Apr 30, 2025
    Cite
    Esri UK Education (2025). Mapping 2021 Census Data using the Living Atlas [Dataset]. https://lecture-with-gis-esriukeducation.hub.arcgis.com/datasets/mapping-2021-census-data-using-the-living-atlas
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    Esri UK Education
    Description

    Anyone who has taught GIS using Census Data knows it is an invaluable data set for showing students how to take data stored in a table and join it to boundary data to transform this data into something that can be visualised and analysed spatially. Joins are a core GIS skill and need to be learnt, as not every data set is going to come neatly packaged as a shapefile or feature layer with all the data you need stored within. I don't know how many times I taught students to download data as a table from Nomis, load it into a GIS and then join that table data to the appropriate boundary data so they could produce choropleth maps to do some visual analysis, but it was a lot! Once students had gotten the hang of joins using census data, they'd often ask why this data doesn't exist as a prepackaged feature layer with all the data they wanted within it. Well, good news: now a lot of it does, and it's accessible through the Living Atlas! Don't get me wrong, I fully understand the importance of teaching students how to perform joins, but once you have this understanding, if you can access data that already contains all the information you need, then you should take advantage of it to save time. So in this exercise I am going to show you how to load English and Welsh Census Data from the 2021 Census into the ArcGIS Map Viewer from the Living Atlas and produce some choropleth maps for visual analysis without having to perform a single join.

  5. NYC zipcode geodata

    • kaggle.com
    zip
    Updated Sep 23, 2019
    Cite
    Saidakbarp (2019). NYC zipcode geodata [Dataset]. https://www.kaggle.com/saidakbarp/nyc-zipcode-geodata
    Available download formats: zip (552766 bytes)
    Authors
    Saidakbarp
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    Context

    I used this publicly available data to make an interactive map visualization of NYC. Zip code geodata is useful for building interactive maps in which each zip code is drawn as a separate area.

    Content

    NYC zip code geodata in GeoJSON format.

    Acknowledgements

    The rights belong to the original authors.

  6. FOLIUM_INDIA

    • kaggle.com
    zip
    Updated Jun 15, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KD007 (2020). FOLIUM_INDIA [Dataset]. https://www.kaggle.com/krishcross/india-shape-map
    Available download formats: zip (16183750 bytes)
    Authors
    KD007
    Area covered
    India
    Description

    Folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. These files can be used to draw the state boundaries on a map of India using the folium library, and the CSV also contains the state data. How to use them in a notebook is shown in one of my kernels.

    The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. folium supports Image, Video, GeoJSON, and TopoJSON overlays. Because it is so extensible, I find folium the best map plotting library in Python. Do give it a try and use it in your kernels.

  7. Data from: CrimeMapTutorial Workbooks and Sample Data for ArcView and...

    • catalog.data.gov
    • icpsr.umich.edu
    • +1 more
    Updated Nov 14, 2025
    Cite
    National Institute of Justice (2025). CrimeMapTutorial Workbooks and Sample Data for ArcView and MapInfo, 2000 [Dataset]. https://catalog.data.gov/dataset/crimemaptutorial-workbooks-and-sample-data-for-arcview-and-mapinfo-2000-3c9be
    Dataset provided by
    National Institute of Justice
    Description

    CrimeMapTutorial is a step-by-step tutorial for learning crime mapping using ArcView GIS or MapInfo Professional GIS. It was designed to give users a thorough introduction to most of the knowledge and skills needed to produce daily maps and spatial data queries that uniformed officers and detectives find valuable for crime prevention and enforcement. The tutorials can be used either for self-learning or in a laboratory setting. The geographic information system (GIS) and police data were supplied by the Rochester, New York, Police Department. For each mapping software package, there are three PDF tutorial workbooks and one WinZip archive containing sample data and maps. Workbook 1 was designed for GIS users who want to learn how to use a crime-mapping GIS and how to generate maps and data queries. Workbook 2 was created to assist data preparers in processing police data for use in a GIS. This includes address-matching of police incidents to place them on pin maps and aggregating crime counts by areas (like car beats) to produce area or choropleth maps. Workbook 3 was designed for map makers who want to learn how to construct useful crime maps, given police data that have already been address-matched and preprocessed by data preparers. It is estimated that the three tutorials take approximately six hours to complete in total, including exercises.

  8. Covid-19_WorldSpreading

    • kaggle.com
    Updated Sep 15, 2020
    Cite
    Mohamed Hany (2020). Covid-19_WorldSpreading [Dataset]. https://www.kaggle.com/mohamedhanyyy/covid19-worldspreading/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed Hany
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Story behind the dataset

    I wanted to collect Covid-19 case counts from all over the world and analyze them.

    The data is simple but can yield a lot of insight.

    The data has 4 columns ('Country/Region', 'Confirmed', 'Country Abbr 2', 'Country Abbr 3'):

    1. Country/Region contains the name of each country

    2. Confirmed contains the number of confirmed Covid-19 cases

    3. Country Abbr 2 contains each country's 2-letter abbreviation

    4. Country Abbr 3 contains each country's 3-letter abbreviation

    The 2 abbreviation columns are useful for drawing a choropleth world map with plotly. The data was collected from many sources for accuracy.
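As a minimal sketch of how the four columns could feed a choropleth, the snippet below reads them with the standard library and keys the counts by the 3-letter abbreviation. The two sample rows, including the country names, codes, and counts, are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample using the four column names from the description.
sample = (
    "Country/Region,Confirmed,Country Abbr 2,Country Abbr 3\n"
    "Exampleland,1200,EX,EXL\n"
    "Samplestan,345,SM,SMP\n"
)


def confirmed_by_abbr3(csv_text):
    """Map each 3-letter country abbreviation to its confirmed case count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Country Abbr 3"]: int(row["Confirmed"]) for row in reader}


counts = confirmed_by_abbr3(sample)
locations = list(counts)        # region identifiers for the map
values = list(counts.values())  # the quantity to encode as color
```

A plotting library that accepts 3-letter country codes as locations could take `locations` and `values` directly; whether the real file's abbreviations are valid ISO 3166-1 codes is not stated in the description.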

  9. Data from: Depends on how you count them: the value of general propensity...

    • tandf.figshare.com
    docx
    Updated Dec 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Martin Bekker (2023). Depends on how you count them: the value of general propensity choropleth maps for visualising databases of protest incidents [Dataset]. http://doi.org/10.6084/m9.figshare.19642925.v2
    Available download formats: docx
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Martin Bekker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public protest represents an important sanction on rulers and institutions. Protest is a quotidian phenomenon in South Africa; perhaps the defining element of post-apartheid political life. Geographic representations of protest abound – typically dot distribution maps – but these merely confirm that more protests occur where there are more people. Visualisations of protest per capita and protestors per capita (or ‘general propensity’), which are best rendered as choropleth maps, are well-placed to overcome this limitation. The South African Police Services' database of protest is the largest publicly-available single-country protest event database. Having used machine learning to classify 89,000 protest events, I locate each within one of the country's 234 municipalities, and depict these events using counts, count per capita, and the general propensity. This reveals a proportionally high number of rural protests, and that municipalities hosting major industries, along with provincial seats of government, present the highest propensity for protest.

  10. Homicide Rates in Mexico by State (1990-2023)

    • figshare.com
    csv
    Updated Nov 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Montserrat Mora (2025). Homicide Rates in Mexico by State (1990-2023) [Dataset]. http://doi.org/10.6084/m9.figshare.28067651.v4
    Available download formats: csv
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Montserrat Mora
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Mexico
    Description

    This project provides a comprehensive dataset on intentional homicides in Mexico from 1990 to 2023, disaggregated by sex and state. It includes both raw data and tools for visualization, making it a valuable resource for researchers, policymakers, and analysts studying violence trends, gender disparities, and regional patterns.

    Contents

    • Homicide Data: Total number of male and female victims per state and year.
    • Population Data: Corresponding male and female population estimates for each state and year.
    • Homicide Rates: Per 100,000 inhabitants, calculated for both sexes.
    • Choropleth Map Script: A Python script that generates homicide rate maps using a GeoJSON file.
    • GeoJSON File: A spatial dataset defining Mexico's state boundaries, used for mapping.
    • Sample Figure: A pre-generated homicide rate map for 2023 as an example.
    • Requirements File: A requirements.txt file listing necessary dependencies for running the script.

    Sources

    • Homicide Data: INEGI - Vital Statistics Microdata
    • Population Data: Mexican Population Projections 2020-2070

    This dataset enables spatial analysis and data visualization, helping users explore homicide trends across Mexico in a structured and reproducible way.
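The relationship between the victim counts, the population estimates, and the rate per 100,000 inhabitants can be sketched as below. The record layout and the figures are hypothetical; the dataset's actual column names are not given in the description.

```python
def rate_per_100k(victims, population):
    """Crude rate: victims per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return victims / population * 100_000


# Hypothetical records for one state and year, split by sex.
records = [
    {"state": "State A", "year": 2023, "sex": "male",
     "victims": 150, "population": 1_000_000},
    {"state": "State A", "year": 2023, "sex": "female",
     "victims": 20, "population": 1_050_000},
]

rates = {
    (r["state"], r["year"], r["sex"]): rate_per_100k(r["victims"], r["population"])
    for r in records
}
```

Computing the rate separately for each sex against its own population, as here, is what keeps the male and female figures comparable.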

  11. 5 year Male Colorectal Cancer Incidence MSSA

    • usc-geohealth-hub-uscssi.hub.arcgis.com
    Updated Nov 12, 2021
    Cite
    Spatial Sciences Institute (2021). 5 year Male Colorectal Cancer Incidence MSSA [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/datasets/5-year-male-colorectal-cancer-incidence-mssa
    Dataset authored and provided by
    Spatial Sciences Institute
    Description

    Medical Service Study Areas (MSSAs)

    As defined by California's Office of Statewide Health Planning and Development (OSHPD) in 2013, "MSSAs are sub-city and sub-county geographical units used to organize and display population, demographic and physician data" (Source). Each census tract in CA is assigned to a given MSSA. The most recent MSSA dataset (2014) was used. Spatial data are available via OSHPD at the California Open Data Portal. This information may be useful in studying health equity.

    Age-Adjusted Incidence Rate (AAIR)

    Age-adjustment is a statistical method that allows comparisons of incidence rates to be made between populations with different age distributions. This is important since the incidence of most cancers increases with age. An age-adjusted cancer incidence (or death) rate is defined as the number of new cancers (or deaths) per 100,000 population that would occur in a certain period of time if that population had a 'standard' age distribution. In the California Health Maps, incidence rates are age-adjusted using the U.S. 2000 Standard Population.

    Cancer incidence rates

    Incidence rates were calculated using case counts from the California Cancer Registry. Population data from 2010 Census and SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators. Yearly SEER 2015 census tract estimates by race/origin (controlling to Vintage 2015) were used to estimate population denominators for 5-year incidence rates (2013-2017).

    According to California Department of Public Health guidelines, cancer incidence rates cannot be reported if based on <15 cancer cases and/or a population <10,000 to ensure confidentiality and stable statistical rates.

    • Spatial extent: California
    • Spatial Unit: MSSA
    • Created: n/a
    • Updated: n/a
    • Source: California Health Maps
    • Contact Email: gbacr@ucsf.edu
    • Source Link: https://www.californiahealthmaps.org/?areatype=mssa&address=&sex=Both&site=AllSite&race=&year=05yr&overlays=none&choropleth=Obesity
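The age-adjustment and suppression rules described above can be illustrated with direct standardization. This is a sketch under stated assumptions: the three age groups and their standard weights are invented for the example (they are not the U.S. 2000 Standard Population), and applying the thresholds to totals across age groups is a guess at how the guideline is operationalized.

```python
def age_adjusted_rate(cases, population, standard_population):
    """Directly standardized rate per 100,000.

    Each age group's crude rate is weighted by that group's share of the
    standard population, and the weighted rates are summed.
    """
    total_standard = sum(standard_population)
    rate = 0.0
    for c, p, s in zip(cases, population, standard_population):
        rate += (c / p) * (s / total_standard)
    return rate * 100_000


def reportable_rate(cases, population, standard_population,
                    min_cases=15, min_population=10_000):
    """Return the age-adjusted rate, or None when the area falls below the
    case or population threshold quoted in the description."""
    if sum(cases) < min_cases or sum(population) < min_population:
        return None
    return age_adjusted_rate(cases, population, standard_population)


# Hypothetical area with three age groups (young, middle, old).
cases = [2, 10, 30]
population = [20_000, 10_000, 5_000]
standard = [500, 300, 200]  # illustrative weights of 0.5, 0.3, 0.2

aair = reportable_rate(cases, population, standard)
suppressed = reportable_rate([1, 2, 3], population, standard)
```

Returning `None` for suppressed areas lets a mapping layer render them as "no data" rather than as a zero rate.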

  12. BIXI Montreal (public bicycle sharing system)

    • kaggle.com
    zip
    Updated Dec 1, 2017
    Cite
    Aubert Sigouin (2017). BIXI Montreal (public bicycle sharing system) [Dataset]. https://www.kaggle.com/aubertsigouin/biximtl
    Available download formats: zip (174091411 bytes)
    Authors
    Aubert Sigouin
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    Montreal
    Description

    Context

    « BIXI Montréal is a public bicycle sharing system serving Montréal, Quebec, Canada.

    Launched in May 2009, it is North America's first large-scale bike sharing system and the original BIXI brand of systems.

    The location of a BIXI bike station is determined by several parameters, including population density, points of interest and activities (universities, bike paths, other transportation networks), and data on travel patterns of the general public. In 2009, 5,000 bikes were deployed in Montreal through a network of pay stations located mainly in the boroughs of Rosemont–La Petite-Patrie, the Plateau-Mont-Royal and Ville-Marie, spilling over into parts of Outremont and the South West. As of 2011, the system has spread to Hochelaga-Maisonneuve, Villeray–Saint-Michel–Parc-Extension, Ahuntsic, Côte-des-Neiges–Notre-Dame-de-Grâce, Westmount and Verdun. » [1]

    Content

    1. BIXI - Movements history

    • Datasets containing the details of the trips made via the BIXI Montréal self-service bike network. Each year is a .zip file (e.g. BixiMontrealRentals2014.zip) containing several .CSV files (e.g. OD_2014-04.csv), one for each month.

      Version 2: All .csv files are merged per year (OD-year.csv).

    • The data is extracted from the BIXI Montréal station and bike management system. Trips of less than 1 minute or more than 2 hours are excluded. The station identifiers used correspond to those of the station status data set.

    Data Structure

    • start_date: Date and time of the start of the trip (YYYY-MM-DD hh:mm)
    • start_station_code: Start station ID
    • end_date: Date and time of the end of the trip (YYYY-MM-DD hh:mm)
    • end_station_code: End station ID
    • is_member: User type (1: subscriber, 0: non-subscriber)
    • duration_sec: Total travel time in seconds
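A minimal sketch of reading trip rows with the fields listed above, assuming the year-month-day hour:minute timestamp layout given for the date columns. The three sample rows and station codes are invented; the duration filter simply re-applies the 1-minute and 2-hour limits as a sanity check, since the description says such trips are already excluded.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M"  # assumed from the field descriptions

# Hypothetical rows using the documented column names.
sample = (
    "start_date,start_station_code,end_date,end_station_code,duration_sec,is_member\n"
    "2014-04-15 08:00,6001,2014-04-15 08:12,6002,720,1\n"
    "2014-04-15 09:00,6001,2014-04-15 09:00,6001,30,0\n"
    "2014-04-15 10:00,6003,2014-04-15 13:00,6004,10800,1\n"
)


def load_trips(csv_text, min_sec=60, max_sec=2 * 60 * 60):
    """Parse trips and keep those lasting between min_sec and max_sec."""
    trips = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        duration = int(row["duration_sec"])
        if not (min_sec <= duration <= max_sec):
            continue  # outside the documented 1 minute to 2 hour window
        trips.append({
            "start": datetime.strptime(row["start_date"], DATE_FORMAT),
            "end": datetime.strptime(row["end_date"], DATE_FORMAT),
            "start_station": row["start_station_code"],
            "end_station": row["end_station_code"],
            "duration_sec": duration,
            "is_member": row["is_member"] == "1",
        })
    return trips


trips = load_trips(sample)
```

Station codes are kept as strings so they can be matched against the station status data without assuming they are always numeric.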

    2. BIXI - The condition of stations

    • This dataset presents the list of stations in the BIXI Montréal self-service bicycle network, including the geographic position, the number of bicycles available and the number of terminals available. It is a .json file, which corresponds to a dictionary.

    • The data is produced by the BIXI Montréal station management system with a refresh rate of 5 minutes. Station locations are generally stable over time, but may be subject to change during the season, particularly when the City of Montreal is carrying out work or as part of special events. Temporary storage stations are not included in station status. The data is automatically generated on the BIXI servers, so the date of last update of this dataset does not represent the actual date of update. Information about bikes and bad terminals is available in the JSON format file.

    Data Structure

    • id: Unique station ID
    • s: Name of the station
    • n: Station terminal ID
    • st: Station status
    • b: Boolean value (true or false) specifying whether the station is blocked
    • su: Boolean value (true or false) specifying whether the station is suspended
    • m: Boolean value (true or false) specifying whether the station is displayed as out of service
    • read: timestamp of the last update of the data in milliseconds since January 1, 1970.
    • lc: timestamp of the last communication with the server in milliseconds since January 1, 1970.
    • bk: (For future use)
    • bl: (For future use)
    • la: latitude of the station according to the geodesic datum WGS84
    • lo: longitude of the station according to the WGS84 datum
    • da: Number of available terminals at this station
    • dx: Number of unavailable terminals at this station
    • ba: Number of available bikes at this station
    • bx: Number of unavailable bicycles at this station
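The station fields listed above can be interpreted roughly as follows. The single-station JSON document is hypothetical (the file's top-level layout is not described beyond "a dictionary"), and treating a station as in service only when `b`, `su`, and `m` are all false is an assumption of this sketch.

```python
import json
from datetime import datetime, timezone

# Hypothetical record using the documented short field names.
station_json = """
{"id": 1, "s": "Example Station", "n": "0001", "st": 1,
 "b": false, "su": false, "m": false,
 "lc": 1500000000000, "la": 45.5, "lo": -73.6,
 "da": 10, "dx": 1, "ba": 5, "bx": 0}
"""


def summarize_station(station):
    """Turn the terse field names into a readable summary."""
    # `lc` is in milliseconds since January 1, 1970, so divide by 1000.
    last_contact = datetime.fromtimestamp(station["lc"] / 1000, tz=timezone.utc)
    return {
        "name": station["s"],
        "position": (station["la"], station["lo"]),  # WGS84 latitude, longitude
        "in_service": not (station["b"] or station["su"] or station["m"]),
        "bikes_available": station["ba"],
        "terminals_available": station["da"],
        "last_contact": last_contact,
    }


summary = summarize_station(json.loads(station_json))
```

Converting to a timezone-aware UTC datetime avoids ambiguity when the timestamps are later compared with the local-time trip data.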

    3. Geographical boundaries of Montreal (Borough and related city)

    • This dataset is optional and will be useful mostly for plotting data and making choropleth maps.

    Acknowledgements

    Creative Commons Attribution 4.0 International

    For more details :
    http://donnees.ville.montreal.qc.ca/dataset/bixi-historique-des-deplacements
    http://donnees.ville.montreal.qc.ca/dataset/bixi-etat-des-stations
    http://donnees.ville.montreal.qc.ca/dataset/polygones-arrondissements

    Inspiration

    Can you find patterns in the behavior of BIXI users?
    Are there any inefficient stations?
    What insights can we draw from this data for decision making?

    [1] https://en.wikipedia.org/wiki/BIXI_Montr%C3%A9al
