Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
The collection covers points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json), as well as geophysical data (like windvectors.csv, annual-precip.json). This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
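To illustrate how the FIPS `id` on the `counties` object in us-10m.json is used for joins, the sketch below builds a county choropleth spec as a plain Python dict, modelled on the familiar Vega-Lite county unemployment example. The base URL and the `unemployment.tsv` file (assumed to have `id` and `rate` columns) are assumptions, since neither appears in the table above; swap in whichever statistical table you are joining. The resulting JSON can be rendered by any Vega-Lite viewer.

```python
import json

# Assumed base URL for the vega-datasets files; the page only says the data
# lives on GitHub and a CDN, so point this at the mirror you actually use.
BASE_URL = "https://vega.github.io/vega-datasets/data/"


def county_choropleth_spec(value_file: str, value_field: str) -> dict:
    """Build a Vega-Lite spec that colours US counties by a joined value.

    us-10m.json is TopoJSON with a `counties` object whose geometries carry
    a FIPS `id`, so a lookup transform on `id` attaches the statistic.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "width": 500,
        "height": 300,
        "data": {
            "url": BASE_URL + "us-10m.json",
            "format": {"type": "topojson", "feature": "counties"},
        },
        "transform": [
            {
                # Join the statistical table onto each county by FIPS id.
                "lookup": "id",
                "from": {
                    "data": {"url": BASE_URL + value_file},
                    "key": "id",
                    "fields": [value_field],
                },
            }
        ],
        "projection": {"type": "albersUsa"},
        "mark": "geoshape",
        "encoding": {"color": {"field": value_field, "type": "quantitative"}},
    }


# Hypothetical statistics file and field name.
spec = county_choropleth_spec("unemployment.tsv", "rate")
print(json.dumps(spec, indent=2))
```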
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We recruited 414 college students to participate in the experiment. During the experiment, we collected their visual data and organized it by visual indicator. We then processed the data through qualitative and quantitative analysis to obtain the final results.
I created a dataset to help people create choropleth maps of US states.
It contains one GeoJSON file for plotting the borders and one CSV file from the Census Bureau with the US population per state.
I think the best way to use this dataset is to join it with other data. For example, I used this dataset to plot police killings using the data from https://www.kaggle.com/jpmiller/police-violence-in-the-us
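Joining usually means attaching a per-state value to each GeoJSON feature and normalizing it by population before shading. The following is a minimal sketch in plain Python, assuming each feature stores the state name in `properties["name"]`; the function name, the rate-per-million choice and all values are made-up placeholders, not part of the dataset.

```python
def attach_rates(geojson, counts, population, key="name", per=1_000_000):
    """Write counts[name] * per / population[name] into each feature's
    properties["rate"]; return the names that could not be matched."""
    unmatched = []
    for feature in geojson["features"]:
        props = feature["properties"]
        name = props.get(key)
        if name in counts and population.get(name):
            props["rate"] = counts[name] * per / population[name]
        else:
            unmatched.append(name)
    return unmatched


# Made-up placeholder features and values, for illustration only.
states = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Alpha"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Beta"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Gamma"}, "geometry": None},
    ],
}
population = {"Alpha": 600_000, "Beta": 30_000_000}
counts = {"Alpha": 3, "Beta": 90}

missing = attach_rates(states, counts, population)
print(missing)  # ['Gamma']
```

Returning the unmatched names makes spelling mismatches between the two tables visible instead of silently leaving states blank on the map.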
With this add-in it is possible to create map templates from GIS files in KML format and create choropleths with them. Provided you have access to KML map boundary files, you can create your own quick and easy choropleth maps in Excel. KML files can be converted from shapefiles, and many shapefiles are available to download for free from the web, including from Ordnance Survey and the London Datastore. Standard mapping packages such as QGIS (free to download) and ArcGIS can convert the files to KML format. A sample KML file (London wards) can be downloaded from this page so that users can easily test the tool. Macros must be enabled for the tool to function.
When creating the map with the Excel tool, the 'unique ID' should normally be the area code and the 'Name' should be the area name. If required, and if the KML file contains additional data, further 'data' fields can be added; these columns will appear below and to the right of the map. Otherwise, data can be added later next to the codes and names. In the add-in version of the tool, the final control, 'Scale (% window)', should not normally be changed: with the default value of 0.5, the height of the map is set to half the total size of the user's Excel window.
To run a choropleth, select the menu option 'Run Choropleth' to open the choropleth form. To specify the colour ramp, enter the number of boxes into which the range is to be divided and the colours for the high and low ends of the range, which are chosen by selecting the appropriate coloured option boxes. If wished, hit the 'Swap' button to exchange the colours at the two ends of the range. Then hit the 'Choropleth' button.
The default colours for the ends of the choropleth colour range are saved in the add-in, but different values can be selected by setting up a column range of up to twelve cells, anywhere in Excel, filled with the desired colours. Then use the 'Colour range' control to select this range and hit apply, having selected high or low values as wished. The 'Copy' button sets up a sheet 'ColourRamp' in the active workbook with the default colours, which can be extended or cut down to just a few cells, saving the user time.
The add-in was developed entirely within the Excel VBA IDE by Tim Lund. He is kindly distributing the tool for free on the Datastore but suggests that users who find it useful make a donation to the Shelter charity. The tool is not intended to be actively maintained, but if any users or developers would like to add more features, they can email the author.
Acknowledgments: Calculation of Excel freeform shapes from latitudes and longitudes is done using calculations from the Ordnance Survey.
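The add-in's internal colour calculation is not documented here, but the idea of dividing a range into a chosen number of boxes between a low and a high colour can be sketched as equal-interval classes plus linear RGB interpolation. Both of those choices, and the function names, are assumptions for illustration, not the add-in's VBA code.

```python
def colour_ramp(low, high, n):
    """Return n RGB tuples interpolated linearly from `low` to `high` (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two classes")
    return [
        tuple(round(lo + (hi - lo) * i / (n - 1)) for lo, hi in zip(low, high))
        for i in range(n)
    ]


def classify(value, vmin, vmax, n):
    """Equal-interval class index 0..n-1 for a value in [vmin, vmax]."""
    if vmax == vmin:
        return 0
    idx = int((value - vmin) * n / (vmax - vmin))
    return min(max(idx, 0), n - 1)  # clamp so vmax falls in the top class


ramp = colour_ramp((255, 255, 255), (0, 0, 255), 5)  # white (low) to blue (high)
swapped = ramp[::-1]  # in this sketch, a 'Swap' just reverses the ramp
classes = [classify(v, 0, 50, 5) for v in (3, 10, 47, 50)]
print(classes)  # [0, 1, 4, 4]
```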
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains geometry data for the countries of the world together with their names and country codes in various formats. The primary use case is choropleths (color-coded maps). The data can be read into a GeoDataFrame with geopandas and plotted with matplotlib; see the starter notebook for an example of how to do this.
The data was created by Natural Earth. It is in the public domain and free to use for any purpose at the time of this writing; you might want to check their Terms of Use.
Photo by KOBU Agency on Unsplash
Anyone who has taught GIS using census data knows it is an invaluable dataset for showing students how to take data stored in a table and join it to boundary data, transforming it into something that can be visualised and analysed spatially. Joins are a core GIS skill and need to be learnt, as not every dataset comes neatly packaged as a shapefile or feature layer with all the data you need stored within it. I don't know how many times I taught students to download data as a table from Nomis, load it into a GIS and then join that table to the appropriate boundary data so they could produce choropleth maps for visual analysis, but it was a lot! Once students had got the hang of joins using census data, they would often ask why this data doesn't exist as a prepackaged feature layer with all the data they wanted within it. Well, good news: now a lot of it does, and it's accessible through the Living Atlas! Don't get me wrong, I fully understand the importance of teaching students how to perform joins, but once you have that understanding, if you can access data that already contains all the information you need, you should take advantage of it to save time. So in this exercise I am going to show you how to load English and Welsh data from the 2021 Census into the ArcGIS Map Viewer from the Living Atlas and produce some choropleth maps for visual analysis without having to perform a single join.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
I used this publicly available data to make an interactive map visualization of NYC. Zip code geodata is useful for building interactive maps in which each zip code area is shown as a separate area on the map.
NYC zip code geodata in GeoJSON format
The rights belong to the original authors.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of sociodemographic variables used in a principal component analysis (PCA) to create new indicators for spatial analysis.
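The listing does not say how the PCA indicators were built. One common approach, sketched here with NumPy as an assumption rather than the authors' method, is to standardize each variable, take the singular value decomposition, and use each area's score on the first component as a composite indicator. The random matrix is only a stand-in for real sociodemographic variables.

```python
import numpy as np


def first_component_scores(X):
    """Score each area on the first principal component of standardized variables.

    X: (n_areas, n_variables) array. Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that differing units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    # The sign of a principal axis is arbitrary; flip it if needed so that
    # higher scores read in the intended direction.
    scores = Z @ Vt[0]
    return scores, explained


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))  # stand-in for 20 areas x 4 variables
scores, explained = first_component_scores(X)
print(explained.round(3))
```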
These datasets contain map cognition data collected in experiments and processed with software. The README file provides further information.
Datasets included:
| File | Description |
|---|---|
| Controlled experiments-web-based survey.xlsx | Web survey data from the controlled experiment, where the map data is randomly generated simulation data |
| Controlled experiments-eye-tracking.xlsx | Eye movement data from the controlled experiment, recorded with an added eye tracker |
| Real-map experiments-web-based survey.xlsx | Web survey data from the real-map experiment, where the map data is real |
| Real-map experiments-eye-tracking.xlsx | Eye movement data from the real-map experiment |
| Correlation and significance test results.xlsx | Correlation and significance test parameters for the experimental data |
Blank spaces in the tables indicate no data. All tables in the dataset can be opened with Office 2010 or later.
CrimeMapTutorial is a step-by-step tutorial for learning crime mapping using ArcView GIS or MapInfo Professional GIS. It was designed to give users a thorough introduction to most of the knowledge and skills needed to produce daily maps and spatial data queries that uniformed officers and detectives find valuable for crime prevention and enforcement. The tutorials can be used either for self-learning or in a laboratory setting. The geographic information system (GIS) and police data were supplied by the Rochester, New York, Police Department. For each mapping software package, there are three PDF tutorial workbooks and one WinZip archive containing sample data and maps. Workbook 1 was designed for GIS users who want to learn how to use a crime-mapping GIS and how to generate maps and data queries. Workbook 2 was created to assist data preparers in processing police data for use in a GIS. This includes address-matching of police incidents to place them on pin maps and aggregating crime counts by areas (like car beats) to produce area or choropleth maps. Workbook 3 was designed for map makers who want to learn how to construct useful crime maps, given police data that have already been address-matched and preprocessed by data preparers. It is estimated that the three tutorials take approximately six hours to complete in total, including exercises.
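Workbook 2's step of aggregating crime counts by area (such as car beats) comes down to a point-in-polygon test for each incident. The plain-Python sketch below uses the even-odd ray-casting rule; it is not the ArcView or MapInfo procedure from the workbooks, and it assumes incidents and beat polygons share one projected coordinate system and that beats do not overlap. The beat shapes and incident coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            # x-coordinate where this edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_area(points, areas):
    """Count incidents per area; `areas` maps area name -> vertex ring."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, ring in areas.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # assumes areas do not overlap
    return counts


# Hypothetical beats and incident locations.
beats = {
    "Beat 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Beat 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
incidents = [(2, 3), (5, 5), (15, 2), (25, 5)]
counts = count_by_area(incidents, beats)
print(counts)  # {'Beat 1': 2, 'Beat 2': 1}
```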
Folium makes it easy to visualize data that has been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations and passing rich vector/raster/HTML visualizations as markers on the map. These files can be used to mark the state boundaries on a map of India with the folium library, and the CSV also contains the state data and how to use it in notebooks. I have used it in one of my kernels, which you can view.
The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. folium supports image, video, GeoJSON, and TopoJSON overlays. Because of its extensible functionality, I find folium the best map-plotting library in Python. Do give it a try and use it in your kernels.
This dataset contains the population of Ecuadorian cities in 2022. The data was downloaded from this website. I used this data to make a choropleth map for analyzing the "Store Sales - Time Series Forecasting" data; please feel free to use it for similar purposes. (Many thanks to "World Population Review"!)
This dataset is flattened, and multicounty communities are not split by county lines. Flattened means that there are no overlaps: larger shapes such as counties are punched out or clipped where smaller communities are contained within them. This allows for choropleth shading and other mapping techniques such as calculating unincorporated county land area. Multicounty cities like Houston are a single feature, undivided by counties. This layer is derived from Census, State of Maine, and National Flood Hazard Layer political boundaries.
The Community Layer datasets contain geospatial community boundaries associated with Census and NFIP data. The dataset does not contain personally identifiable information (PII). The Community Layer can be used to tie Community ID numbers (CID) to jurisdiction, tribal, and special land use area boundaries.
A geodatabase (GDB) link is included in the Full Data section below. The compressed file contains a collection of files that can store, query, and manage both spatial and nonspatial data using software that can read such a file. It contains all of the community layers, not just the layer this dataset page describes.
This layer can also be accessed from the FEMA ArcGIS viewer online: https://fema.maps.arcgis.com/home/item.html?id=8dcf28fc5b97404bbd9d1bc6d3c9b3cf
Citation: FEMA's citation requirements for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.
For answers to Frequently Asked Questions (FAQs) about the OpenFEMA program, API, and publicly available datasets, please visit: https://www.fema.gov/about/openfema/faq.
If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This project provides a comprehensive dataset on intentional homicides in Mexico from 1990 to 2023, disaggregated by sex and state. It includes both raw data and tools for visualization, making it a valuable resource for researchers, policymakers, and analysts studying violence trends, gender disparities, and regional patterns.
Contents
- Homicide Data: Total number of male and female victims per state and year.
- Population Data: Corresponding male and female population estimates for each state and year.
- Homicide Rates: Per 100,000 inhabitants, calculated for both sexes.
- Choropleth Map Script: A Python script that generates homicide rate maps using a GeoJSON file.
- GeoJSON File: A spatial dataset defining Mexico's state boundaries, used for mapping.
- Sample Figure: A pre-generated homicide rate map for 2023 as an example.
- Requirements File: A requirements.txt file listing the dependencies needed to run the script.
Sources
- Homicide Data: INEGI - Vital Statistics Microdata
- Population Data: Mexican Population Projections 2020-2070
This dataset enables spatial analysis and data visualization, helping users explore homicide trends across Mexico in a structured and reproducible way.
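The rate column follows the usual definition: victims divided by population, scaled to 100,000 inhabitants. A minimal sketch, assuming both tables are keyed by hypothetical `(state, year, sex)` tuples; the counts are invented for illustration, and the dataset's own script may organize the data differently.

```python
def rate_per_100k(victims, population):
    """Homicide rate per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return victims * 100_000 / population


def rates_by_key(victims, population):
    """Map each (state, year, sex) key present in both tables to its rate."""
    return {
        key: rate_per_100k(count, population[key])
        for key, count in victims.items()
        if key in population
    }


# Hypothetical keys and made-up counts, for illustration only.
victims = {("State A", 2023, "female"): 45, ("State A", 2023, "male"): 300}
population = {("State A", 2023, "female"): 1_500_000, ("State A", 2023, "male"): 1_500_000}
rates = rates_by_key(victims, population)
print(rates)
```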
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Digital Equity and Inclusion in Western Parkland City
The Western Parkland City Digital Equity and Inclusion project shows the localised findings of targeted research aimed at benchmarking digital inclusion across the Western Parkland City. The index scores, depicted through a choropleth map, highlight specific gaps and priorities for improving overall digital inclusion and the dimensions of Access, Ability and Affordability across each of the eight participating Council areas.
Ability
Digital Ability is about our skill levels: what we are able to do online, and our confidence in doing it. Having limited digital capability in the types of skills and knowledge needed to get online, operate functions and navigate sites with confidence and safety has been referred to as the ‘second-level’ digital divide. In measuring Digital Ability, the Australian Digital Inclusion Index (ADII) draws upon the Internet Skills Scale (ISS) to focus on six skills domains. The ADII’s Digital Ability score measures the following skills components:
- Basic operational (i.e., downloading and opening files, connecting to the internet, and setting passwords)
- Advanced operational (i.e., saving to the cloud, determining what is safe to download, customising devices and connections, and adjusting privacy settings (e.g. downloading and opening files, connecting to the internet))
- Information navigation (i.e., searching and navigating, verifying trustworthy information, and managing third-party data collection)
- Social (i.e., deciding what to share, how, and with whom, managing and monitoring contacts, and communicating with others)
- Creative (i.e., editing, producing, and posting content, as well as having a broad understanding of the rules that may apply to these activities)
- Smart (i.e., connecting, operating, and managing smart devices and IoT technologies)
Access
The Digital Access dimension within the ADII is a measure of several interrelated components of internet usage, including intensity and frequency of use, types of devices, and use of fixed and mobile plans. It is well recognised that the quality of both fixed and mobile connectivity is problematic and underserviced in regional and remote areas, often due to intermittent and unreliable access to the nbn.
Affordability
As connected technologies have developed and more people move online, some gaps in connectivity access have narrowed. However, for many people, particularly in areas with higher concentrations of low-income individuals and households, affordability can present significant barriers to achieving digital equity across the city. This includes being able to afford quality and reliable mobile and fixed broadband plans and the devices needed to connect online.
For additional information click this link.
Source: Data is sourced through a collaboration between Smart Places, Cities and Active Transport, Transport for NSW and The Parks, Sydney’s Parkland Councils, an alliance of the eight local government areas that comprise Western Parkland City, as part of the Western Parkland City Digital Equity and Inclusion Insights Program. This is currently a one-off release. At this time we do not have plans to update this dataset regularly.
PostGIS data for London and Greater London ward boundaries as of 2018.
This dataset is used in the london_votes sample Splitfile in which the 2017 General Election results and London Ward geodata are joined through the ONS UK Ward-Constituency lookup table to build a dataset of London constituencies and Conservative/Labour votes in each, ready for plotting as a Choropleth map.
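The chained join described above (ward boundaries → ward-to-constituency lookup → constituency results) can be illustrated in plain Python. This is a sketch of the join logic only, not the Splitfile itself; the ward codes, constituency codes and vote counts are hypothetical. Each constituency keeps the list of its wards, whose geometries could then be dissolved into one shape for the choropleth.

```python
from collections import defaultdict


def constituencies_from_wards(ward_codes, ward_to_const, results):
    """Group London wards by constituency and attach that constituency's votes.

    ward_codes: ward codes present in the boundary data
    ward_to_const: lookup table {ward_code: constituency_code}
    results: {constituency_code: {"con": votes, "lab": votes}}
    """
    wards_per_const = defaultdict(list)
    for ward in ward_codes:
        const = ward_to_const.get(ward)
        if const is not None:  # wards missing from the lookup are skipped
            wards_per_const[const].append(ward)
    return {
        const: {"wards": sorted(wards), **results[const]}
        for const, wards in wards_per_const.items()
        if const in results  # constituencies without results are skipped
    }


# Hypothetical codes and vote counts, for illustration only.
wards = ["W1", "W2", "W3", "W4"]
lookup = {"W1": "C1", "W2": "C1", "W3": "C2"}
results = {"C1": {"con": 100, "lab": 200}, "C2": {"con": 150, "lab": 120}}
london = constituencies_from_wards(wards, lookup, results)
print(london)
```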
https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london
Contains National Statistics data © Crown copyright and database right 2012
Contains Ordnance Survey data © Crown copyright and database right 2012
Overview
The multiple hazard index for United States counties was designed to map natural hazard relating to exposure to multiple natural disasters. The index was created to provide communities and public health officials with an overview of the risks that are prominent in their county, and to facilitate the comparison of hazard levels between counties. Most existing hazard maps focus on a single disaster type. By creating a measure that aggregates the hazard from individual disasters, the increased hazard that results from exposure to multiple natural disasters can be better understood. The multiple hazard index represents the aggregate of hazard from eleven individual disasters. Layers displaying the hazard from each individual disaster are also included.
The hazard index is displayed visually as a choropleth map, with blue representing areas with less hazard and red representing areas with higher hazard. Users can click on each county to view its hazard index value and the level of hazard for each individual disaster. Layers describing the relative level of hazard from each individual disaster are also available as choropleth maps, with red representing high, orange medium, and yellow low levels of hazard.
Methodology and Data Citations
Multiple Hazard Index
The multiple hazard index was created by coding the individual hazard classifications and summing the coded values for each United States county. Each individual hazard is weighted equally in the multiple hazard index. Alaska and Hawaii were excluded from the analysis because one third of the individual hazard datasets only describe the conterminous United States.
Avalanche Hazard
University of South Carolina Hazards and Vulnerability Research Institute. “Spatial Hazard Events and Losses Database”. United States Counties. “Avalanches United States 2001-2009”. < http://hvri.geog.sc.edu/SHELDUS/ >. Downloaded 06/2016.
Classification
Avalanche hazard was classified by dividing counties based upon the number of avalanches they experienced over the nine-year period in the dataset. Avalanche hazard was not normalized by total county area because doing so over-emphasized small counties, and because avalanches are a highly local hazard.
- None = 0 avalanches
- Low = 1 avalanche
- Medium = 2-5 avalanches
- High = 6-10 avalanches
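Threshold lists like the one above map directly onto a sorted list of inclusive upper bounds. In this sketch, `bisect_left` finds the first bound that is not smaller than the value; the helper name is hypothetical, and treating counts above 10 as High is an assumption, since the source's High class stops at 10.

```python
import bisect

# Inclusive upper bounds for None, Low and Medium; larger counts are High.
AVALANCHE_BREAKS = [0, 1, 5]
LABELS = ["None", "Low", "Medium", "High"]


def classify_count(count, breaks=AVALANCHE_BREAKS, labels=LABELS):
    """Map a value to a class label using inclusive upper bounds."""
    return labels[bisect.bisect_left(breaks, count)]


print([classify_count(c) for c in (0, 1, 2, 5, 6, 10)])
# ['None', 'Low', 'Medium', 'Medium', 'High', 'High']
```

The same helper covers the other hazards by swapping in their breaks, for example `classify_count(30, [14.25, 47.5], ["Low", "Medium", "High"])` for the earthquake classes.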
Earthquake Hazard
United States Geological Survey. “Earthquake Hazard Maps”. 1:2,000,000. “Peak Ground Acceleration 2% in 50 Years”. < http://earthquake.usgs.gov/hazards/products/conterminous/ >. Downloaded 07/2016.
Classification
Peak ground acceleration (% gravity) with a 2% likelihood in 50 years was averaged by United States County, and the earthquake hazard of counties was classified based upon this average.
- Low = 0 - 14.25 % gravity peak ground acceleration
- Medium = 14.26 - 47.5 % gravity peak ground acceleration
- High = 47.5+ % gravity peak ground acceleration
Flood Hazard
United States Federal Emergency Management Agency. “National Flood Hazard Layer”. 1:10,000. “0.2 Percent Annual Flood Area”. < https://data.femadata.com/FIMA/Risk_MAP/NFHL/ >. Downloaded 07/2016.
Classification
The National Flood Hazard Layer 0.2 Percent Annual Flood Area was spatially intersected with the United States Counties layer, splitting flood areas by county and adding county information to the flood areas. Flood area was aggregated by county and expressed as a fraction of the total county land area, and flood hazard was classified based upon the percentage of land that is susceptible to flooding. The National Flood Hazard Layer does not cover the entire United States; coverage is focused on populated areas. Areas not included in the National Flood Hazard Layer were assigned a flood risk of Low so that they could be included in further analysis.
- Low = 0 - 0.001 % area susceptible
- Medium = 0.00101 % - 0.005 % area susceptible
- High = 0.00501+ % area susceptible
Heat Wave Hazard
United States Centers for Disease Control and Prevention. “National Climate Assessment”. Contiguous United States Counties. “Extreme Heat Events: Heat Wave Days in May - September for years 1981-2010”. Downloaded 06/2016.
Classification
Heat wave hazard was classified by dividing counties based upon the number of heat wave days they experienced over the 30-year period described in the dataset.
- Low = 126 - 171 heat wave days
- Medium = 172 - 187 heat wave days
- High = 188 - 255 heat wave days
Hurricane Hazard
National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Atlantic Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. < https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download >. Downloaded 06/2016.
National Oceanic and Atmospheric Administration. Coastal Services Center. “Historical North Pacific Tropical Cyclone Tracks, 1851-2004”. 1:2,000,000. < https://catalog.data.gov/dataset/historical-north-atlantic-tropical-cyclone-tracks-1851-2004-direct-download >. Downloaded 06/2016.
Classification
Atlantic and Pacific datasets were merged. Tropical storm and disturbance tracks were filtered out, leaving hurricane tracks. Each hurricane track was assigned the category number that describes that event. Weighting each event by intensity ensures that areas with higher-intensity events are characterized as more hazardous. Values describing each hurricane event were aggregated by United States county, normalized by total county area, and the hurricane hazard of counties was classified based upon the normalized value.
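The intensity-weighted, area-normalized score can be sketched as follows, assuming the tracks have already been intersected with counties so that each (county, category) pair is one track's contribution to that county. County IDs, areas (in km², an assumed unit) and categories are hypothetical. The same pattern fits the tornado hazard, with Fujita scale values as weights.

```python
def weighted_density(events, area_km2):
    """Sum intensity weights of events per county and divide by county area.

    events: list of (county_id, intensity) pairs, e.g. hurricane category
    area_km2: {county_id: county area}; counties with no events score 0.
    """
    totals = {cid: 0.0 for cid in area_km2}
    for cid, intensity in events:
        if cid in totals:  # ignore events outside the study counties
            totals[cid] += intensity
    return {cid: totals[cid] / area_km2[cid] for cid in area_km2}


# Hypothetical counties, areas and hurricane categories.
events = [("A", 3), ("A", 1), ("B", 5)]
areas = {"A": 2.0, "B": 10.0, "C": 4.0}
density = weighted_density(events, areas)
print(density)  # {'A': 2.0, 'B': 0.5, 'C': 0.0}
```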
Landslide Hazard
United States Geological Survey. “Landslide Overview Map of the United States”. 1:4,000,000. “Landslide Incidence and Susceptibility in the Conterminous United States”. < https://catalog.data.gov/dataset/landslide-incidence-and-susceptibility-in-the-conterminous-united-states-direct-download >. Downloaded 07/2016.
Classification
The classifications of High, Moderate, and Low landslide susceptibility and incidence from the study were numerically coded, the average value was computed for each county, and the landslide hazard was classified based upon the average value.
Long-Term Drought Hazard
United States Drought Monitor, Drought Mitigation Center, United States Department of Agriculture, National Oceanic and Atmospheric Administration. “Drought Monitor Summary Map”. “Long-Term Drought Impact”. < http://droughtmonitor.unl.edu/MapsAndData/GISData.aspx >. Downloaded 06/2016.
Classification
Short-term drought areas were filtered out of the data, leaving only long-term drought areas. United States counties were assigned the average U.S. Drought Monitor Classification Scheme Drought Severity Classification value that characterizes the county area. County long-term drought hazard was classified based upon the average Drought Severity Classification value.
- Low = 1 - 1.75 average Drought Severity Classification value
- Medium = 1.76 - 3.0 average Drought Severity Classification value
- High = 3.0+ average Drought Severity Classification value
Snowfall Hazard
United States National Oceanic and Atmospheric Administration. “1981-2010 U.S. Climate Normals”. 1:2,000,000. “Annual Snow Normal”. < http://www1.ncdc.noaa.gov/pub/data/normals/1981-2010/products/precipitation/ >. Downloaded 08/2016.
Classification
Average yearly snowfall was joined with the point locations of weather measurement stations, and stations without valid snowfall measurements were filtered out (leaving 6,233 stations). Snowfall was interpolated using least squared distance interpolation to create a 0.05-degree raster estimating yearly snowfall for the United States. The average yearly snowfall raster was aggregated by county to yield the average yearly snowfall per United States county. The snowfall risk of counties was classified by average snowfall.
- None = 0 inches
- Low = 0.01 - 10 inches
- Medium = 10.01 - 50 inches
- High = 50.01+ inches
Tornado Hazard
United States National Oceanic and Atmospheric Administration Storm Prediction Center. “Severe Thunderstorm Database and Storm Data Publication”. 1:2,000,000. “United States Tornado Touchdown Points 1950-2004”. < https://catalog.data.gov/dataset/united-states-tornado-touchdown-points-1950-2004-direct-download >. Downloaded 07/2016.
Classification
Each tornado touchdown point was assigned the value of the Fujita Scale that describes that event. Weighting each event by intensity ensures that areas with higher intensity events are characterized as more hazardous. Values describing each tornado event were aggregated by United States County, normalized by total county area, and the tornado hazard of counties was classified based upon the normalized value.
Volcano Hazard
Smithsonian Institution National Volcanism Program. “Volcanoes of the World”. “Holocene Volcanoes”. < http://volcano.si.edu/search_volcano.cfm >. Downloaded 07/2016.
Classification
Volcano coordinate locations from the spreadsheet were mapped and aggregated by United States county. Volcano count was normalized by county area, and the volcano hazard of counties was classified based upon the number of volcanoes present per unit area.
- None = 0 volcanoes / 100 kilometers
- Low = 0.000915 - 0.007611 volcanoes / 100 kilometers
- Medium = 0.007612 - 0.018376 volcanoes / 100 kilometers
- High = 0.018377 - 0.150538 volcanoes / 100 kilometers
Wildfire Hazard
United States Department of Agriculture, Forest Service, Fire, Fuel, and Smoke Science Program. “Classified 2014 Wildfire Hazard Potential”. 270 meters. < http://www.firelab.org/document/classified-2014-whp-gis-data-and-maps >. Downloaded 06/2016.
Classification
The classifications of Very High, High, Moderate, Low, Very Low, and Non-Burnable/Water wildfire hazard from the study were numerically coded, the average value was computed for each county, and the wildfire hazard was classified based upon the average value.
The dataset defines the geographic polygon shapes of the prefectures of Japan. You can use it to plot Mapbox choropleth maps conveniently with the plotly package.
Each prefecture is assigned an id named after the prefecture, for example 'Kyoto' for Kyoto Prefecture and 'Okinawa' for Okinawa Prefecture.
It is a small modification of the original dataset at https://github.com/dataofjapan/land/blob/master/japan.geojson: I have added an id to each element so that it can be conveniently used for plotting Mapbox choropleth maps.
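plotly matches the values passed as `locations` against each feature's `id` by default (its `featureidkey` parameter), so a quick check for names that match no feature can save confusion before plotting. The sketch assumes the added `id` sits at the top level of each feature; if it lives under `properties`, adapt the lookup and pass `featureidkey="properties.id"` to plotly instead. The function name and sample values are hypothetical.

```python
def unmatched_locations(geojson, locations):
    """Return values in `locations` that match no feature's top-level id."""
    ids = {feature.get("id") for feature in geojson["features"]}
    return [loc for loc in locations if loc not in ids]


# Stub features standing in for the real prefecture polygons.
japan = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Kyoto", "properties": {}, "geometry": None},
        {"type": "Feature", "id": "Okinawa", "properties": {}, "geometry": None},
    ],
}
print(unmatched_locations(japan, ["Kyoto", "Okinawa", "Kyoto-fu"]))  # ['Kyoto-fu']
```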
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
I'm trying to make a choropleth map over time of home sale prices by block in Brooklyn for the last 15 years to visualize gentrification. I have the entire dataset for all 5 boroughs of New York, but am starting with Brooklyn.
The primary dataset is the NYC Housing Sales Data, found at this link: http://www1.nyc.gov/site/finance/taxes/property-rolling-sales-data.page
The data in all the separate Excel spreadsheets for 2003-2017 was merged via VBA scripting in Excel and further cleaned and de-duplicated in R.
Additionally, in my hunt for shapefiles I discovered these wonderful shapefiles from NYC PLUTO: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I left-joined it on "Block" and "Lot" onto the primary data frame, but 25% of the block/lot combinations ended up without a corresponding entry in the PLUTO shapefile and are NA.
Note that as in other uploaded datasets of NYC housing on Kaggle, many of these transactions have a sale_price of $0 or only a nominal amount far less than market value. These are likely property transfers to relatives and should be excluded from any analysis of market prices.
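A minimal pandas sketch of the cleaning and join described above: drop nominal-price transfers, left-join the lot data on block and lot, and measure how many sales found no match. The lowercase column names, the $10,000 cut-off, the `lot_area` column and the sample rows are all assumptions; rename the real columns to match and choose a threshold that suits your analysis.

```python
import pandas as pd

# Hypothetical cut-off: sales below this are treated as non-market transfers.
MIN_MARKET_PRICE = 10_000


def join_sales_to_lots(sales, lots):
    """Drop nominal sales, left-join lot data on (block, lot), report unmatched share."""
    market = sales[sales["sale_price"] >= MIN_MARKET_PRICE]
    merged = market.merge(
        lots,
        on=["block", "lot"],
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
        validate="many_to_one",  # raise if a block/lot appears twice in lots
    )
    unmatched_share = float((merged["_merge"] == "left_only").mean())
    return merged.drop(columns="_merge"), unmatched_share


# Made-up sample rows, for illustration only.
sales = pd.DataFrame({
    "block": [1, 1, 2, 3],
    "lot": [10, 11, 5, 7],
    "sale_price": [0, 650_000, 800_000, 500_000],
})
lots = pd.DataFrame({"block": [1, 2], "lot": [11, 5], "lot_area": [2000, 2500]})
merged, share = join_sales_to_lots(sales, lots)
print(len(merged), round(share, 3))  # 3 0.333
```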
Can you model Brooklyn home prices accurately?
It can be difficult to find shapefiles for building choropleth maps, so this dataset tries to gather as many shapefiles as possible. Be careful to download all the files (a shapefile is made up of several companion files such as .shp, .shx and .dbf); otherwise you will get errors.
tl_2014_us_state: contains 56 US states and territories: the 50 states and 6 others (Guam, Puerto Rico, United States Virgin Islands, Commonwealth of the Northern Mariana Islands, American Samoa, and District of Columbia).
Found here: ftp://ftp2.census.gov/geo/tiger/TIGER2014/STATE/
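Because a shapefile is really a set of files sharing one base name, a quick check for missing companions can save a confusing read error. This sketch checks for the three mandatory components (.shp, .shx, .dbf); the function name is hypothetical, and you may also want to require the optional .prj file, which holds the coordinate reference system.

```python
from pathlib import Path

# The three mandatory shapefile components.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the names of mandatory companion files missing next to a .shp file."""
    base = Path(shp_path)
    return [
        base.with_suffix(ext).name
        for ext in REQUIRED
        if not base.with_suffix(ext).exists()
    ]


# Example: missing_components("tl_2014_us_state.shp") lists whatever is absent.
```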