Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Full database of cities, states, and countries, available in CSV format. All countries, states, and cities are covered, with different combinations and versions populated.
Each CSV has, for each location:
1. Longitude
2. Latitude
alongside other miscellaneous country data such as:
3. Currency
4. State code
5. Phone country code
Total countries: 250. Total states/regions/municipalities: 4,963. Total cities/towns/districts: 148,061.
Last updated: 29 January 2022.
The MAR Web Geocoder is a web browser-based tool for geocoding locations, typically addresses, in Washington, DC. It is developed by the Office of the Chief Technology Officer (OCTO) and accepts Excel or CSV files as input, producing an Excel file as output. Geocoding is the process of assigning a location, in the form of geographic coordinates (often expressed as latitude and longitude), to spreadsheet data. This is done by comparing the descriptive geographic data to known geographic locations such as addresses, blocks, intersections, or place names.
The resource-location extension for CKAN enhances data resources by automatically adding latitude and longitude coordinates to CSV files containing address data, using the provided address, city, and zipcode columns. This simplifies geocoding and location-based analysis directly within CKAN. The extension requires CKAN version 2.7.2 or higher.
Key Features:
- Automated Geocoding: automatically converts address data within CSV files into latitude and longitude coordinates during resource upload.
- Address Field Configuration: allows users to specify the CSV column numbers corresponding to the address, city, and zipcode fields.
- Coordinate Appending: adds new columns to the CSV file containing the calculated latitude and longitude coordinates, preserving the original data.
- CSV Processing during Upload: the geocoding process is integrated directly into the resource upload workflow.
- Language Management: offers translation support and instructions for adding new translations.
How It Works: during CSV resource upload, the user is prompted to enter the column numbers corresponding to the address, city, and zipcode. Upon submission of the upload form, the extension processes the file, geocodes the addresses using these column values, and appends latitude and longitude as new columns to the CSV. The modified CSV file, now containing geographic coordinates, is stored as the resource.
Benefits & Impact: by automatically adding geographic coordinates, the resource-location extension simplifies tasks such as mapping and spatial analysis of tabular data. This automated geocoding process enhances the usability and value of address-based datasets within CKAN.
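To illustrate the workflow the extension automates, here is a minimal standalone Python sketch (not the extension's actual code) that reads a CSV, geocodes each row using user-supplied address, city, and zipcode column numbers, and appends latitude and longitude columns. It uses geopy's Nominatim geocoder as a stand-in for whatever service the extension calls; the file names and column indices are hypothetical.

import csv
from geopy.geocoders import Nominatim  # stand-in geocoder; the extension's backend may differ

def append_coordinates(in_path, out_path, addr_col, city_col, zip_col):
    """Read a CSV, geocode each row, and append lat/lon columns."""
    geolocator = Nominatim(user_agent="resource-location-demo")
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        writer.writerow(header + ['latitude', 'longitude'])
        for row in reader:
            query = row[addr_col] + ', ' + row[city_col] + ' ' + row[zip_col]
            loc = geolocator.geocode(query)
            coords = [loc.latitude, loc.longitude] if loc else ['', '']
            writer.writerow(row + coords)

# Hypothetical usage: address in column 0, city in column 1, zipcode in column 2
append_coordinates('addresses.csv', 'addresses_geocoded.csv', 0, 1, 2)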
https://creativecommons.org/publicdomain/zero/1.0/
Created for use in the Renewable and Appropriate Energy Lab at UC Berkeley and Lawrence Berkeley National Laboratory.
Geography: All 58 Counties of the American State of California
Time period: 2015
Unit of analysis: Tons per year
Variables:
Sources: All columns except for lat and lon were scraped from the California Air Resources Board Facility Search Tool using the request module from Python's urllib library. The script used is included below, in case you would like to pull additional columns.
The lat and lon columns were geocoded using the Geocoder library for Python with the Bing provider.
download.py

import pandas as pd

out_dir = 'ARB/'
file_ext = '.csv'

# Loop over the 58 California county codes used by the ARB facility search tool
for i in range(1, 59):
    facilities = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/faccrit_output.csv?&dbyr=2015&ab_=&dis_=&co_=" + str(i) + "&fname_=&city_=&sort=FacilityNameA&fzip_=&fsic_=&facid_=&all_fac=C&chapis_only=&CERR=&dd=")
    for index, row in facilities.iterrows():
        # Fetch the per-facility pollutant detail table
        curr_facility = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/facdet_output.csv?&dbyr=2015&ab_=" + str(row['AB']) + "&dis_=" + str(row['DIS']) + "&co_=" + str(row['CO']) + "&fname_=&city_=&sort=C&fzip_=&fsic_=&facid_=" + str(row['FACID']) + "&all_fac=&chapis_only=&CERR=&dd=")
        pm25 = curr_facility.loc[curr_facility['POLLUTANT NAME'] == 'PM2.5']
        if not pm25.empty:
            # .at replaces the deprecated DataFrame.set_value used in the original script
            facilities.at[index, 'PM2.5T'] = pm25.iloc[0]['EMISSIONS_TONS_YR']
    facilities.to_csv(out_dir + str(i) + file_ext)
geocode.py

import csv
import geocoder

directory = 'ARB/'
outdirectory = 'ARB_OUT/'

for i in range(1, 59):
    # Text-mode handles for Python 3 (the original used Python 2's 'rb' and append mode)
    with open(directory + str(i) + '.csv', newline='') as csvfile, \
         open(outdirectory + str(i) + '.csv', 'w', newline='') as csvout:
        reader = csv.DictReader(csvfile)
        fieldnames = reader.fieldnames + ['lat', 'lon']  # Add new columns
        writer = csv.DictWriter(csvout, fieldnames)
        writer.writeheader()
        for row in reader:
            address = row['FSTREET'] + ', ' + row['FCITY'] + ', California ' + row['FZIP']
            g = geocoder.bing(address, key='API_KEY')  # Substitute a valid Bing Maps key
            if g.latlng:
                newrow = dict(row)
                newrow['lat'] = g.json['lat']
                newrow['lon'] = g.json['lng']
                writer.writerow(newrow)  # Only write the row if it was successfully geocoded
Offices_on_Aging_csv_Geocoded
Geoscape G-NAF is the geocoded address database for Australian businesses and governments. It’s the trusted source of geocoded address data for Australia with over 50 million contributed addresses distilled into 15.4 million G-NAF addresses. It is built and maintained by Geoscape Australia using independently examined and validated government data.
From 22 August 2022, Geoscape Australia is making G-NAF available in an additional, simplified table format. G-NAF Core makes geocoded addresses easier to access, requiring less technical effort.
G-NAF Core will be updated on a quarterly basis along with G-NAF.
Further information about contributors to G-NAF is available here.
With more than 15 million Australian physical address records, G-NAF is one of the most ubiquitous and powerful spatial datasets. The records include geocodes, which are latitude and longitude map coordinates. G-NAF does not contain personal information or details relating to individuals.
Updated versions of G-NAF are published on a quarterly basis. Previous versions are available here.
Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.
Changes in the August 2025 release
Nationally, the August 2025 update of G-NAF shows an overall increase of 40,716 addresses (0.30%). The total number of addresses in G-NAF now stands at 15,794,643 of which 14,950,491 or 94.66% are principal.
In the ACT, there have been minor updates to the address parsing of flat-numbered addresses aimed at: improving the address representation of flat-numbered addresses; improving address coverage; and improving address alignment between contributors. This change affects approximately 4,000 addresses.
A small number of additional address sites have implemented the use of the BUILDING_NAME attribute as part of the merge criteria to improve address coverage for flat-numbered addresses in NSW and QLD. These changes have resulted in the creation of approximately 400 addresses in NSW and 120 in QLD.
A focus has been applied to Tasmanian street-locality addresses to reduce the number of these addresses. For the August 2025 release, there is a reduction of some 900 street-locality addresses in Tasmania.
Geoscape has moved product descriptions, guides and reports online to https://docs.geoscape.com.au.
Further information on G-NAF, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on G-NAF, including software solutions, consultancy and support.
Additional information: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.
Use of the G-NAF downloaded from data.gov.au is subject to the End User Licence Agreement (EULA)
The EULA terms are based on the Creative Commons Attribution 4.0 International license (CC BY 4.0). However, an important restriction relating to the use of the open G-NAF for the sending of mail has been added.
The open G-NAF data must not be used for the generation of an address or the compilation of an address for the sending of mail unless the user has verified that each address to be used for the sending of mail is capable of receiving mail by reference to a secondary source of information. Further information on this use restriction is available here.
End users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).
Users must also note the following attribution requirements:
Preferred attribution for the Licensed Material:
G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
Preferred attribution for Adapted Material:
Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
G-NAF is a complex and large dataset (approximately 5GB unpacked), consisting of multiple tables that will need to be joined prior to use. The dataset is primarily designed for application developers and large-scale spatial integration. Users are advised to read the technical documentation, including product change notices and the individual product descriptions before downloading and using the product. A quick reference guide on unpacking the G-NAF is also available.
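As a rough illustration of the kind of join required, the pandas sketch below combines two core G-NAF tables, ADDRESS_DETAIL and ADDRESS_DEFAULT_GEOCODE, on their shared ADDRESS_DETAIL_PID key. The file paths are hypothetical, and the exact file layout (typically one pipe-separated file per state and table) varies by release, so consult the product documentation before relying on this.

import pandas as pd

# Hypothetical paths; actual releases ship pipe-separated .psv files per state/table.
address_detail = pd.read_csv('ADDRESS_DETAIL.psv', sep='|', low_memory=False)
default_geocode = pd.read_csv('ADDRESS_DEFAULT_GEOCODE.psv', sep='|', low_memory=False)

# Attach each address record's default latitude/longitude via the shared primary key.
addresses = address_detail.merge(
    default_geocode[['ADDRESS_DETAIL_PID', 'LATITUDE', 'LONGITUDE']],
    on='ADDRESS_DETAIL_PID',
    how='left',
)
print(addresses[['ADDRESS_DETAIL_PID', 'LATITUDE', 'LONGITUDE']].head())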
Mapping incident locations from a CSV file in a web map (YouTube video).
Senior_Centers_csv_Geocoded
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the locations found in the Kiva datasets included in an administrative or geographical region. You can also find poverty data about this region. This facilitates answering some of the tough questions about a region's poverty.
In the interest of preserving the original names and spellings of the locations/countries/regions, all the data is in Excel format and has no preview (I think only the Kaggle-recommended file types have a preview; if anyone can show me how to enable it for an xlsx file, it would be greatly appreciated).
The Tables datasets contain the most recent analysis of the MPI for countries and regions. These datasets are updated regularly. In unique regions_names_from_google_api you will find 3 levels of inclusion for every geocode provided in the Kiva datasets (village/town, administrative region, and sub-national region, which can be administrative or geographical). These are the results of the Google Geocoding API process.
Files:
Dropped multiple columns and kept all the rows from loans.csv with names, tags, and descriptions, yielding a 390 MB CSV file instead of 2.13 GB. It is basically a simplified version of loans.csv (originally included in the analysis by beluga).
This is loan_themes_by_region left-joined with Tables_5.3_Contribution_of_Deprivations (all the original entries from loan_themes, and only the entries that match from Tables_5; for the regions that lack MPI data, you will find NaN).
These are the columns in the database:
Matched the loans in loan_themes_by_region with the regions that have MPI information. This dataset brings together the amount invested in a region and the biggest problems that region has to deal with. It is a join between loan_themes_by_region (provided by Kiva) and Table 5.3: Contribution of Deprivations.
It is a subset of all_loan_theme_merged_with_geo_mpi_regions.xlsx, containing only the entries that I could match with poverty decomposition data. It has the same columns.
Multidimensional Poverty Index (MPI) decomposition for over 1,000 regions in 79 countries.
Table 5.3: Contribution of deprivations to the MPI, by sub-national regions
This table shows which dimensions and indicators contribute most to a region's MPI, which is useful for understanding the major source(s) of deprivation in a sub-national region.
Source: http://ophi.org.uk/multidimensional-poverty-index/global-mpi-2016/
MPI decomposition for 120 countries.
Table 7: All Published MPI Results since 2010
The table presents an archive of all MPI estimations published over the past 5 years, together with MPI, H, A, and censored headcount ratios. For comparisons over time, please use Table 6, which is strictly harmonised. The full set of data tables for each published year (Column A) can be found on the 'data tables' page under 'Archive'.
The data in this file is shown in interactive plots on the Oxford Poverty and Human Development Initiative website: http://www.dataforall.org/dashboard/ophi/index.php/
These are all the regions corresponding to the geocodes found in Kiva's loan_themes_by_region.
There are 718 unique entries that you can join with any Kiva table that has either a coordinates or a region column (a join sketch follows the column list below).
Columns:
geo: pair of Lat, Lon (from loan_themes_by_region)
City: name of the city (has the most NaNs)
Administrative region: first level of administrative inclusion for the city/location (the equivalent of a county in the US)
Sub-national region: second level of administrative inclusion for the geo pair (like a state in the US)
Country: name of the country
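As a quick sketch of such a join (assuming, hypothetically, that the Kiva side has lat/lon columns and that the geo key is formatted as "lat, lon"), a pandas merge could look like this:

import pandas as pd

# Hypothetical file names; adjust to the actual dataset files.
regions = pd.read_excel('unique_regions_names_from_google_api.xlsx')
loan_themes = pd.read_csv('loan_themes_by_region.csv')

# Rebuild the "lat, lon" key on the Kiva side so the two tables share a join column.
loan_themes['geo'] = loan_themes['lat'].astype(str) + ', ' + loan_themes['lon'].astype(str)

# Left join keeps every loan theme row and attaches region names where the geocode matches.
merged = loan_themes.merge(regions, on='geo', how='left')
print(merged.head())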
Thanks to Shane Lynn for the batch geocoding and to Joseph Deferio for reverse geocoding:
https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/
https://github.com/jdeferio/Reverse_Geocode
The MPI datasets can be found on the Oxford website (http://ophi.org.uk/) under Research.
Citation: Alkire, S. and Kanagaratnam, U. (2018). "Multidimensional Poverty Index Winter 2017-18: Brief methodological note and results." Oxford Poverty and Human Development Initiative, University of Oxford, OPHI Methodological Notes 45.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Column name | Example | Description | Source
alpha_2_country | AD | Two-letter country code (ISO 3166-1) | pycountry
alpha_3_country | AND | Three-letter country code (ISO 3166-1) | pycountry
flag_country | 🇦🇩 | Unicode flag | pycountry
name_country | Andorra | Country name (ISO 3166-1) | pycountry
numeric_country | 020 | Numeric country code | pycountry
official_name_country | Principality of Andorra | Official country name | pycountry
score | 95 | Confidence score of the geocoding [0-100/NaN] | ArcGIS
geocoded | True | Whether the coordinate was derived from geocoding [true/false] | ArcGIS
geometry | POINT (1.516666666666667 42.5) | Point coordinate | ArcGIS if geocoded, else original UNLOCODE
Geospatial data about Fresno County, California Addresses. Export to CAD, GIS, PDF, CSV and access via API.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor Geocoding of worldwide patent data. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Versioning Note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

Step 1: Download the replication materials
Download the whole replication folder on figshare containing the code, data, and replication files.

Step 2: Replicate Tables in Section 3
All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from Section 3 of the manuscript, run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables; the output data needed for replication is provided.

Step 3: Replicate Figures in QGIS
The figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The figures are replicated as layers in the project.

Step 4: Running the code from scratch
The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network", runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is on the order of 10 hours on free-tier Google Colab.

4a) Upload to Google Drive
Upload the entire replication folder to your Google Drive and note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety: Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first, Complex Network second).

4b) Set the path
In each Google Colab notebook, set the variable called "REPL_PATH" to the location on your Google Drive where you uploaded the replication folder. Include the replication folder in the path, for example "/content/drive/MyDrive/replication" (see the short sketch after these steps).

4c) Run the code
The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. You may simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path, and run all cells.
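For steps 4a and 4b, mounting Google Drive and setting the path in a Colab cell typically looks like the short sketch below; the mount point is the Colab default, and only REPL_PATH comes from the replication instructions.

from google.colab import drive

# Mount Google Drive at Colab's default mount point.
drive.mount('/content/drive')

# Point REPL_PATH at the uploaded replication folder (adjust if stored elsewhere).
REPL_PATH = "/content/drive/MyDrive/replication"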
https://creativecommons.org/publicdomain/zero/1.0/
Federal Superfund sites are some of the most polluted in the United States. This dataset contains a multifaceted view of Superfunds, including free-form text descriptions, geography, demographics and socioeconomics.
The core data was scraped from the National Priorities List (NPL) provided by the U.S. Environmental Protection Agency (EPA). This table provides basic information such as site name, site score, date added, and links to a site description and current status. Apache Tika was used to extract text from the site description PDFs. The addresses were scraped from the site status pages and used to geocode each site to latitude/longitude and to a Census block group. The block group assignment was used to join with the Census Bureau's Planning Database, a rich source of nationwide demographic and socioeconomic data. The full source code used to generate the data can be found here, on GitHub.
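As an illustration of the address-to-block-group step, here is a hedged sketch using the Census Bureau's public geocoding service (not this dataset's actual pipeline; the benchmark and vintage parameters shown are common defaults, and the response layer names can vary by vintage):

import requests

def census_block_group(address):
    """Geocode a one-line address; return (lat, lon, block-group GEOID) or None."""
    resp = requests.get(
        "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress",
        params={
            "address": address,
            "benchmark": "Public_AR_Current",
            "vintage": "Current_Current",
            "format": "json",
        },
        timeout=30,
    )
    matches = resp.json()["result"]["addressMatches"]
    if not matches:
        return None
    match = matches[0]
    lon, lat = match["coordinates"]["x"], match["coordinates"]["y"]
    # A block GEOID is 15 digits; its first 12 digits identify the block group.
    block_geoid = match["geographies"]["Census Blocks"][0]["GEOID"]
    return lat, lon, block_geoid[:12]

print(census_block_group("1200 Pennsylvania Ave NW, Washington, DC 20460"))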
I have provided three separate downloads to explore:
Some caveats:
I would like to thank the EPA and the Census Bureau for making such detailed information publicly available. For relevant academic work, please see Burwell-Naney et al. (2013) and the references both to it and within it.
Please let me know if you have any suggestions for improving the dataset!
Geospatial data about Clark County, Nevada Addresses. Export to CAD, GIS, PDF, CSV and access via API.
A global self-hosted Market Research dataset containing all administrative divisions, cities, addresses, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Market Research data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our map data packages are offered in various formats, including .csv. All geographic data is optimized for seamless integration with popular systems such as Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our Market Research databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geographic data packages are available. Please submit a request via the above contact button for more details.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor GDIS, a global dataset of geocoded disaster locations. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
A global self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Geospatial data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our location data packages are offered in various formats, including .csv. All geospatial data is optimized for seamless integration with popular systems such as Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our location databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geospatial data packages are available. Please submit a request via the above contact button for more details.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social network services such as Twitter are important venues that can be used as rich data sources to mine public opinions about various topics. In this study, we used Twitter to collect data on one of the fastest-growing theories in education, Self-Regulated Learning (SRL), and carried out further analysis to investigate what Twitter says about SRL. This work uses three main analysis methods: descriptive analysis, topic modeling, and geocoding analysis. The collected dataset consists of a large volume of relevant SRL tweets, 54,070 in total, posted between 2011 and 2021. The descriptive analysis uncovers a growing discussion of SRL on Twitter from 2011 until 2018, which then markedly decreased up to the collection date. For topic modeling, the text mining technique of Latent Dirichlet Allocation (LDA) was applied, revealing insights through computationally processed topics. Finally, the geocoding analysis uncovers a diverse community from all over the world, yet a higher-density representation of users from the Global North was identified. Further implications are discussed in the paper.
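For readers unfamiliar with the topic-modeling step, a minimal LDA sketch over a toy corpus might look like the following (a scikit-learn stand-in, not the study's actual pipeline; the topic count and example texts are arbitrary):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus; the study analysed ~54,000 SRL-related tweets.
tweets = [
    "self regulated learning helps students plan and monitor their study",
    "goal setting and reflection are key strategies in self regulated learning",
    "new paper on motivation and metacognition in online courses",
]

# Bag-of-words representation, then an LDA model with an arbitrary topic count.
vectorizer = CountVectorizer(stop_words='english')
doc_term = vectorizer.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words of each computationally derived topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print("Topic " + str(k) + ": " + ", ".join(top))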
https://spdx.org/licenses/etalab-2.0.html
This dataset is an extended version of the GeoVirus database, which includes 229 WikiNews articles on global epidemics. The Spatial Named Entities (SNEs) in these articles are manually annotated by experts, complete with their coordinates and names. We integrated an automated process for SNE extraction and disambiguation, aligning it with the data annotated by experts. This involved correlating 1,360 SNEs identified both in our automatic extraction and in the expert-annotated dataset, ensuring precision and consistency in spatial identification. The resulting corpus is an extension of the original GeoVirus base, enriched with three additional columns presenting the automatic spatial annotations.
GeoVirus dataset reference: Gritta, Milan, Mohammad Taher Pilehvar, and Nigel Collier. "Which Melbourne? Augmenting geocoding with maps." Association for Computational Linguistics, 2018.
This enhanced dataset not only preserves the integrity of the expert annotations but also demonstrates the effectiveness of our automated process, evidenced by our approach's recall rate of 0.911, significantly surpassing the standard Geonames geocoder's recall rate of 0.871.
The dataset consists of a single file, sne_data.csv, with the following columns:
source: WikiNews URL provided in the GeoVirus data.
input_sne: Spatial Named Entity extracted with spaCy from the raw data (document); used as input for Geonames.
true_country_code: country code obtained through reverse geocoding based on the provided latitude and longitude.
output_sne: Spatial Named Entity returned by Geonames.
predicted_country_code: country code corresponding to output_sne after disambiguation.
disamb_phase: the disambiguation phase that helped disambiguate the input spatial named entity.
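As a rough sketch of the extraction step (not the authors' exact pipeline), extracting candidate SNEs with spaCy and querying the GeoNames search API could look like the following; the spaCy model name and the GeoNames username are placeholders:

import requests
import spacy

# Placeholder model; the paper's exact spaCy model is not specified here.
nlp = spacy.load("en_core_web_sm")

def extract_snes(text):
    """Return spatial named entities (GPE/LOC) found by spaCy."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

def geonames_lookup(name, username="demo"):  # replace 'demo' with a registered username
    """Query the GeoNames search API and return the top candidate, if any."""
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": name, "maxRows": 1, "username": username},
        timeout=30,
    )
    hits = resp.json().get("geonames", [])
    return hits[0] if hits else None

for sne in extract_snes("An outbreak was reported near Melbourne, Australia."):
    print(sne, geonames_lookup(sne))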