Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Full database of cities, states, and countries, available in CSV format. All countries, states, and cities are covered, with different combinations and versions populated.
Each CSV has, for each location:
1. Longitude
2. Latitude
alongside other miscellaneous country data such as:
3. Currency
4. State code
5. Phone country code
Total countries: 250. Total states/regions/municipalities: 4,963. Total cities/towns/districts: 148,061.
Last updated: 29 January 2022.
The MAR Web Geocoder is a web browser-based tool for geocoding locations, typically addresses, in Washington, DC. It is developed by the Office of the Chief Technology Officer (OCTO) and accepts Excel or CSV files as input, producing an Excel file as output. Geocoding is the process of assigning a location, in the form of geographic coordinates (often expressed as latitude and longitude), to spreadsheet data. This is done by comparing the descriptive geographic data to known geographic locations such as addresses, blocks, intersections, or place names.
The resource-location extension for CKAN enhances data resources by automatically adding latitude and longitude coordinates to CSV files containing address data, using the provided address, city, and zipcode columns. This simplifies geocoding and location-based analysis directly within CKAN. The extension requires CKAN version 2.7.2 or higher.
Key Features:
- Automated Geocoding: automatically converts address data within CSV files into latitude and longitude coordinates during resource upload.
- Address Field Configuration: allows users to specify the CSV column numbers corresponding to the address, city, and zipcode fields.
- Coordinate Appending: adds new columns to the CSV file containing the calculated latitude and longitude coordinates, preserving the original data.
- CSV Processing during Upload: the geocoding process is integrated directly into the resource upload workflow.
- Language Management: offers translation support and instructions for adding new translations.
How It Works: during CSV resource upload, the user is prompted to enter the column numbers corresponding to the address, city, and zipcode. Upon submission of the upload form, the extension processes the file, geocodes the addresses using these column values, and appends latitude and longitude as new columns to the CSV. The modified CSV file, now containing geographic coordinates, is stored as the resource.
Benefits & Impact: by automatically adding geographic coordinates, the resource-location extension simplifies tasks such as mapping and spatial analysis of tabular data. This automated geocoding process enhances the usability and value of address-based datasets within CKAN.
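To illustrate the workflow the extension automates, here is a minimal standalone Python sketch (not the extension's actual code) that reads a CSV, geocodes each row using user-supplied address, city, and zipcode column numbers, and appends latitude and longitude columns. It uses geopy's Nominatim geocoder as a stand-in for whatever service the extension calls; the file names and column indices are hypothetical.

import csv
from geopy.geocoders import Nominatim  # stand-in geocoder; the extension's backend may differ

def append_coordinates(in_path, out_path, addr_col, city_col, zip_col):
    """Read a CSV, geocode each row, and append lat/lon columns."""
    geolocator = Nominatim(user_agent="resource-location-demo")
    with open(in_path, newline='') as fin, open(out_path, 'w', newline='') as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        writer.writerow(header + ['latitude', 'longitude'])
        for row in reader:
            query = row[addr_col] + ', ' + row[city_col] + ' ' + row[zip_col]
            loc = geolocator.geocode(query)
            coords = [loc.latitude, loc.longitude] if loc else ['', '']
            writer.writerow(row + coords)

# Hypothetical usage: address in column 0, city in column 1, zipcode in column 2
append_coordinates('addresses.csv', 'addresses_geocoded.csv', 0, 1, 2)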
https://creativecommons.org/publicdomain/zero/1.0/
Created for use in the Renewable and Appropriate Energy Lab at UC Berkeley and Lawrence Berkeley National Laboratory.
Geography: All 58 Counties of the American State of California
Time period: 2015
Unit of analysis: Tons per year
Variables:
Sources: All columns except for lat and lon were scraped from the California Air Resources Board Facility Search Tool using the request module from Python's urllib library. The script used is included below, in case you would like to pull additional columns.
The lat and lon columns were geocoded using the Geocoder library for Python with the Bing provider.
download.py

import pandas as pd

out_dir = 'ARB/'
file_ext = '.csv'

# Loop over the 58 California county codes used by the ARB facility search tool
for i in range(1, 59):
    facilities = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/faccrit_output.csv?&dbyr=2015&ab_=&dis_=&co_=" + str(i) + "&fname_=&city_=&sort=FacilityNameA&fzip_=&fsic_=&facid_=&all_fac=C&chapis_only=&CERR=&dd=")
    for index, row in facilities.iterrows():
        # Fetch the per-facility pollutant detail table
        curr_facility = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/facdet_output.csv?&dbyr=2015&ab_=" + str(row['AB']) + "&dis_=" + str(row['DIS']) + "&co_=" + str(row['CO']) + "&fname_=&city_=&sort=C&fzip_=&fsic_=&facid_=" + str(row['FACID']) + "&all_fac=&chapis_only=&CERR=&dd=")
        pm25 = curr_facility.loc[curr_facility['POLLUTANT NAME'] == 'PM2.5']
        if not pm25.empty:
            # .at replaces the deprecated DataFrame.set_value used in the original script
            facilities.at[index, 'PM2.5T'] = pm25.iloc[0]['EMISSIONS_TONS_YR']
    facilities.to_csv(out_dir + str(i) + file_ext)
geocode.py

import csv
import geocoder

directory = 'ARB/'
outdirectory = 'ARB_OUT/'

for i in range(1, 59):
    # Text-mode handles for Python 3 (the original used Python 2's 'rb' and append mode)
    with open(directory + str(i) + '.csv', newline='') as csvfile, \
         open(outdirectory + str(i) + '.csv', 'w', newline='') as csvout:
        reader = csv.DictReader(csvfile)
        fieldnames = reader.fieldnames + ['lat', 'lon']  # Add new columns
        writer = csv.DictWriter(csvout, fieldnames)
        writer.writeheader()
        for row in reader:
            address = row['FSTREET'] + ', ' + row['FCITY'] + ', California ' + row['FZIP']
            g = geocoder.bing(address, key='API_KEY')  # Substitute a valid Bing Maps key
            if g.latlng:
                newrow = dict(row)
                newrow['lat'] = g.json['lat']
                newrow['lon'] = g.json['lng']
                writer.writerow(newrow)  # Only write the row if it was successfully geocoded
Offices_on_Aging_csv_Geocoded
Geoscape G-NAF is the geocoded address database for Australian businesses and governments. It’s the trusted source of geocoded address data for Australia with over 50 million contributed addresses distilled into 15.4 million G-NAF addresses. It is built and maintained by Geoscape Australia using independently examined and validated government data.
From 22 August 2022, Geoscape Australia is making G-NAF available in an additional, simplified table format. G-NAF Core makes geocoded addresses easier to access, requiring less technical effort.
G-NAF Core will be updated on a quarterly basis along with G-NAF.
Further information about contributors to G-NAF is available here.
With more than 15 million Australian physical address records, G-NAF is one of the most ubiquitous and powerful spatial datasets. The records include geocodes, which are latitude and longitude map coordinates. G-NAF does not contain personal information or details relating to individuals.
Updated versions of G-NAF are published on a quarterly basis. Previous versions are available here.
Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.
Changes in the August 2025 release
Nationally, the August 2025 update of G-NAF shows an overall increase of 40,716 addresses (0.30%). The total number of addresses in G-NAF now stands at 15,794,643 of which 14,950,491 or 94.66% are principal.
In the ACT, there have been minor updates to the address parsing of flat-numbered addresses aimed at: improving the address representation of flat-numbered addresses; improving address coverage; and improving address alignment between contributors. This change affects approximately 4,000 addresses.
A small number of additional address sites have implemented the use of the BUILDING_NAME attribute as part of the merge criteria to improve address coverage for flat-numbered addresses in NSW and QLD. These changes have resulted in the creation of approximately 400 addresses in NSW and 120 in QLD.
A focus has been applied to Tasmanian street-locality addresses to reduce the number of these addresses. For the August 2025 release, there is a reduction of some 900 street-locality addresses in Tasmania.
Geoscape has moved product descriptions, guides and reports online to https://docs.geoscape.com.au.
Further information on G-NAF, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on G-NAF, including software solutions, consultancy and support.
Additional information: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.
Use of the G-NAF downloaded from data.gov.au is subject to the End User Licence Agreement (EULA)
The EULA terms are based on the Creative Commons Attribution 4.0 International license (CC BY 4.0). However, an important restriction relating to the use of the open G-NAF for the sending of mail has been added.
The open G-NAF data must not be used for the generation of an address or the compilation of an address for the sending of mail unless the user has verified that each address to be used for the sending of mail is capable of receiving mail by reference to a secondary source of information. Further information on this use restriction is available here.
End users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).
Users must also note the following attribution requirements:
Preferred attribution for the Licensed Material:
G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
Preferred attribution for Adapted Material:
Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.
G-NAF is a complex and large dataset (approximately 5GB unpacked), consisting of multiple tables that will need to be joined prior to use. The dataset is primarily designed for application developers and large-scale spatial integration. Users are advised to read the technical documentation, including product change notices and the individual product descriptions before downloading and using the product. A quick reference guide on unpacking the G-NAF is also available.
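As a rough illustration of the kind of join required, the pandas sketch below combines two core G-NAF tables, ADDRESS_DETAIL and ADDRESS_DEFAULT_GEOCODE, on their shared ADDRESS_DETAIL_PID key. The file paths are hypothetical, and the exact file layout (typically one pipe-separated file per state and table) varies by release, so consult the product documentation before relying on this.

import pandas as pd

# Hypothetical paths; actual releases ship pipe-separated .psv files per state/table.
address_detail = pd.read_csv('ADDRESS_DETAIL.psv', sep='|', low_memory=False)
default_geocode = pd.read_csv('ADDRESS_DEFAULT_GEOCODE.psv', sep='|', low_memory=False)

# Attach each address record's default latitude/longitude via the shared primary key.
addresses = address_detail.merge(
    default_geocode[['ADDRESS_DETAIL_PID', 'LATITUDE', 'LONGITUDE']],
    on='ADDRESS_DETAIL_PID',
    how='left',
)
print(addresses[['ADDRESS_DETAIL_PID', 'LATITUDE', 'LONGITUDE']].head())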
Mapping incident locations from a CSV file in a web map (YouTube video).
Senior_Centers_csv_Geocoded
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the locations found in the Kiva datasets included in an administrative or geographical region. You can also find poverty data about this region. This facilitates answering some of the tough questions about a region's poverty.
In the interest of preserving the original names and spellings of the locations/countries/regions, all the data is in Excel format and has no preview (I think only the Kaggle-recommended file types have a preview; if anyone can show me how to enable it for an xlsx file, it would be greatly appreciated).
The Tables datasets contain the most recent analysis of the MPI for countries and regions. These datasets are updated regularly. In unique regions_names_from_google_api you will find 3 levels of inclusion for every geocode provided in the Kiva datasets (village/town, administrative region, and sub-national region, which can be administrative or geographical). These are the results of the Google Geocoding API process.
Files:
Dropped multiple columns and kept all the rows from loans.csv with names, tags, and descriptions, yielding a 390 MB CSV file instead of 2.13 GB. It is basically a simplified version of loans.csv (originally included in the analysis by beluga).
This is loan_themes_by_region left-joined with Tables_5.3_Contribution_of_Deprivations (all the original entries from loan_themes, and only the entries that match from Tables_5; for the regions that lack MPI data, you will find NaN).
These are the columns in the database:
Matched the loans in loan_themes_by_region with the regions that have MPI information. This dataset brings together the amount invested in a region and the biggest problems that region has to deal with. It is a join between loan_themes_by_region (provided by Kiva) and Table 5.3: Contribution of Deprivations.
It is a subset of all_loan_theme_merged_with_geo_mpi_regions.xlsx, containing only the entries that I could match with poverty decomposition data. It has the same columns.
Multidimensional Poverty Index (MPI) decomposition for over 1,000 regions in 79 countries.
Table 5.3: Contribution of deprivations to the MPI, by sub-national regions
This table shows which dimensions and indicators contribute most to a region's MPI, which is useful for understanding the major source(s) of deprivation in a sub-national region.
Source: http://ophi.org.uk/multidimensional-poverty-index/global-mpi-2016/
MPI decomposition for 120 countries.
Table 7: All Published MPI Results since 2010
The table presents an archive of all MPI estimations published over the past 5 years, together with MPI, H, A, and censored headcount ratios. For comparisons over time, please use Table 6, which is strictly harmonised. The full set of data tables for each published year (Column A) can be found on the 'data tables' page under 'Archive'.
The data in this file is shown in interactive plots on the Oxford Poverty and Human Development Initiative website: http://www.dataforall.org/dashboard/ophi/index.php/
These are all the regions corresponding to the geocodes found in Kiva's loan_themes_by_region.
There are 718 unique entries that you can join with any Kiva table that has either a coordinates or a region column (a join sketch follows the column list below).
Columns:
geo: pair of Lat, Lon (from loan_themes_by_region)
City: name of the city (has the most NaNs)
Administrative region: first level of administrative inclusion for the city/location (the equivalent of a county in the US)
Sub-national region: second level of administrative inclusion for the geo pair (like a state in the US)
Country: name of the country
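As a quick sketch of such a join (assuming, hypothetically, that the Kiva side has lat/lon columns and that the geo key is formatted as "lat, lon"), a pandas merge could look like this:

import pandas as pd

# Hypothetical file names; adjust to the actual dataset files.
regions = pd.read_excel('unique_regions_names_from_google_api.xlsx')
loan_themes = pd.read_csv('loan_themes_by_region.csv')

# Rebuild the "lat, lon" key on the Kiva side so the two tables share a join column.
loan_themes['geo'] = loan_themes['lat'].astype(str) + ', ' + loan_themes['lon'].astype(str)

# Left join keeps every loan theme row and attaches region names where the geocode matches.
merged = loan_themes.merge(regions, on='geo', how='left')
print(merged.head())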
Thanks to Shane Lynn for the batch geocoding and to Joseph Deferio for reverse geocoding:
https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/
https://github.com/jdeferio/Reverse_Geocode
The MPI datasets can be found on the Oxford website (http://ophi.org.uk/) under Research.
Citation: Alkire, S. and Kanagaratnam, U. (2018). "Multidimensional Poverty Index Winter 2017-18: Brief methodological note and results." Oxford Poverty and Human Development Initiative, University of Oxford, OPHI Methodological Notes 45.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Column name | Example | Description | Source
alpha_2_country | AD | Two-letter country code (ISO 3166-1) | pycountry
alpha_3_country | AND | Three-letter country code (ISO 3166-1) | pycountry
flag_country | 🇦🇩 | Unicode flag | pycountry
name_country | Andorra | Country name (ISO 3166-1) | pycountry
numeric_country | 020 | Numeric country code | pycountry
official_name_country | Principality of Andorra | Official country name | pycountry
score | 95 | Confidence score of the geocoding [0-100/NaN] | ArcGIS
geocoded | True | Whether the coordinate was derived from geocoding [true/false] | ArcGIS
geometry | POINT (1.516666666666667 42.5) | Point coordinate | ArcGIS if geocoded, else original UNLOCODE
Geospatial data about Fresno County, California Addresses. Export to CAD, GIS, PDF, CSV and access via API.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor Geocoding of worldwide patent data. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Versioning Note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

Step 1: Download the replication materials
Download the whole replication folder on figshare containing the code, data, and replication files.

Step 2: Replicate Tables in Section 3
All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from Section 3 of the manuscript, run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables; the output data needed for replication is provided.

Step 3: Replicate Figures in QGIS
The figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The figures are replicated as layers in the project.

Step 4: Running the code from scratch
The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network", runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is on the order of 10 hours on free-tier Google Colab.

4a) Upload to Google Drive
Upload the entire replication folder to your Google Drive and note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety: Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first, Complex Network second).

4b) Set the path
In each Google Colab notebook, set the variable called "REPL_PATH" to the location on your Google Drive where you uploaded the replication folder. Include the replication folder in the path, for example "/content/drive/MyDrive/replication" (see the short sketch after these steps).

4c) Run the code
The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. You may simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path, and run all cells.
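For steps 4a and 4b, mounting Google Drive and setting the path in a Colab cell typically looks like the short sketch below; the mount point is the Colab default, and only REPL_PATH comes from the replication instructions.

from google.colab import drive

# Mount Google Drive at Colab's default mount point.
drive.mount('/content/drive')

# Point REPL_PATH at the uploaded replication folder (adjust if stored elsewhere).
REPL_PATH = "/content/drive/MyDrive/replication"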
https://creativecommons.org/publicdomain/zero/1.0/
Federal Superfund sites are some of the most polluted in the United States. This dataset contains a multifaceted view of Superfunds, including free-form text descriptions, geography, demographics and socioeconomics.
The core data was scraped from the National Priorities List (NPL) provided by the U.S. Environmental Protection Agency (EPA). This table provides basic information such as site name, site score, date added, and links to a site description and current status. Apache Tika was used to extract text from the site description PDFs. The addresses were scraped from the site status pages and used to geocode each site to latitude/longitude and to a Census block group. The block group assignment was used to join with the Census Bureau's Planning Database, a rich source of nationwide demographic and socioeconomic data. The full source code used to generate the data can be found here, on GitHub.
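As an illustration of the address-to-block-group step, here is a hedged sketch using the Census Bureau's public geocoding service (not this dataset's actual pipeline; the benchmark and vintage parameters shown are common defaults, and the response layer names can vary by vintage):

import requests

def census_block_group(address):
    """Geocode a one-line address; return (lat, lon, block-group GEOID) or None."""
    resp = requests.get(
        "https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress",
        params={
            "address": address,
            "benchmark": "Public_AR_Current",
            "vintage": "Current_Current",
            "format": "json",
        },
        timeout=30,
    )
    matches = resp.json()["result"]["addressMatches"]
    if not matches:
        return None
    match = matches[0]
    lon, lat = match["coordinates"]["x"], match["coordinates"]["y"]
    # A block GEOID is 15 digits; its first 12 digits identify the block group.
    block_geoid = match["geographies"]["Census Blocks"][0]["GEOID"]
    return lat, lon, block_geoid[:12]

print(census_block_group("1200 Pennsylvania Ave NW, Washington, DC 20460"))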
I have provided three separate downloads to explore:
Some caveats:
I would like to thank the EPA and the Census Bureau for making such detailed information publicly available. For relevant academic work, please see Burwell-Naney et al. (2013) and the references both to it and within it.
Please let me know if you have any suggestions for improving the dataset!
Geospatial data about Clark County, Nevada Addresses. Export to CAD, GIS, PDF, CSV and access via API.
A global self-hosted Market Research dataset containing all administrative divisions, cities, addresses, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Market Research data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our map data packages are offered in various formats, including .csv. All geographic data is optimized for seamless integration with popular systems such as Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our Market Research databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geographic data packages are available. Please submit a request via the above contact button for more details.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor GDIS, a global dataset of geocoded disaster locations. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
A global self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.
Use cases for the Global Zip Code Database (Geospatial data)
Address capture and validation
Map and visualization
Reporting and Business Intelligence (BI)
Master Data Management
Logistics and Supply Chain Management
Sales and Marketing
Data export methodology
Our location data packages are offered in various formats, including .csv. All geospatial data is optimized for seamless integration with popular systems such as Esri ArcGIS, Snowflake, QGIS, and more.
Product Features
Fully and accurately geocoded
Administrative areas with a level range of 0-4
Multi-language support including address names in local and foreign languages
Comprehensive city definitions across countries
For additional insights, you can combine the map data with:
UNLOCODE and IATA codes
Time zones and Daylight Saving Times
Why do companies choose our location databases
Enterprise-grade service
Reduce integration time and cost by 30%
Weekly updates for the highest quality
Note: Custom geospatial data packages are available. Please submit a request via the above contact button for more details.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social network services such as Twitter are important venues that can be used as rich data sources to mine public opinions about various topics. In this study, we used Twitter to collect data on one of the fastest-growing theories in education, Self-Regulated Learning (SRL), and carried out further analysis to investigate what Twitter says about SRL. This work uses three main analysis methods: descriptive analysis, topic modeling, and geocoding analysis. The collected dataset consists of a large volume of relevant SRL tweets, 54,070 in total, posted between 2011 and 2021. The descriptive analysis uncovers a growing discussion of SRL on Twitter from 2011 until 2018, which then markedly decreased up to the collection date. For topic modeling, the text mining technique of Latent Dirichlet Allocation (LDA) was applied, revealing insights through computationally processed topics. Finally, the geocoding analysis uncovers a diverse community from all over the world, yet a higher-density representation of users from the Global North was identified. Further implications are discussed in the paper.
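For readers unfamiliar with the topic-modeling step, a minimal LDA sketch over a toy corpus might look like the following (a scikit-learn stand-in, not the study's actual pipeline; the topic count and example texts are arbitrary):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus; the study analysed ~54,000 SRL-related tweets.
tweets = [
    "self regulated learning helps students plan and monitor their study",
    "goal setting and reflection are key strategies in self regulated learning",
    "new paper on motivation and metacognition in online courses",
]

# Bag-of-words representation, then an LDA model with an arbitrary topic count.
vectorizer = CountVectorizer(stop_words='english')
doc_term = vectorizer.fit_transform(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words of each computationally derived topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print("Topic " + str(k) + ": " + ", ".join(top))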
https://spdx.org/licenses/etalab-2.0.html
This dataset is an extended version of the GeoVirus database, which includes 229 WikiNews articles on global epidemics. The Spatial Named Entities (SNEs) in these articles are manually annotated by experts, complete with their coordinates and names. We integrated an automated process for SNE extraction and disambiguation, aligning it with the data annotated by experts. This involved correlating 1,360 SNEs identified both in our automatic extraction and in the expert-annotated dataset, ensuring precision and consistency in spatial identification. The resulting corpus is an extension of the original GeoVirus base, enriched with three additional columns presenting the automatic spatial annotations.
GeoVirus dataset reference: Gritta, Milan, Mohammad Taher Pilehvar, and Nigel Collier. "Which Melbourne? Augmenting geocoding with maps." Association for Computational Linguistics, 2018.
This enhanced dataset not only preserves the integrity of the expert annotations but also demonstrates the effectiveness of our automated process, evidenced by our approach's recall rate of 0.911, significantly surpassing the standard Geonames geocoder's recall rate of 0.871.
The dataset consists of a single file, sne_data.csv, with the following columns:
source: WikiNews URL provided in the GeoVirus data.
input_sne: Spatial Named Entity extracted with spaCy from the raw data (document); used as input for Geonames.
true_country_code: country code obtained through reverse geocoding based on the provided latitude and longitude.
output_sne: Spatial Named Entity returned by Geonames.
predicted_country_code: country code corresponding to output_sne after disambiguation.
disamb_phase: the disambiguation phase that helped disambiguate the input spatial named entity.
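As a rough sketch of the extraction step (not the authors' exact pipeline), extracting candidate SNEs with spaCy and querying the GeoNames search API could look like the following; the spaCy model name and the GeoNames username are placeholders:

import requests
import spacy

# Placeholder model; the paper's exact spaCy model is not specified here.
nlp = spacy.load("en_core_web_sm")

def extract_snes(text):
    """Return spatial named entities (GPE/LOC) found by spaCy."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

def geonames_lookup(name, username="demo"):  # replace 'demo' with a registered username
    """Query the GeoNames search API and return the top candidate, if any."""
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": name, "maxRows": 1, "username": username},
        timeout=30,
    )
    hits = resp.json().get("geonames", [])
    return hits[0] if hits else None

for sne in extract_snes("An outbreak was reported near Melbourne, Australia."):
    print(sne, geonames_lookup(sne))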