100+ datasets found
  1. Geolocation Data [Longitude Latitude]

    • kaggle.com
    Updated Mar 12, 2022
    Cite
    You Sheng (2022). Geolocation Data [Longitude Latitude] [Dataset]. https://www.kaggle.com/datasets/liewyousheng/geolocation
    Explore at:
    Dataset updated
    Mar 12, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    You Sheng
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    Context

    Full database of cities, states, and countries in CSV format. All countries, states, and cities are covered and provided in several combinations and versions.

    Each CSV includes the longitude and latitude of each location, alongside other miscellaneous country data such as currency, state code, and phone country code.

    Content

    Total Countries: 250
    Total States/Regions/Municipalities: 4,963
    Total Cities/Towns/Districts: 148,061

    Last Updated On : 29th January 2022

    Source

    https://github.com/dr5hn/countries-states-cities-database

  2. MAR Web Geocoder User Guide

    • catalog.data.gov
    • opendata.dc.gov
    • +2 more
    Updated Apr 16, 2025
    Cite
    Office of the Chief Technology Officer (2025). MAR Web Geocoder User Guide [Dataset]. https://catalog.data.gov/dataset/mar-web-gecoder-user-guide
    Explore at:
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Office of the Chief Technology Officer
    Description

    The MAR Web Geocoder is a web browser-based tool for geocoding locations, typically addresses, in Washington, DC. It is developed by the Office of the Chief Technology Officer (OCTO) and accepts Excel or CSV files as input, producing an Excel file as output. Geocoding is the process of assigning a location in the form of geographic coordinates (often expressed as latitude and longitude) to spreadsheet data. This is done by comparing the descriptive geographic data to known geographic locations such as addresses, blocks, intersections, or place names.
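
    As a rough illustration of the matching step described above (not the MAR Web Geocoder's actual implementation), the sketch below compares normalized address strings against a small table of known locations. The KNOWN_LOCATIONS table, its placeholder address and coordinates, and the normalize and geocode helpers are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical reference table: normalized address -> (latitude, longitude).
# The address and coordinates are placeholders, not real reference data.
KNOWN_LOCATIONS = {
    "100 EXAMPLE STREET NW": (38.9, -77.0),
}


def normalize(address: str) -> str:
    """Uppercase, drop periods, turn commas into spaces, collapse whitespace."""
    cleaned = address.upper().replace(".", "").replace(",", " ")
    return " ".join(cleaned.split())


def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) for a known address, or None when nothing matches."""
    return KNOWN_LOCATIONS.get(normalize(address))


match = geocode("100  Example Street, N.W.")
miss = geocode("999 Nowhere Road")
```

    A production geocoder would also handle blocks, intersections, and place names, and would tolerate misspellings; exact lookup after normalization is only the simplest case.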

  3. ckanext-resource-location

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-resource-location [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-resource-location
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The resource-location extension for CKAN enhances data resources by automatically adding latitude and longitude coordinates to CSV files containing address data, using provided address, city and zipcode columns. This simplifies geocoding and location-based analysis directly within CKAN. The extension requires CKAN version 2.7.2 or higher.

    Key Features:

    • Automated Geocoding: Automatically converts address data within CSV files into latitude and longitude coordinates during resource upload.
    • Address Field Configuration: Allows users to specify the CSV column numbers corresponding to address, city, and zipcode fields.
    • Coordinate Appending: Adds new columns to the CSV file containing the calculated latitude and longitude coordinates, preserving the original data.
    • CSV Processing during Upload: Geocoding process is integrated directly into the resource upload workflow.
    • Language Management: Offers translation support and instructions for adding new translations.

    How It Works: During CSV resource upload, the user is prompted to input column numbers corresponding to the address, city, and zipcode. Upon submission of the upload form, the extension processes the file, geocodes the addresses using these column values, and appends latitude and longitude as new columns to the CSV. This modified CSV file, now containing geographic coordinates, is stored as the resource.

    Benefits & Impact: By automatically adding geographic coordinates, the resource-location extension simplifies tasks such as mapping and spatial analysis of tabular data. This automated geocoding process enhances the usability and value of address-based datasets within CKAN.
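
    The column-based workflow described above can be sketched in a few lines. This is not the extension's code: the append_coordinates function, the 0-based column indexes, the output column names, and the stub geocoder are assumptions made for illustration.

```python
import csv
import io
from typing import Callable, Optional, Tuple

# Any callable that maps a query string to (lat, lon), or None on failure.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def append_coordinates(csv_text: str, address_col: int, city_col: int,
                       zipcode_col: int, geocode: Geocoder) -> str:
    """Return a copy of the CSV with latitude/longitude columns appended.

    Column arguments are 0-based indexes here (an assumption).
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + ["latitude", "longitude"])
    for row in reader:
        query = ", ".join(row[i] for i in (address_col, city_col, zipcode_col))
        found = geocode(query)
        # Keep the original row even when geocoding fails; leave cells empty.
        lat, lon = found if found is not None else ("", "")
        writer.writerow(row + [lat, lon])
    return out.getvalue()


def fake_geocode(query: str) -> Optional[Tuple[float, float]]:
    """Stub geocoder with placeholder coordinates, for demonstration only."""
    return (40.0, -75.0) if "Main" in query else None


sample = (
    "name,address,city,zip\n"
    "A,1 Main St,Springfield,00000\n"
    "B,2 Elm St,Springfield,00000\n"
)
result = append_coordinates(sample, 1, 2, 3, fake_geocode)
```

    Passing the geocoder in as a function keeps the CSV handling independent of any particular geocoding service.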

  4. California Facilities Pollutant Emissions Data

    • kaggle.com
    zip
    Updated Nov 21, 2017
    Cite
    Florin Langer (2017). California Facilities Pollutant Emissions Data [Dataset]. https://www.kaggle.com/florinlanger/cal-facilities
    Explore at:
    Available download formats: zip (2602145 bytes)
    Dataset updated
    Nov 21, 2017
    Authors
    Florin Langer
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    California
    Description

    Context

    Created for use in the Renewable and Appropriate Energy Lab at UC Berkeley and Lawrence Berkeley National Laboratory.

    Content

    Geography: All 58 Counties of the American State of California

    Time period: 2015

    Unit of analysis: Tons per year

    Variables:

    • CO: County ID as numbered in the County dropdown menu on the California Air Resources Board Facility Search Tool
    • AB
    • FACID
    • DIS
    • FNAME
    • FSTREET
    • FCITY
    • FZIP
    • FSIC: Facility Standard Industrial Classification Code specified by the US Department of Labor
    • COID
    • DISN
    • CHAPIS
    • CERR_CODE
    • TOGT: Total organic gases consist of all hydrocarbons, i.e. compounds containing hydrogen and carbon with or without other chemical elements.
    • ROGT: Reactive organic gases include all organic gases except methane, ethane, acetone, methyl acetate, methylated siloxanes, and a number of low molecular weight halogenated organics that have a low rate of reactivity.
    • COT: The emissions of CO are for the single species, carbon monoxide.
    • NOXT: The emissions of NOx gases (mostly nitric oxide and nitrogen dioxide) are reported as equivalent amounts of NO2.
    • SOXT: The emissions of SOx gases (sulfur dioxide and sulfur trioxide) are reported as equivalent amounts of SO2.
    • PMT: Particulate matter refers to small solid and liquid particles such as dust, sand, salt spray, metallic and mineral particles, pollen, smoke, mist and acid fumes.
    • PM10T: PM10 refers to the fraction of particulate matter with an aerodynamic diameter of 10 micrometers or smaller. These particles are small enough to penetrate the lower respiratory tract.
    • PM2.5T: PM2.5 refers to the fraction of particulate matter with an aerodynamic diameter of 2.5 micrometers or smaller. These particles are small enough to penetrate the lower respiratory tract.
    • lat: Facility latitude geocoded by inputting FSTREET, FCITY, California FZIP into Bing’s geocoding service.
    • lon: Facility longitude geocoded in the same way.

    Sources: All columns except for lat and lon were scraped from the California Air Resources Board Facility Search Tool using the request module from Python’s urllib library. The script used is included below under Scripts in case you would like to get additional columns.

    The lat and lon columns were geocoded using the Geocoder library for Python with the Bing provider.

    Scripts

    download.py

    import os
    import pandas as pd

    out_dir = 'ARB/'
    file_ext = '.csv'
    os.makedirs(out_dir, exist_ok=True)  # make sure the output folder exists
    for i in range(1, 59):  # county IDs 1 through 58
        # Facility list for county i, database year 2015
        facilities = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/faccrit_output.csv?&dbyr=2015&ab_=&dis_=&co_=" + str(i) + "&fname_=&city_=&sort=FacilityNameA&fzip_=&fsic_=&facid_=&all_fac=C&chapis_only=&CERR=&dd=")
        for index, row in facilities.iterrows():
            # Per-facility detail table, one row per pollutant
            curr_facility = pd.read_csv("https://www.arb.ca.gov/app/emsinv/facinfo/facdet_output.csv?&dbyr=2015&ab_=" + str(row['AB']) + "&dis_=" + str(row['DIS']) + "&co_=" + str(row['CO']) + "&fname_=&city_=&sort=C&fzip_=&fsic_=&facid_=" + str(row['FACID']) + "&all_fac=&chapis_only=&CERR=&dd=")
            # DataFrame.set_value was removed in pandas 1.0; .at is its replacement
            facilities.at[index, 'PM2.5T'] = curr_facility.loc[curr_facility['POLLUTANT NAME'] == 'PM2.5'].iloc[0]['EMISSIONS_TONS_YR']
        facilities.to_csv(out_dir + str(i) + file_ext)
    
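
    The long query string in download.py can also be assembled from a dictionary, which makes the parameters easier to read and change. The sketch below is an alternative construction, not part of the original script; the county_url helper is hypothetical, and whether the endpoint still accepts these parameters is not verified here.

```python
from urllib.parse import urlencode

BASE = "https://www.arb.ca.gov/app/emsinv/facinfo/faccrit_output.csv"


def county_url(county_id: int, year: int = 2015) -> str:
    """Build the facility-list URL for one county (hypothetical helper)."""
    params = {
        "dbyr": year, "ab_": "", "dis_": "", "co_": county_id,
        "fname_": "", "city_": "", "sort": "FacilityNameA",
        "fzip_": "", "fsic_": "", "facid_": "", "all_fac": "C",
        "chapis_only": "", "CERR": "", "dd": "",
    }
    # urlencode keeps dictionary order and writes empty values as "key=".
    return BASE + "?" + urlencode(params)


url = county_url(1)
```

    The result matches the string built in download.py except for the redundant "&" that the script places directly after the "?".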

    geocode.py

    import csv
    import os
    import geocoder

    directory = 'ARB/'
    outdirectory = 'ARB_OUT/'
    os.makedirs(outdirectory, exist_ok=True)  # make sure the output folder exists
    for i in range(1, 59):  # one input file per county
        # Python 3's csv module expects text mode with newline='';
        # 'w' (not 'a') avoids duplicate header rows when the script is re-run
        with open(directory + str(i) + '.csv', newline='') as csvfile, \
                open(outdirectory + str(i) + '.csv', 'w', newline='') as csvout:
            reader = csv.DictReader(csvfile)
            fieldnames = reader.fieldnames + ['lat', 'lon']  # Add new columns
            writer = csv.DictWriter(csvout, fieldnames)
            writer.writeheader()
            for row in reader:
                address = row['FSTREET'] + ', ' + row['FCITY'] + ', California ' + row['FZIP']
                g = geocoder.bing(address, key='API_KEY')  # replace API_KEY with your Bing key
                newrow = dict(row)
                if g.latlng:
                    newrow['lat'] = g.json['lat']
                    newrow['lon'] = g.json['lng']
                    writer.writerow(newrow)  # Only write row if successfully geocoded
    
  5. Offices on Aging csv Geocoded

    • njogis-newjersey.opendata.arcgis.com
    • hub.arcgis.com
    • +1 more
    Updated Aug 5, 2021
    Cite
    Middlesex County, NJ (2021). Offices on Aging csv Geocoded [Dataset]. https://njogis-newjersey.opendata.arcgis.com/maps/middlesexcounty::offices-on-aging-csv-geocoded
    Explore at:
    Dataset updated
    Aug 5, 2021
    Dataset authored and provided by
    Middlesex County, NJ
    Area covered
    Description

    Offices_on_Aging_csv_Geocoded

  6. Geoscape Geocoded National Address File (G-NAF)

    • data.gov.au
    • researchdata.edu.au
    • +1 more
    pdf, zip
    Updated Aug 18, 2025
    + more versions
    Cite
    Department of Industry, Science and Resources (DISR) (2025). Geoscape Geocoded National Address File (G-NAF) [Dataset]. https://data.gov.au/data/dataset/geocoded-national-address-file-g-naf
    Explore at:
    Available download formats: pdf (383741), pdf, zip (1695191699), zip (1691304483)
    Dataset updated
    Aug 18, 2025
    Dataset authored and provided by
    Department of Industry, Science and Resources (DISR)
    Description

    Geoscape G-NAF is the geocoded address database for Australian businesses and governments. It’s the trusted source of geocoded address data for Australia with over 50 million contributed addresses distilled into 15.4 million G-NAF addresses. It is built and maintained by Geoscape Australia using independently examined and validated government data.

    From 22 August 2022, Geoscape Australia is making G-NAF available in an additional simplified table format. G-NAF Core makes accessing geocoded addresses easier by requiring less technical effort.

    G-NAF Core will be updated on a quarterly basis along with G-NAF.

    Further information about contributors to G-NAF is available here.

    With more than 15 million Australian physical address records, G-NAF is one of the most ubiquitous and powerful spatial datasets. The records include geocodes, which are latitude and longitude map coordinates. G-NAF does not contain personal information or details relating to individuals.

    Updated versions of G-NAF are published on a quarterly basis. Previous versions are available here.

    Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

    Changes in the August 2025 release

    • Nationally, the August 2025 update of G-NAF shows an overall increase of 40,716 addresses (0.30%). The total number of addresses in G-NAF now stands at 15,794,643 of which 14,950,491 or 94.66% are principal.

    • In the ACT, there have been minor updates to the address parsing of flat-numbered addresses aimed at: improving the address representation of flat-numbered addresses; improving address coverage; and improving address alignment between contributors. This change affects approximately 4,000 addresses.

    • A small number of additional address sites have implemented the use of the BUILDING_NAME attribute as part of the merge criteria to improve address coverage for flat-numbered addresses in NSW and QLD. These changes have resulted in the creation of approximately 400 addresses in NSW and 120 in QLD.

    • A focus has been applied to Tasmanian street-locality addresses to reduce the number of these addresses. For the August 2025 release, there is a reduction of some 900 street-locality addresses in Tasmania.

    • Geoscape has moved product descriptions, guides and reports online to https://docs.geoscape.com.au.

    Further information on G-NAF, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on G-NAF, including software solutions, consultancy and support.

    Additional information: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.

    License Information

    Use of the G-NAF downloaded from data.gov.au is subject to the End User Licence Agreement (EULA)

    The EULA terms are based on the Creative Commons Attribution 4.0 International license (CC BY 4.0). However, an important restriction relating to the use of the open G-NAF for the sending of mail has been added.

    The open G-NAF data must not be used for the generation of an address or the compilation of an address for the sending of mail unless the user has verified that each address to be used for the sending of mail is capable of receiving mail by reference to a secondary source of information. Further information on this use restriction is available here.

    End users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

    Users must also note the following attribution requirements:

    Preferred attribution for the Licensed Material:

    G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

    Preferred attribution for Adapted Material:

    Incorporates or developed using G-NAF © Geoscape Australia licensed by the Commonwealth of Australia under the Open Geo-coded National Address File (G-NAF) End User Licence Agreement.

    What to Expect When You Download G-NAF

    G-NAF is a complex and large dataset (approximately 5GB unpacked), consisting of multiple tables that will need to be joined prior to use. The dataset is primarily designed for application developers and large-scale spatial integration. Users are advised to read the technical documentation, including product change notices and the individual product descriptions before downloading and using the product. A quick reference guide on unpacking the G-NAF is also available.
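
    To give a sense of what joining the tables involves, here is a minimal sketch using an in-memory SQLite database. The three tables and their columns are a heavily simplified assumption modeled loosely on G-NAF's address, street, and geocode tables, and the single row is a placeholder; consult the product description for the actual schema before writing real queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema; the real G-NAF tables have many more columns.
CREATE TABLE address_detail (
    address_detail_pid  TEXT PRIMARY KEY,
    number_first        INTEGER,
    street_locality_pid TEXT,
    postcode            TEXT
);
CREATE TABLE street_locality (
    street_locality_pid TEXT PRIMARY KEY,
    street_name         TEXT
);
CREATE TABLE address_default_geocode (
    address_detail_pid TEXT,
    latitude           REAL,
    longitude          REAL
);
-- Placeholder row, not real address data.
INSERT INTO address_detail VALUES ('A1', 10, 'S1', '2600');
INSERT INTO street_locality VALUES ('S1', 'EXAMPLE');
INSERT INTO address_default_geocode VALUES ('A1', -35.3, 149.1);
""")

# One flat row per address: street number, street name, postcode, coordinates.
rows = conn.execute("""
    SELECT d.number_first, s.street_name, d.postcode, g.latitude, g.longitude
    FROM address_detail AS d
    JOIN street_locality AS s
      ON s.street_locality_pid = d.street_locality_pid
    JOIN address_default_geocode AS g
      ON g.address_detail_pid = d.address_detail_pid
""").fetchall()
```

    With the full dataset, the same pattern would typically run against a database loaded from the delivered files rather than an in-memory one.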

  7. Mapping incident locations from a CSV file in a web map (video)

    • data.amerigeoss.org
    esri rest, html
    Updated Mar 17, 2020
    Cite
    ESRI (2020). Mapping incident locations from a CSV file in a web map (video) [Dataset]. https://data.amerigeoss.org/zh_CN/dataset/mapping-incident-locations-from-a-csv-file-in-a-web-map-video
    Explore at:
    Available download formats: esri rest, html
    Dataset updated
    Mar 17, 2020
    Dataset provided by
    ESRI
    Description

    Mapping incident locations from a CSV file in a web map (YouTube video).


    View this short demonstration video to learn how to geocode incident locations from a spreadsheet in ArcGIS Online. In this demonstration, the presenter drags a simple .csv file into a browser-based Web Map and maps the appropriate address fields to display incident points allowing different types of spatial overlays and analysis.


    Communities around the world are taking strides in mitigating the threat that COVID-19 (coronavirus) poses. Geography and location analysis have a crucial role in better understanding this evolving pandemic.

    When you need help quickly, Esri can provide data, software, configurable applications, and technical support for your emergency GIS operations. Use GIS to rapidly access and visualize mission-critical information. Get the information you need quickly, in a way that’s easy to understand, to make better decisions during a crisis.

    Esri’s Disaster Response Program (DRP) assists with disasters worldwide as part of our corporate citizenship. We support response and relief efforts with GIS technology and expertise.


  8. Senior Centers csv Geocoded

    • hub.arcgis.com
    • njogis-newjersey.opendata.arcgis.com
    Updated Aug 5, 2021
    Cite
    Middlesex County, NJ (2021). Senior Centers csv Geocoded [Dataset]. https://hub.arcgis.com/datasets/middlesexcounty::senior-centers-and-offices-on-aging/explore?layer=3&showTable=true
    Explore at:
    Dataset updated
    Aug 5, 2021
    Dataset authored and provided by
    Middlesex County, NJ
    Area covered
    Description

    Senior_Centers_csv_Geocoded

  9. Additional resources for Kiva Crowdfunding

    • kaggle.com
    zip
    Updated Apr 12, 2018
    Cite
    Luke (2018). Additional resources for Kiva Crowdfunding [Dataset]. https://www.kaggle.com/forums/f/26443/additional-resources-for-kiva-crowdfunding/t/54374/dataset-suggestion
    Explore at:
    Available download formats: zip (104671314 bytes)
    Dataset updated
    Apr 12, 2018
    Authors
    Luke
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains the locations found in the Kiva datasets included in an administrative or geographical region. You can also find poverty data about this region. This facilitates answering some of the tough questions about a region's poverty.

    Content

    In the interest of preserving the original names and spelling for the locations/countries/regions all the data is in Excel format and has no preview (I think only the Kaggle recommended file types have preview - if anyone can show me how to do this for an xlsx file, it will be greatly appreciated)

    The Tables datasets contain the most recent analysis of the MPI on countries and regions. These datasets are updated regularly. In unique regions_names_from_google_api you will find 3 levels of inclusion for every geocode provided in Kiva datasets. (village/town, administrative region, sub-national region - which can be administrative or geographical). These are the results from the Google API Geocoding process.

    Files:

    • all_kiva_loans.csv

    Dropped multiple columns but kept all the rows from loans.csv with names, tags, and descriptions, yielding a CSV file of 390 MB instead of 2.13 GB. It is basically a simplified version of loans.csv (originally included in the analysis by beluga).

    • country_stats.csv
    1. population source: https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)
    2. population_below_poverty_line: Percentage
    3. hdi: Human Development Index
    4. life_expectancy: Life expectancy at birth
    5. expected_years_of_schooling: Expected years of schooling
    6. mean_years_of_schooling: Mean years of schooling
    7. gni: Gross national income (GNI) per capita
    This dataset was originally created by beluga.
    • all_loan_theme_merged_with_geo_mpi_regions.xlsx

    This is the loan_themes_by_region left joined with Tables_5.3_Contribution_of_Deprivations (all the original entries from loan_themes and only the entries that match from Tables_5; for the regions that lack MPI data, you will find NaN).

    These are the columns in the database:

    1. Partner ID
    2. Field Partner
    3. Name
    4. sector
    5. Loan Theme ID
    6. Loan Theme Type
    7. Country
    8. forkiva
    9. number
    10. amount
    11. geo
    12. rural_pct
    13. City
    14. Administrative region
    15. Sub-national region
    16. ISO
    17. World region
    18. Population Share of the Region (%)
    19. region MPI
    20. Education (%)
    21. Health (%)
    22. Living standards (%)
    23. Schooling (%)
    24. Child school attendance (%)
    25. Child Mortality (%)
    26. Nutrition (%)
    27. Electricity (%)
    28. Improved sanitation (%)
    29. Drinking water (%)
    30. Floor (%)
    31. Cooking fuel (%)
    32. Asset ownership (%)
    • mpi_on_regions.xlsx

    Matched the loans in loan_themes_by_region with the regions that have info regarding MPI. This dataset brings together the amount invested in a region and the biggest problems the said region has to deal with. It is a join between the loan_themes_by_region provided by Kiva and Tables 5.3 Contribution_of_Deprivations.

    It is a subset of the all_loan_theme_merged_with_geo_mpi_regions.xlsx, which contains only the entries that I could match with poverty decomposition data. It has the same columns.

    • Tables_5_SubNational_Decomposition_MPI_2017-18.xlsx

    Multidimensional poverty index decomposition for over 1000 regions part of 79 countries.

    Table 5.3: Contribution of deprivations to the MPI, by sub-national regions
    This table shows which dimensions and indicators contribute most to a region's MPI, which is useful for understanding the major source(s) of deprivation in a sub-national region.

    Source: http://ophi.org.uk/multidimensional-poverty-index/global-mpi-2016/

    • Tables_7_MPI_estimations_country_levels.xlsx

    MPI decomposition for 120 countries.

    Table 7 All Published MPI Results since 2010
    The table presents an archive of all MPI estimations published over the past 5 years, together with MPI, H, A and censored headcount ratios. For comparisons over time please use Table 6, which is strictly harmonised. The full set of data tables for each year published (Column A), is found on the 'data tables' page under 'Archive'.

    The data in this file is shown in interactive plots on Oxford Poverty and Human Development Initiative website. http://www.dataforall.org/dashboard/ophi/index.php/

    • unique_regions_from_kiva_loan_themes.xlsx

    These are all the regions corresponding to the geocodes found in Kiva's loan_themes_by_region. There are 718 unique entries, which you can join with any Kiva database that has either a coordinates or region column.
    Columns:

    • geo: pair of Lat, Lon (from loan_themes_by_region)

    • City: name of the city (has the most NaN's)

    • Administrative region: first level of administrative inclusion for the city/location; (the equivalent of county for US)

    • Sub-national region: second level of administrative inclusion for the geo pair. (like state for US)

    • Country: name of the country
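
    The left join described above for all_loan_theme_merged_with_geo_mpi_regions.xlsx keeps every loan-theme row and fills in MPI columns only where a region matches. A minimal sketch of that behaviour follows; the left_join helper, the column names, and the two sample rows are hypothetical placeholders, and None stands in for the NaN a spreadsheet or pandas would show.

```python
from typing import Dict, List


def left_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Keep every left row; fill right-hand columns with None when unmatched."""
    right_by_key: Dict[str, dict] = {row[key]: row for row in right}
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        extra = {c: (match[c] if match else None) for c in right_cols}
        joined.append({**row, **extra})
    return joined


# Placeholder rows, not values from the actual Kiva or MPI tables.
loan_themes = [
    {"region": "Region A", "amount": 1000},
    {"region": "Region B", "amount": 250},
]
mpi_table = [{"region": "Region A", "region_mpi": 0.2}]

merged = left_join(loan_themes, mpi_table, "region")
```

    Dropping the rows whose MPI columns are None would give the matched-only subset that mpi_on_regions.xlsx is described as containing.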

    Acknowledgements

    Thanks to Shane Lynn for the batch geocoding and to Joseph Deferio for reverse geocoding:

    https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/

    https://github.com/jdeferio/Reverse_Geocode

    The MPI datasets you can find on the Oxford website (http://ophi.org.uk/) under Research.

    Citation: Alkire, S. and Kanagaratnam, U. (2018). “Multidimensional Poverty Index Winter 2017-18: Brief methodological note and results.” Oxford Poverty and Human Development Initiative, University of Oxford, OPHI Methodological Notes 45.

  10. Geocoded variant of United Nations Code for Trade and Transport Locations

    • zenodo.org
    bin
    Updated May 14, 2024
    Cite
    Fedor Baart; Bas Turpijn; Rolien van der Mark; Guus van Hemert (2024). Geocoded variant of United Nations Code for Trade and Transport Locations [Dataset]. http://doi.org/10.5281/zenodo.11191511
    Explore at:
    Available download formats: bin
    Dataset updated
    May 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fedor Baart; Bas Turpijn; Rolien van der Mark; Guus van Hemert
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    United Nations
    Description
    The United Nations Code for Trade and Transport Locations dataset provides information on the codes known as "UN/LOCODE".
    This version of the dataset, based on the csv files of version 232, has extra information based on pycountry (names of administrative regions, country names, flags) and, most importantly, on location.
    The dataset has the following extra fields:
    Each entry gives the column name, an example value, a description, and the source:
    • alpha_2_country (example: AD): 2-letter country code, ISO 3166-1. Source: pycountry
    • alpha_3_country (example: AND): 3-letter country code, ISO 3166-1. Source: pycountry
    • flag_country (example: 🇦🇩): Unicode flag. Source: pycountry
    • name_country (example: Andorra): Country name, ISO 3166-1. Source: pycountry
    • numeric_country (example: 020): Country code. Source: pycountry
    • official_name_country (example: Principality of Andorra): Country official name. Source: pycountry
    • score (example: 95): Confidence code of geocoding [0-100/nan]. Source: ArcGIS
    • geocoded (example: True): Was the coordinate derived from geocoding [true/false]. Source: ArcGIS
    • geometry (example: POINT (1.516666666666667 42.5)): Point coordinate. Source: ArcGIS if geocoded, else original UNLO
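
    The geometry example is a WKT point, which lists x before y, so the Andorra value reads as longitude 1.5167 followed by latitude 42.5. A minimal sketch of pulling the two numbers out with the standard library is shown below; the parse_wkt_point helper is hypothetical, it handles only plain decimal POINT strings, and a dedicated geometry library would be the more robust choice for real work.

```python
import re
from typing import Tuple

# Matches "POINT (x y)" with plain decimal numbers (no exponents, no Z/M values).
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (latitude, longitude) from a WKT 'POINT (lon lat)' string."""
    m = _POINT_RE.match(wkt)
    if m is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    lon, lat = float(m.group(1)), float(m.group(2))
    return lat, lon


lat, lon = parse_wkt_point("POINT (1.516666666666667 42.5)")
```

    Returning latitude first is a choice made here for readability; many mapping APIs expect the opposite order, so the order is worth checking at every boundary.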

    The figure below shows the locations (n=23663) that have been geocoded in this dataset in green and the existing locations in pink.

    Ownership of the original data resides with UNLO and the participants.

  11. Fresno County, California Addresses

    • koordinates.com
    csv, dwg, geodatabase +6
    Updated Jun 26, 2023
    + more versions
    Cite
    Fresno County, California (2023). Fresno County, California Addresses [Dataset]. https://koordinates.com/layer/109482-fresno-county-california-addresses/
    Explore at:
    Available download formats: geopackage / sqlite, csv, geodatabase, mapinfo mif, kml, shapefile, dwg, mapinfo tab, pdf
    Dataset updated
    Jun 26, 2023
    Dataset authored and provided by
    Fresno County, California
    Area covered
    Description

    Geospatial data about Fresno County, California Addresses. Export to CAD, GIS, PDF, CSV and access via API.

  12. Metadata record for: Geocoding of worldwide patent data

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Gaétan de Rassenfosse; Jan Kozak; Florian Seliger (2023). Metadata record for: Geocoding of worldwide patent data [Dataset]. http://doi.org/10.6084/m9.figshare.9970454.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Gaétan de Rassenfosse; Jan Kozak; Florian Seliger
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains key characteristics about the data described in the Data Descriptor Geocoding of worldwide patent data. Contents:

    1. human readable metadata summary table in CSV format
    2. machine readable metadata file in JSON format

    Versioning Note: Version 2 was generated when the metadata format was updated from JSON to JSON-LD. This was an automatic process that changed only the format, not the contents, of the metadata.
    
  13. Replication Data/Code for Route-based Geocoding of Traffic...

    • figshare.com
    csv
    Updated Jan 15, 2025
    Cite
    Saif Ali (2025). Replication Data/Code for Route-based Geocoding of Traffic congestion-Related Social Media Texts on a Complex Network (Manuscript ID IJGIS-2024-1073) [Dataset]. http://doi.org/10.6084/m9.figshare.28210757.v1
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Saif Ali
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

    Step 1: Download the replication materials
    Download the whole replication folder on figshare containing the code, data and replication files.

    Step 2: Replicate Tables in Section 3
    All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from section 3 of the manuscript run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables. The output data needed for replication is provided.

    Step 3: Replicate Figures in QGIS
    The Figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The Figures are replicated as layers in the project.

    Step 4: Running the code from scratch
    The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network" runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is of the order of 10 hours on free tier Google Colab.

    4a) Upload to Google Drive
    Upload the entire replication folder to your Google Drive. Note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety. These are Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first and Complex Network second).

    4b) Set the path
    In each Google Colab notebook, you have to set the variable called “REPL_PATH” to the location on your Google Drive where you uploaded the replication folder. Include the replication folder in the path. For example "/content/drive/MyDrive/replication".

    4c) Run the code
    The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. You may simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path and run all cells.
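
    As a small illustration of the path setup in step 4b, the sketch below derives the two notebook locations from REPL_PATH. Only the REPL_PATH name, the example path, and the notebook file names come from the instructions; the notebook_paths helper is hypothetical and is not part of the replication package.

```python
from pathlib import PurePosixPath

# Example value from the instructions; change it to your own upload location.
REPL_PATH = "/content/drive/MyDrive/replication"

# Notebooks in the order they must be run (Geocoder first).
NOTEBOOKS_IN_ORDER = [
    "Code/Geocoder/The_Geocoder.ipynb",
    "Code/Complex_Network/Complex_network_code.ipynb",
]


def notebook_paths(repl_path: str) -> list:
    """Return the full path of each notebook under the replication folder."""
    root = PurePosixPath(repl_path)
    return [str(root / nb) for nb in NOTEBOOKS_IN_ORDER]


paths = notebook_paths(REPL_PATH)
```

    Checking that these paths exist after mounting Google Drive is a quick way to confirm that REPL_PATH includes the replication folder itself.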

  14. U.S. Federal Superfund Sites

    • kaggle.com
    Updated Nov 17, 2017
    Cite
    4d4stra (2017). U.S. Federal Superfund Sites [Dataset]. https://www.kaggle.com/srrobert50/federal-superfunds/tasks
    Dataset updated
    Nov 17, 2017
    Dataset provided by
    Kaggle
    Authors
    4d4stra
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Federal Superfund sites are some of the most polluted in the United States. This dataset contains a multifaceted view of Superfunds, including free-form text descriptions, geography, demographics and socioeconomics.

    Content

    The core data was scraped from the National Priorities List (NPL) provided by the U.S. Environmental Protection Agency (EPA). This table provides basic information such as site name, site score, date added, and links to a site description and current status. Apache Tika was used to extract text from the site description PDFs. The addresses were scraped from site status pages and geocoded to latitude, longitude, and Census block group. The block group assignment was used to join with the Census Bureau's planning database, a rich source of nationwide demographic and socioeconomic data. The full source code used to generate the data can be found here, on GitHub.

    I have provided three separate downloads to explore:

    • priorities_list_full.json: the NPL containing all geographic, site information, text descriptions, and Census Bureau data from the relevant block groups.
    • pdb_tract.csv: the planning database aggregated on the tract level with an additional indicator (has_superfund) noting whether or not the tract contains the address of a Superfund site.
    • pdb_block_group.csv: the planning database aggregated on the block group level with an additional indicator (has_superfund) noting whether or not the block group contains the address of a Superfund site.

    Some caveats:

    1. The planning database contains 300+ columns. For a full description of these columns, please see the documentation here.
    2. Since the Google geocoder is relatively aggressive in providing address matches, geocoding was done through a hierarchy of queries (full address, city-state-zip, and zipcode only) to prevent gross errors. The address string used to geocode is noted through the 'geocode_source' column.
    3. While this data is linked to demographic and socioeconomic data at the block group level (tract level for pdb_tract.csv), the impacts of a particular site's pollution may extend beyond these geographic regions.
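
    The query hierarchy described in caveat 2 can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names, the record fields, and the rule that a query "fails" when the geocoder returns None are all assumptions, and the geocoder is passed in as a plain function so that no particular geocoding service or API is implied.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def build_queries(address: str, city: str, state: str, zipcode: str) -> List[str]:
    """Return query strings ordered from most to least specific."""
    return [
        f"{address}, {city}, {state} {zipcode}",  # full address
        f"{city}, {state} {zipcode}",             # city-state-zip
        zipcode,                                  # zipcode only
    ]


def geocode_with_fallback(
    record: Dict[str, str],
    geocode: Callable[[str], Optional[Coordinates]],
) -> Dict[str, object]:
    """Try each query in turn and record which string produced the match."""
    for query in build_queries(**record):
        result = geocode(query)  # assumed to return None when there is no match
        if result is not None:
            latitude, longitude = result
            return {
                "latitude": latitude,
                "longitude": longitude,
                "geocode_source": query,  # the address string that was used
            }
    return {"latitude": None, "longitude": None, "geocode_source": None}


if __name__ == "__main__":
    # Stub geocoder with made-up coordinates, standing in for a real service.
    known = {"Springfield, IL 62701": (39.8, -89.65)}
    site = {
        "address": "1 Unknown Rd",
        "city": "Springfield",
        "state": "IL",
        "zipcode": "62701",
    }
    print(geocode_with_fallback(site, known.get))
```

    Keeping the matched query string alongside the coordinates mirrors the role of the 'geocode_source' column: a reader can tell at a glance whether a site was located from its full address or only from its zipcode.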

    Acknowledgements

    I would like to thank the EPA and the Census Bureau for making such detailed information publicly available. For relevant academic work, please see Burwell-Naney et al. (2013) and references, both to and therein.

    Please let me know if you have any suggestions for improving the dataset!

  15. Clark County, Nevada Addresses

    • koordinates.com
    csv, dwg, geodatabase +6
    Updated Jan 16, 2007
    Cite
    Clark County, Nevada (2007). Clark County, Nevada Addresses [Dataset]. https://koordinates.com/layer/109477-clark-county-nevada-addresses/
    Available download formats: pdf, shapefile, dwg, mapinfo tab, geodatabase, csv, geopackage / sqlite, mapinfo mif, kml
    Dataset updated
    Jan 16, 2007
    Dataset authored and provided by
    Clark County, Nevada
    Description

    Geospatial data about Clark County, Nevada Addresses. Export to CAD, GIS, PDF, CSV and access via API.

  16. Market Research Data | Global Map data | Geographic data | Address and Zip...

    • datarade.ai
    .csv
    Updated Oct 19, 2024
    Cite
    GeoPostcodes (2024). Market Research Data | Global Map data | Geographic data | Address and Zip Code Database | Geocoded [Dataset]. https://datarade.ai/data-products/geopostcodes-market-research-data-map-data-geographic-dat-geopostcodes
    Available download formats: .csv
    Dataset updated
    Oct 19, 2024
    Dataset authored and provided by
    GeoPostcodes
    Area covered
    Saint Barthélemy, Poland, South Sudan, Papua New Guinea, Christmas Island, Monaco, Korea (Democratic People's Republic of), Tokelau, Slovenia, Sierra Leone
    Description

    A global self-hosted Market Research dataset containing all administrative divisions, cities, addresses, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.

    Use cases for the Global Zip Code Database (Market Research data)

    • Address capture and validation

    • Map and visualization

    • Reporting and Business Intelligence (BI)

    • Master Data Management

    • Logistics and Supply Chain Management

    • Sales and Marketing

    Data export methodology

    Our map data packages are offered in variable formats, including .csv. All geographic data are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.

    Product Features

    • Fully and accurately geocoded

    • Administrative areas with a level range of 0-4

    • Multi-language support including address names in local and foreign languages

    • Comprehensive city definitions across countries

    For additional insights, you can combine the map data with:

    • UN/LOCODE and IATA codes

    • Time zones and Daylight Saving Times

    Why do companies choose our Market Research databases

    • Enterprise-grade service

    • Reduce integration time and cost by 30%

    • Weekly updates for the highest quality

    Note: Custom geographic data packages are available. Please submit a request via the above contact button for more details.

  17. Metadata record for: GDIS, a global dataset of geocoded disaster locations

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Scientific Data Curation Team (2023). Metadata record for: GDIS, a global dataset of geocoded disaster locations [Dataset]. http://doi.org/10.6084/m9.figshare.13177022.v1
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Scientific Data Curation Team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains key characteristics about the data described in the Data Descriptor GDIS, a global dataset of geocoded disaster locations. Contents:

        1. human readable metadata summary table in CSV format
        2. machine readable metadata file in JSON format
    
  18. Global Zip Code Dataset (9M+) | Address Data | Country, Regions, Lat/Long,...

    • datarade.ai
    Updated Jun 14, 2024
    Cite
    GeoPostcodes (2024). Global Zip Code Dataset (9M+) | Address Data | Country, Regions, Lat/Long, City | Weekly Updated [Dataset]. https://datarade.ai/data-products/geopostcodes-zip-code-data-global-coverage-8-6-m-zip-code-geopostcodes
    Available download formats: .csv, .geojson, .kml
    Dataset updated
    Jun 14, 2024
    Dataset authored and provided by
    GeoPostcodes
    Area covered
    United States
    Description

    A global self-hosted location dataset containing all administrative divisions, cities, and zip codes for 247 countries. All geospatial data is updated weekly to maintain the highest data quality, including challenging countries such as China, Brazil, Russia, and the United Kingdom.

    Use cases for the Global Zip Code Database (Geospatial data)

    • Address capture and validation

    • Map and visualization

    • Reporting and Business Intelligence (BI)

    • Master Data Management

    • Logistics and Supply Chain Management

    • Sales and Marketing

    Data export methodology

    Our location data packages are offered in variable formats, including .csv. All geospatial data are optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more.

    Product Features

    • Fully and accurately geocoded

    • Administrative areas with a level range of 0-4

    • Multi-language support including address names in local and foreign languages

    • Comprehensive city definitions across countries

    For additional insights, you can combine the map data with:

    • UN/LOCODE and IATA codes

    • Time zones and Daylight Saving Times

    Why do companies choose our location databases

    • Enterprise-grade service

    • Reduce integration time and cost by 30%

    • Weekly updates for the highest quality

    Note: Custom geospatial data packages are available. Please submit a request via the above contact button for more details.

  19. Data_Sheet_4_What Does Twitter Say About Self-Regulated Learning? Mapping...

    • frontiersin.figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Mohammad Khalil; Gleb Belokrys (2023). Data_Sheet_4_What Does Twitter Say About Self-Regulated Learning? Mapping Tweets From 2011 to 2021.CSV [Dataset]. http://doi.org/10.3389/fpsyg.2022.820813.s004
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Mohammad Khalil; Gleb Belokrys
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Social network services such as Twitter are important venues that can be used as rich data sources to mine public opinion on various topics. In this study, we used Twitter to collect data on one of the fastest-growing theories in education, namely Self-Regulated Learning (SRL), and carried out further analysis to investigate what Twitter says about SRL. This work uses three main analysis methods: descriptive, topic modeling, and geocoding analysis. The collected dataset consists of 54,070 relevant SRL tweets posted between 2011 and 2021. The descriptive analysis uncovers a growing discussion of SRL on Twitter from 2011 until 2018, followed by a marked decrease until the collection day. For topic modeling, the text mining technique of Latent Dirichlet Allocation (LDA) was applied and revealed insights on computationally processed topics. Finally, the geocoding analysis uncovers a diverse community from all over the world, although users from the Global North were represented at a higher density. Further implications are discussed in the paper.

  20. Enhanced Spatial Disambiguation in the GeoVirus Dataset Using SNEToolkit

    • entrepot.recherche.data.gouv.fr
    tsv
    Updated Jan 12, 2024
    Cite
    KAFANDO Rodrique; DECOUPES REMY; ROCHE Mathieu; TEISSEIRE Maguelonne (2024). Enhanced Spatial Disambiguation in the GeoVirus Dataset Using SNEToolkit [Dataset]. http://doi.org/10.57745/2RUX6W
    Available download formats: tsv (185770)
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    Recherche Data Gouv
    Authors
    KAFANDO Rodrique; DECOUPES REMY; ROCHE Mathieu; TEISSEIRE Maguelonne
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    This dataset is an extended version of the GeoVirus database, which includes 229 WikiNews articles on global epidemics. Spatial Named Entities (SNEs) in these articles are manually annotated by experts, complete with their coordinates and names. We integrated an automated process for SNE extraction and disambiguation, aligning it with the data annotated by experts. This involved correlating 1,360 SNEs identified both in our automatic extraction and in the expert-annotated dataset, ensuring precision and consistency in spatial identification. The resulting corpus is an extension of the original GeoVirus base, enriched with three additional columns presenting automatic spatial annotations.

    GeoVirus dataset reference: Gritta, Milan, Mohammad Taher Pilehvar, and Nigel Collier. "Which Melbourne? Augmenting geocoding with maps." Association for Computational Linguistics, 2018.

    This enhanced dataset not only preserves the integrity of the expert annotations but also demonstrates the effectiveness of our automated process, evidenced by the recall rate of 0.911 in our approach, significantly surpassing the standard GeoNames geocoder's recall rate of 0.871.

    The dataset consists of a single file, sne_data.csv, with the following columns:

    • source: WikiNews URL provided in the GeoVirus data.
    • input_sne: Spatial Named Entity extracted with spaCy from the raw data (document). Used as input for GeoNames.
    • true_country_code: Country code obtained through reverse geocoding based on the provided latitude and longitude.
    • output_sne: Spatial Named Entity returned by GeoNames.
    • predicted_country_code: Country code corresponding to output_sne after disambiguation.
    • disamb_phase: The disambiguation phase that helped disambiguate the input Spatial Named Entity.
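
    To show how the true_country_code and predicted_country_code columns might be compared, here is a minimal sketch that computes the share of rows where the two codes agree. It is an illustration only and is not claimed to reproduce the authors' recall figures, whose exact definition is not given here. The sample rows are made up, and the tab delimiter is an assumption based on the tsv download format listed above.

```python
import csv
import io


def country_code_agreement(rows):
    """Fraction of rows whose predicted country code equals the true one."""
    total = 0
    matches = 0
    for row in rows:
        true_code = (row.get("true_country_code") or "").strip().upper()
        pred_code = (row.get("predicted_country_code") or "").strip().upper()
        if not true_code:
            continue  # skip rows that have no reference country code
        total += 1
        if pred_code == true_code:
            matches += 1
    return matches / total if total else 0.0


# Made-up sample rows using the column names described above.
SAMPLE = (
    "input_sne\ttrue_country_code\tpredicted_country_code\n"
    "Paris\tFR\tFR\n"
    "Melbourne\tAU\tUS\n"
    "Lima\tPE\tPE\n"
    "Cairo\tEG\tEG\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
agreement = country_code_agreement(rows)
print(f"{agreement:.2f}")  # 3 of the 4 sample rows agree
```

    To run this on the real file, replace the in-memory sample with an opened file handle and adjust the delimiter if the file turns out to be comma-separated.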

Cite
You Sheng (2022). Geolocation Data [Longitude Latitude] [Dataset]. https://www.kaggle.com/datasets/liewyousheng/geolocation

Geolocation Data [Longitude Latitude]

Location Geodata [Longitude Latitude] in CSV.

2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 12, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
You Sheng
License

Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically

Description

Context

Full Database of city state country available in CSV format. All Countries, States & Cities are Covered & Populated with Different Combinations & Versions.

Each CSV has the longitude and latitude of each location, alongside other miscellaneous country data such as currency, state code, and phone country code.

Content

Total Countries: 250
Total States/Regions/Municipalities: 4,963
Total Cities/Towns/Districts: 148,061

Last Updated On : 29th January 2022

Source

https://github.com/dr5hn/countries-states-cities-database
