Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about cities in the United States. It has 4,171 rows. It features 7 columns including country, population, latitude, and longitude.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is from:
https://simplemaps.com/data/world-cities
We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.
Our database is:
A crosswalk table from US postal ZIP codes to geo-points (latitude, longitude)
Data source: public.opendatasoft.
The ZIP code database contained in 'zipcode.csv' contains 43204 ZIP codes for the continental United States, Alaska, Hawaii, Puerto Rico, and American Samoa. The database is in comma separated value format, with columns for ZIP code, city, state, latitude, longitude, timezone (offset from GMT), and daylight savings time flag (1 if DST is observed in this ZIP code and 0 if not).
This database was composed using ZIP code gazetteers from the US Census Bureau from 1999 and 2000, augmented with additional ZIP code information. The database is believed to contain over 98% of the ZIP codes in current use in the United States. The remaining ZIP codes absent from this database are entirely PO Box or Firm ZIP codes added in the last five years, which are no longer published by the Census Bureau but in any event serve a very small minority of the population (probably on the order of 0.1% or less). Although every attempt has been made to filter them out, this data set may contain up to 0.5% false positives, that is, ZIP codes that do not exist or are no longer in use but are included due to erroneous data sources. The latitude and longitude given for each ZIP code is typically (though not always) the geographic centroid of the ZIP code; in any event, the location given can generally be expected to lie somewhere within the ZIP code's "boundaries".
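A minimal pandas sketch of using the file as a ZIP-to-geo-point crosswalk; the column names are assumptions based on the description above and should be checked against the actual header.

```python
import pandas as pd

# Column names are assumptions based on the description above; verify against the file.
zips = pd.read_csv("zipcode.csv", dtype={"zip": str})  # keep leading zeros in ZIP codes

def zip_to_point(zip_code: str):
    """Return (latitude, longitude) for a ZIP code, or None if it is absent."""
    row = zips.loc[zips["zip"] == zip_code]
    if row.empty:
        return None
    return float(row.iloc[0]["latitude"]), float(row.iloc[0]["longitude"])

# Example lookup; the point typically falls near the ZIP code's geographic centroid.
print(zip_to_point("10001"))
```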
The database and this README are copyright 2004 CivicSpace Labs, Inc., and are published under a Creative Commons Attribution-ShareAlike license, which requires that all updates must be released under the same license. See http://creativecommons.org/licenses/by-sa/2.0/ for more details. Please contact schuyler@geocoder.us if you are interested in receiving updates to this database as they become available.
https://spdx.org/licenses/CC0-1.0.html
Sustainable cities depend on urban forests. City trees, a pillar of urban forests, improve our health, clean the air, store CO2, and cool local temperatures. Comparatively less is known about urban forests as ecosystems, particularly their spatial composition, nativity statuses, biodiversity, and tree health. Here, we assembled and standardized a new dataset of N=5,660,237 trees from 63 of the largest US cities. The data comes from tree inventories conducted at the level of cities and/or neighborhoods. Each data sheet includes detailed information on tree location, species, nativity status (whether a tree species is naturally occurring or introduced), health, size, whether it is in a park or urban area, and more (comprising 28 standardized columns per datasheet). This dataset could be analyzed in combination with citizen-science datasets on bird, insect, or plant biodiversity; social and demographic data; or data on the physical environment. Urban forests offer a rare opportunity to intentionally design biodiverse, heterogeneous, rich ecosystems.
Methods
See the eLife manuscript for full details. Below, we provide a summary of how the dataset was collected and processed.
Data Acquisition
We limited our search to the 150 largest cities in the USA (by census population). To acquire raw data on street tree communities, we used a search protocol on both Google and Google Datasets Search (https://datasetsearch.research.google.com/). We first searched the city name plus each of the following: street trees, city trees, tree inventory, urban forest, and urban canopy (all combinations totaled 20 searches per city, 10 each in Google and Google Datasets Search). We then read the first page of Google results and the top 20 results from Google Datasets Search. If the same named city in the wrong state appeared in the results, we redid the 20 searches adding the state name. If no data were found, we contacted a relevant state official via email or phone with an inquiry about their street tree inventory. Datasheets were received and transformed to .csv format (if they were not already in that format). We received data on street trees from 64 cities. One city, El Paso, had data only in summary format and was therefore excluded from analyses.
Data Cleaning
All code used is in the zipped folder Data S5 in the eLife publication. Before cleaning the data, we ensured that all reported trees for each city were located within the greater metropolitan area of the city (for certain inventories, many suburbs were reported, some within the greater metropolitan area and others not).
First, we renamed all columns in the received .csv sheets, referring to the metadata and according to our standardized definitions (Table S4). To harmonize tree health and condition data across different cities, we inspected metadata from the tree inventories and converted all numeric scores to a descriptive scale including “excellent”, “good”, “fair”, “poor”, “dead”, and “dead/dying”. Some cities included only three points on this scale (e.g., “good”, “poor”, “dead/dying”) while others included five (e.g., “excellent”, “good”, “fair”, “poor”, “dead”).
Second, we used pandas in Python (W. McKinney & Others, 2011) to correct typos, non-ASCII characters, variable spellings, date formats, units (we converted all units to metric), address issues, and common name formats. In some cases, units were not specified for tree diameter at breast height (DBH) and tree height; we determined the units based on typical sizes for trees of a particular species. Wherever diameter was reported, we assumed it was DBH. We standardized health and condition data across cities, preserving the highest granularity available for each city. For our analysis, we converted this variable to a binary (see section Condition and Health). We created a column called “location_type” to label whether a given tree was growing in the built environment or in green space. All of the changes we made, and decision points, are preserved in Data S9.
Third, we checked the scientific names reported using gnr_resolve in the R library taxize (Chamberlain & Szöcs, 2013), with the option Best_match_only set to TRUE (Data S9). Through an iterative process, we manually checked the results and corrected typos in the scientific names until all names were either a perfect match (n=1771 species) or a partial match with a threshold greater than 0.75 (n=453 species). BGS manually reviewed all partial matches to ensure that they were the correct species name, and we then programmatically corrected these partial matches (for example, Magnolia grandifolia, which is not a known tree species name, was corrected to Magnolia grandiflora, and Pheonix canariensus was corrected to its proper spelling, Phoenix canariensis). Because many of these tree inventories were crowd-sourced or generated in part through citizen science, such typos and misspellings are to be expected.
Some tree inventories reported species by common names only. Therefore, our fourth step in data cleaning was to convert common names to scientific names. We generated a lookup table by summarizing all pairings of common and scientific names in the inventories for which both were reported. We manually reviewed the common-to-scientific name pairings, confirming that all were correct. Then we programmatically assigned scientific names to all common names (Data S9).
Fifth, we assigned native status to each tree through reference to the Biota of North America Project (Kartesz, 2018), which has collected data on all native and non-native species occurrences throughout the US states.
Specifically, we determined whether each tree species in a given city was native to that state or not native to that state, or we noted that we did not have enough information to determine nativity (for cases where only the genus was known).
Sixth, some cities reported only the street address but not latitude and longitude. For these cities, we used the OpenCageGeocoder (https://opencagedata.com/) to convert addresses to latitude and longitude coordinates (Data S9). OpenCageGeocoder leverages open data and is used by many academic institutions (see https://opencagedata.com/solutions/academia).
Seventh, we trimmed each city dataset to include only the standardized columns we identified in Table S4. After each stage of data cleaning, we performed manual spot checking to identify any issues.
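As an illustration of the fourth data-cleaning step (assigning scientific names from common names via a lookup table), here is a minimal pandas sketch. The file and column names are hypothetical, and the real workflow also included manual review of every pairing.

```python
import pandas as pd

# Hypothetical file and column names; the real sheets follow the Table S4 schema.
both = pd.read_csv("inventories_with_both_names.csv")   # common_name, scientific_name
common_only = pd.read_csv("inventory_common_only.csv")  # common_name only

# Build a common -> scientific lookup from inventories that report both names.
lookup = (
    both.dropna(subset=["common_name", "scientific_name"])
        .groupby("common_name")["scientific_name"]
        .agg(lambda s: s.mode().iloc[0])  # most frequent pairing; manual review still needed
)

# Assign scientific names to inventories that reported common names only.
common_only["scientific_name"] = common_only["common_name"].map(lookup)
```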
This data collection contains information about the population of each county, town, and city of the United States in 1850 and 1860. Specific variables include tabulations of white, black, and slave males and females, and aggregate population for each town. Foreign-born population, total population of each county, and centroid latitudes and longitudes of each county and state were also compiled. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR09424.v2. We highly recommend using the ICPSR version as they made this dataset available in multiple data formats.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Complete dataset of the road networks of all 29,850 USA cities, as graphs in the shp format. The extracts follow the 2016 official USA city boundaries. Graphs are identified by their [city_code].shp; city codes are provided by the Tiger Census dataset. Graphs have been created by extracting all openstreetmap.org (OSM) maps for each USA city, extracting the graph from the OSM extract using the policosm Python GitHub library, and simplifying the graph by removing all degree-two nodes to retain only a workable transportation network. Original road length is retained as an attribute. Nodes include latitude and longitude attributes in the WGS84 projection. Edges include length in meters (precision < 1 m) and the tag:highway value from OSM. See policosm on GitHub for more information on the extraction algorithms.
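As an illustration of the degree-two simplification step described above, here is a minimal networkx sketch. It is not the policosm implementation; it simply contracts chains through degree-two nodes while summing an assumed 'length' edge attribute.

```python
import networkx as nx

def simplify_degree_two(G: nx.Graph) -> nx.Graph:
    """Contract chains through degree-2 nodes, summing the 'length' attribute."""
    H = G.copy()
    changed = True
    while changed:
        changed = False
        for node in list(H.nodes):
            if H.degree(node) != 2:
                continue
            nbrs = list(H.neighbors(node))
            if len(nbrs) != 2:
                continue  # self-loop; leave untouched in this sketch
            u, v = nbrs
            if H.has_edge(u, v):
                continue  # avoid collapsing into a parallel edge
            length = H[u][node].get("length", 0) + H[node][v].get("length", 0)
            H.remove_node(node)
            H.add_edge(u, v, length=length)
            changed = True
    return H
```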
The datasets are split by census block, cities, counties, districts, provinces, and states. The typical dataset includes the below fields.
Column number, Data attribute, Description
1, device_id, hashed anonymized unique id per moving device
2, origin_geoid, geohash id of the origin grid cell
3, destination_geoid, geohash id of the destination grid cell
4, origin_lat, origin latitude with 4-to-5 decimal precision
5, origin_long, origin longitude with 4-to-5 decimal precision
6, destination_lat, destination latitude with 5-to-6 decimal precision
7, destination_lon, destination longitude with 5-to-6 decimal precision
8, start_timestamp, start timestamp / local time
9, end_timestamp, end timestamp / local time
10, origin_shape_zone, customer provided origin shape id, zone or census block id
11, destination_shape_zone, customer provided destination shape id, zone or census block id
12, trip_distance, inferred distance traveled in meters, as the crow flies
13, trip_duration, inferred duration of the trip in seconds
14, trip_speed, inferred speed of the trip in meters per second
15, hour_of_day, hour of day of trip start (0-23)
16, time_period, time period of trip start (morning, afternoon, evening, night)
17, day_of_week, day of week of trip start (mon, tue, wed, thu, fri, sat, sun)
18, year, year of trip start
19, iso_week, iso week of the trip
20, iso_week_start_date, start date of the iso week
21, iso_week_end_date, end date of the iso week
22, travel_mode, mode of travel (walking, driving, bicycling, etc)
23, trip_event, trip or segment events (start, route, end, start-end)
24, trip_id, trip identifier (unique for each batch of results)
25, origin_city_block_id, census block id for the trip origin point
26, destination_city_block_id, census block id for the trip destination point
27, origin_city_block_name, census block name for the trip origin point
28, destination_city_block_name, census block name for the trip destination point
29, trip_scaled_ratio, ratio used to scale up each trip; for example, a trip_scaled_ratio value of 10 means that 1 original trip was scaled up to 10 trips
30, route_geojson, geojson line representing trip route trajectory or geometry
The datasets can be processed and enhanced to also include places, POI visitation patterns, hour-of-day patterns, weekday patterns, weekend patterns, dwell time inferences, and macro movement trends.
The dataset is delivered as gzipped CSV archive files that are uploaded to your AWS S3 bucket upon request.
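A minimal sketch of working with a delivered file under the schema above; the file name is hypothetical, and the code simply weights each row by trip_scaled_ratio.

```python
import pandas as pd

# Hypothetical file name; delivered archives are gzipped CSVs with the schema above.
trips = pd.read_csv("trips_week01.csv.gz", compression="gzip")

# Scaled trip volume per hour of day (each row represents trip_scaled_ratio trips).
hourly = trips.groupby("hour_of_day")["trip_scaled_ratio"].sum().rename("scaled_trips")
print(hourly)

# Mean as-the-crow-flies speed (m/s) by travel mode.
print(trips.groupby("travel_mode")["trip_speed"].mean())
```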
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Latitude and longitude coordinates, population size, and mean baseline pneumonia and influenza death rates for 66 large US reporting cities (1910–1920) with 100,000 or more inhabitants [10].
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) for all complete quarters so far this year (2017). For additional details, please see the attached data dictionary in the ‘About’ section.
Location of wifi hotspots in the city with basic descriptive information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extract from geonames files. The data represents populated places with a population > 5000 inhabitants.
Table information:
geonameid : integer id of record in geonames database
name : name of geographical point (utf8), varchar(200)
asciiname : name of geographical point in plain ascii characters, varchar(200)
alternatenames : alternate names, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)
feature class : see http://www.geonames.org/export/codes.html, char(1)
feature code : see http://www.geonames.org/export/codes.html, varchar(10)
country code : ISO-3166 2-letter country code, 2 characters
cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
admin3 code : code for third level administrative division, varchar(20)
admin4 code : code for fourth level administrative division, varchar(20)
population : bigint (8 byte int)
elevation : in meters, integer
dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90m x 90m) or 30''x30'' (ca 900m x 900m) area in meters, integer; srtm processed by cgiar/ciat
timezone : the iana timezone id (see file timeZone.txt), varchar(40)
modification date : date of last modification in yyyy-MM-dd format
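A hedged example of loading such an extract with pandas, assuming it keeps the standard headerless, tab-delimited geonames layout; the file name cities5000.txt is an assumption and should be replaced with the actual extract file.

```python
import pandas as pd

# Column order as documented above; assumes a headerless, tab-delimited geonames layout.
columns = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude", "longitude",
    "feature_class", "feature_code", "country_code", "cc2", "admin1_code",
    "admin2_code", "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]
places = pd.read_csv("cities5000.txt", sep="\t", names=columns, low_memory=False)

# Example: the ten most populous US places in the extract.
print(places[places["country_code"] == "US"].nlargest(10, "population")[["name", "population"]])
```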
http://opendatacommons.org/licenses/dbcl/1.0/
Who amongst us doesn't small talk about the weather every once in a while?
The goal of this dataset is to elevate this small talk to medium talk.
Just kidding; I actually originally decided to collect this dataset in order to demonstrate basic signal processing concepts, such as filtering, the Fourier transform, auto-correlation, cross-correlation, etc. (for a data analysis course I'm currently preparing).
I wanted to demonstrate these concepts on signals that we all have intimate familiarity with and hope that this way these concepts will be better understood than with just made up signals.
The weather is excellent for demonstrating these kinds of concepts as it contains periodic temporal structure with two very different periods (daily and yearly).
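To make this concrete, here is a small numpy sketch (using a synthetic hourly series rather than the dataset itself) showing how the daily and yearly periods appear as peaks in the amplitude spectrum.

```python
import numpy as np

# Synthetic stand-in for an hourly temperature series (5 years) with daily and
# yearly components plus noise; it only illustrates how the two periods show up.
hours = np.arange(5 * 365 * 24)
temp = (10 * np.sin(2 * np.pi * hours / (365 * 24))   # yearly cycle
        + 5 * np.sin(2 * np.pi * hours / 24)          # daily cycle
        + np.random.normal(0, 2, hours.size))

spectrum = np.abs(np.fft.rfft(temp - temp.mean()))
freqs = np.fft.rfftfreq(hours.size, d=1.0)            # cycles per hour

# The two largest peaks (excluding the zero-frequency bin) should sit near
# 1/(365*24) and 1/24 cycles per hour, i.e. periods of ~8760 h and ~24 h.
top = np.argsort(spectrum[1:])[-2:] + 1
print(1 / freqs[top])
```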
The dataset contains ~5 years of high temporal resolution (hourly measurements) data of various weather attributes, such as temperature, humidity, air pressure, etc.
This data is available for 30 US and Canadian Cities, as well as 6 Israeli cities.
I've organized the data according to a common time axis for easy use.
Each attribute has its own file and is organized such that the rows are the time axis (it's the same time axis for all files), and the columns are the different cities (it's the same city ordering for all files as well).
Additionally, for each city we also have the country, latitude and longitude information in a separate file.
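A minimal pandas sketch of reading one attribute file together with the city attributes file; the file names, column names, and city labels are assumptions based on the description above.

```python
import pandas as pd

# File and column names are assumptions; adjust to the actual files in the dataset.
temperature = pd.read_csv("temperature.csv", parse_dates=["datetime"], index_col="datetime")
cities = pd.read_csv("city_attributes.csv")  # country, latitude, longitude per city

# Rows share one common time axis; columns are cities, in the same order in every file.
ny_daily_mean = temperature["New York"].resample("D").mean()
print(ny_daily_mean.head())
print(cities.head())
```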
The dataset was acquired using the Weather API on the OpenWeatherMap website, and is available under the ODbL License.
Weather data is both intrinsically interesting, and also potentially useful when correlated with other types of data.
For example, wildfire spread is potentially related to weather conditions, demand for cabs is famously known to be correlated with weather conditions (here, here and here you can find NYC cab ride data), and use of city bikes is probably also correlated with weather in interesting ways (check out this Austin dataset, this SF dataset, this Montreal dataset, and this NYC dataset).
Traffic is also probably related to weather.
Another potentially interesting source of correlation is between weather and crime. Here are a few crime datasets on kaggle of cities present in this weather dataset: Chicago, Philadelphia, Los Angeles, Vancouver, Austin, NYC
There are many other potentially interesting connections between everyday life and the weather that we can explore together with the help of this dataset. Have fun!
The purpose of this tool is to estimate daily precipitation patterns for a yearly cycle at any location on the globe. The user input is simply the latitude and longitude of the selected location. There is an embedded ZIP code search routine to find the latitude and longitude for US cities. GlobalRainSIM forecasts the daily rainfall based upon two databases. The first was the average number of days in a month with precipitation (wet days) that were compiled and interpolated by Legates and Willmott (1990a and 1990b), with further improvements by Willmott and Matsuura (1995). The second database was the global average monthly precipitation data collected 1961-1990 and cross-validated by New et al. (1999). These two datasets were then used to establish the monthly precipitation totals and the frequency of precipitation in a month. The average precipitation event was calculated as the monthly mean divided by the number of wet days. This mean value was then randomly assigned to a day of the month, looping through the number of wet days. In other words, if the average monthly rainfall was 10 mm/month with 5 average wet days, each rain event was 2 mm. This amount (2 mm) was then randomly assigned to 5 days of that month. The advantage of this tool is that a typical pattern of precipitation can be simulated for any global location, arriving at an "average year" as a baseline case for comparison. This tool also outputs the daily rainfall as a file or can be easily embedded within another program. Resources in this dataset: Resource Title: Global RainSIM Version 1.0. File Name: Web Page, url: https://www.ars.usda.gov/research/software/download/?softwareid=227&modecode=50-60-05-00 (download page)
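A small Python sketch of the per-month logic described above (not the GlobalRainSIM code itself): the monthly mean is divided by the number of wet days, and each event is assigned to a randomly chosen day of the month.

```python
import random

def simulate_month(monthly_total_mm: float, wet_days: int, days_in_month: int = 30):
    """Spread the monthly mean evenly over randomly chosen wet days (sketch)."""
    daily = [0.0] * days_in_month
    if wet_days <= 0:
        return daily
    event = monthly_total_mm / wet_days  # e.g. 10 mm / 5 wet days = 2 mm per event
    for day in random.sample(range(days_in_month), min(wet_days, days_in_month)):
        daily[day] = event
    return daily

print(simulate_month(10.0, 5))  # five 2 mm events on random days, as in the example above
```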
http://opendatacommons.org/licenses/dbcl/1.0/
This database is a collection of data on over 23,000 US earthquakes. It contains data from the year 1638 to 1985. The digital database also includes information regarding epicentral coordinates, magnitudes, focal depths, names and coordinates of reporting cities (or localities), reported intensities, and the distance from the city (or locality) to the epicenter. The majority of felt reports are from the US, but there is also information about some other countries such as Antigua and Barbuda, Canada, Mexico, Panama, and the Philippines.
Year Mo Da Hr Mn Sec The Date and Time are listed in Universal Coordinated Time and are Year, Month (Mo), Day (Da), Hour (Hr), Minute (Mn), Second (Sec)
UTC Conv Number of hours to subtract from the Date and Time given in Universal Coordinated Time to get local standard time for the epicenter. In general:
4 = 60 degree meridian (Atlantic Standard Time)
5 = 75 degree meridian (Eastern Standard Time)
6 = 90 degree meridian (Central Standard Time)
7 = 105 degree meridian (Mountain Standard Time)
8 = 120 degree meridian (Pacific Standard Time)
9 = 135 degree meridian (Alaska Standard Time)
10 = 150 degree meridian (Hawaii-Aleutian Standard Time)
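A tiny example of applying the UTC Conv value; the date and time below are arbitrary example values, shown for the 90 degree meridian (Central Standard Time).

```python
from datetime import datetime, timedelta

# Arbitrary example: an event recorded at 14:30 UTC with UTC Conv = 6.
utc_time = datetime(1950, 7, 1, 14, 30)
utc_conv = 6
local_standard_time = utc_time - timedelta(hours=utc_conv)
print(local_standard_time)  # 1950-07-01 08:30, local standard time at the epicenter
```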
U/G Unpublished or grouped intensity U = Intensity (MMI) assigned that was not listed in the source document. G = Intensity grouped I-III in the source document was reassigned intensity III.
EQ Lat / EQ Long This is the geographic latitude and longitude of the epicenter expressed as decimal numbers. The units are degrees. The latitude range is +4.0 to +69.0, where "+" designates North latitude (there are no South latitudes in the database). The longitude range is -179.0 to +180.0, where "-" designates West longitude and "+" designates East longitude. Most of the epicenters are West longitude (from -56 to -179), but a few epicenters in the Philippines and Aleutian Islands are East longitude (from +120 to +180).
Mag These are magnitudes as listed in United States Earthquakes, Earthquake History of the United States (either mb, MS, or ML), or the equivalent derived from intensities for pre-instrumental events. The magnitude is a measure of seismic energy. The magnitude scale is logarithmic. An increase of one in magnitude represents a tenfold increase in the recorded wave amplitude. However, the energy release associated with an increase of one in magnitude is not tenfold, but thirtyfold. For example, approximately 900 times more energy is released in an earthquake of magnitude 7 than in an earthquake of magnitude 5. Each increase in magnitude of one unit is equivalent to an increase of seismic energy of about 1,600,000,000,000 ergs.
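As a quick check of the figures quoted above (an approximate thirtyfold energy increase per magnitude unit, hence roughly 900 times more energy between magnitude 5 and magnitude 7):

```latex
% Thirtyfold energy increase per magnitude unit, applied over two units:
\[
  \frac{E(M+1)}{E(M)} \approx 30
  \quad\Longrightarrow\quad
  \frac{E(7)}{E(5)} \approx 30 \times 30 = 900
\]
```

For comparison, a commonly used magnitude-energy relation gives a factor of about 10^1.5 ≈ 31.6 per unit, or roughly 1000 over two units, consistent with the rounded figure of 900 quoted above.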
Depth (km) Hypocentral Depth (positive downward) in kilometers from the surface.
Epi Dis Epicentral Distance in km that the reporting city (or locality) is located from the epicenter of the earthquake.
City Lat / City Long This is the geographic latitude and longitude of the city (or locality) where the Modified Mercalli Intensity was observed, expressed as decimal numbers. The units are degrees. The latitude range is +6.0 to +72.0, where "+" designates North latitude (there are no South latitudes in the database). The longitude range is -177.0 to +180.0, where "-" designates West longitude and "+" designates East longitude. Most of the reporting cities (or localities) are West longitude (from -29 to -177), but a few reporting cities (or localities) in the Philippines and Aleutian Islands are East longitude (from +119 to +180).
MMI Modified Mercalli Scale Intensity (MMI) is given in Roman Numerals. Values range from I to XII. (Roman Numerals were converted to numbers in the digital database. Values range from 1 to 12.) Macroseismic information is compiled from various sources including newspaper articles, foreign broadcasts, U.S. Geological Survey Earthquake reports and seismological station reports.
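A small helper illustrating the Roman-numeral-to-integer conversion used in the digital database (a sketch, not the original conversion code).

```python
# Map the Modified Mercalli Intensity Roman numerals (I-XII) to the integers
# 1-12 used in the digital database.
MMI_VALUES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII"]
MMI_TO_INT = {numeral: value for value, numeral in enumerate(MMI_VALUES, start=1)}

print(MMI_TO_INT["VII"])  # 7
```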
State Code Numerical identifier for the state, province, or country in which the earthquake was reported (felt) by residents: 01 Alabama 02 Alaska 03 Arizona 04 Arkansas 05 California 07 Colorado 08 Connecticut 09 Delaware 10 District of Columbia 11 Florida 12 Georgia 14 Hawaii 15 Idaho 16 Illinois 17 Indiana 18 Iowa 19 Kansas 20 Kentucky 21 Louisiana 22 Maine 23 Maryland 24 Massachusetts 25 Michigan 26 Minnesota 27 Mississippi 28 Missouri 29 Montana 30 Nebraska 31 Nevada 32 New Hampshire 33 New Jersey 34 New Mexico 35 New York 36 North Carolina 37 North Dakota 38 Ohio 39 Oklahoma 40 Oregon 41 Pennsylvania 42 Puerto Rico 43 Rhode Island 45 South Carolina 46 South Dakota 47 Tennessee 48 Texas 49 Utah 50 Vermont 51 Virginia 52 Virgin Islands 54 Washington 55 West Virginia 56 Wisconsin 57 Wyoming 58 West Indies 74 Panama 75 Philippine Is. 80 Mexico 81 Baja California 90 Canada 91 Alberta 92 Manitoba 93 Saskatchewan 94 British Columbia 95 Ontario 96 New Brunswick 97 Quebec 98 Nova Scotia 99 Yukon Territory
City Name City (or locality) in which the earthquake was reported (felt) by residents.
Data Source
This is a code referring to the source of one or more of the reported parameters (e.g., epicenter, city, and intensity).
A = Source unknown; 1925 earthquake in Boston area (reports not listed in source H).
B = Report by Bollinger and Stover, 1976.
C = Quarterly Seismological Reports, 1925-27.
D = Source unknown; 1937-1977 earthquakes in Hawaii, California, and the eastern U.S.
H = Earthquake History of the United States (Coffman and others, 1982).
K = Report by Carnegie Institution, 1908, 1910.
M = Source unknown; 1899-1912 earthquakes in Alaska.
N = Report by Nuttli, 1973.
Q = Abstracts of Earthquake Reports for the United States, 1933-70.
S = Unpublished report by Nina Scott, 1965.
T = Source unknown; 1872-1904 earthquakes along U.S. west coast.
U = United States Earthquakes, 1928-85.
W = Monthly Weather Service Seismological Reports, 1914-24.
https://www.icpsr.umich.edu/web/ICPSR/studies/9516/terms
Public Law 94-171, enacted in 1975, requires the Census Bureau to provide redistricting data in a format requested by state governments. Within one year following the Decennial Census (by April 1, 1991), the Census Bureau must provide the governor and legislature of each state with the population data needed to redraw legislative districts. To meet this requirement, the Census Bureau established a voluntary program to allow states to receive data for voting districts (e.g., election precincts, city wards) in addition to standard census geographic areas such as counties, cities, census tracts, and blocks. These files contain data for voting districts for those counties for which a state outlined voting district boundaries around a set of census blocks on census maps, in accordance with the guidelines of the program. Each state file provides data for the state and its subareas in the following order: state, county, voting district, county subdivision, place, census tract, block group, and block. Additionally, complete summaries are provided for the following geographic areas: county subdivision, place, consolidated city, state portion of American Indian and Alaska Native area, and county portion of American Indian and Alaska Native area. Area characteristics such as land area, water area, latitude, and longitude are provided. Summary statistics are provided for all persons and housing units in the geographic areas. Counts by race and by Hispanic and non-Hispanic origin are also given.
https://borealisdata.ca/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.5683/SP3/KKP7TP
The database includes ZIP code, city name, alias city name, state code, phone area code, city type, county name, country FIPS, time zone, daylight saving flag, latitude, longitude, county elevation, Metropolitan Statistical Area (MSA), Primary Metropolitan Statistical Area (PMSA), Core Based Statistical Area (CBSA), and census 2000 data on population by race, average household income, and average house value.
This dataset displays all the hazardous waste sites in the United States and its Territories as of 5.08. The data comes from the Agency for Toxic Substances and Disease Registry (ATSDR). The dataset contains information about each site: Site ID Site Name CERCLIS # Address City State County Latitude Longitude Population Region # Congressional Districts Federal Facility National Priorities List Status Ownership Status Classification. For more information, go to the Agency for Toxic Substances and Disease Registry (ATSDR) website at http://www.atsdr.cdc.gov
This Wyoming Cities coverage contains data for 109 Wyoming towns, cities, and Census Designated Places (CDPs). The coverage was created from U.S. Census Bureau TIGER data. It contains many of the same attributes as the Census Bureau .dbf files, but there are a few modifications. A code has been added to distinguish towns, cities, and CDPs. A Countyseat item has also been added to provide a way to display the county seats only. Latitude/Longitude coordinates were also converted from the Census Bureau format to be used in GENERATE.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes all tables which can be imported into the SQL database under the GeoAPEXOL interface. The dataset includes location (zipcode, city, state, latitude, longitude), soil (all variables required to set up the APEX model, with values extracted from the SSURGO database for the US), climate (datasets for over 2700 stations across the US from 1974 to 2013, including daily observed values for precipitation and temperature as well as monthly statistics for running CLIGEN), and management (template management files for normal agricultural operations and some non-structural BMP parameters). Currently, the datasets are prepared for the contiguous US.
https://www.gnu.org/licenses/gpl-3.0.html
This dataset is the output of an advanced vehicular simulation performed using the VEINS OMNeT++ framework. The simulation was designed to model realistic vehicle dynamics, sensor readings, and communication parameters over a network of urban roadways connecting major US cities. The dataset provides a comprehensive view of vehicular performance and environmental conditions over a 30-day period for 500 vehicles.
Authors: Muhammad Ali & Tariq Qayyum
Below is a detailed description of each feature included in the dataset:
Temporal and Geospatial Data
VehicleID: A unique identifier for each vehicle in the simulation.
Timestamp: The exact date and time for each record, allowing temporal analysis of vehicle behavior.
Day, Hour, Minute, Second: These fields break down the timestamp into finer granularity for detailed time-series analysis.
Latitude and Longitude: The real-time geographic coordinates of the vehicle as it progresses along its route.
StartingPointLatitude and StartingPointLongitude: The coordinates of the vehicle's origin (selected from major US cities).
DestinationLatitude and DestinationLongitude: The target coordinates where the vehicle is headed.
Mobility and Performance Metrics
Speed: The current speed of the vehicle (in km/h), adjusted based on simulated traffic, weather conditions, and road quality.
Direction: The vehicle's heading in degrees, indicating its travel direction.
Odometer: The cumulative distance traveled by the vehicle during the simulation.
VehicleType: Categorical variable indicating the type of vehicle (Car, Truck, Bus, Motorcycle).
VehicleAge: The age of the vehicle, measured in years.
EngineTemperature: The simulated engine temperature (in °C), which responds dynamically to vehicle speed.
FuelLevel: The remaining fuel level, tracking consumption over distance traveled.
BatteryLevel: The current battery level, which decreases based on speed and usage.
TirePressure, BrakeFluidLevel, CoolantLevel, OilLevel, WiperFluidLevel: These parameters simulate vehicle maintenance metrics, reflecting system status during operation.
Environmental and Traffic Conditions
WeatherCondition: The prevailing weather conditions (Clear, Rain, Fog, or Snow) set for each simulation day.
RoadCondition: A categorical assessment of road quality (Good, Moderate, or Poor).
TrafficDensity: The simulated traffic level at each time step (Low, Medium, or High), affecting vehicle speed and behavior.
Communication and Processing Attributes
CPU_Available and Memory_Available: Simulated computational resource levels that might impact on-board processing and task execution.
NetworkLatency: The simulated network latency in milliseconds, indicating communication delays.
SignalStrength: The wireless signal strength measured in dBm.
GPSStatus, WiFiStatus, BluetoothStatus, CellularStatus: Boolean flags indicating the operational status of various communication systems on the vehicle.
RadarStatus, LidarStatus, CameraStatus, IMUStatus: Status indicators for the vehicle’s sensor suite, providing situational awareness.
TaskType: The category of a computational or communication task being executed (Navigation, Entertainment, DataAnalysis, or Safety).
TaskSize: A numeric value representing the computational or data size requirement of the task.
TaskPriority: The priority level of the task (High, Medium, or Low), potentially affecting its execution order.
TaskOffloaded: A boolean flag indicating whether a task was offloaded, influenced by the vehicle’s battery level and simulated decision-making.
Additional Vehicle Features
HeadlightStatus: A boolean indicating if the headlights are on, based on the time of day and weather conditions.
BrakeLightStatus: Indicates whether brake lights are activated when the vehicle is decelerating.
TurnSignalStatus: A boolean showing if the vehicle’s turn signal is active.
HazardLightStatus: Indicates the status of the hazard lights.
ABSStatus: Reflects the functioning status of the Anti-lock Braking System.
AirbagStatus: Indicates whether the airbag system is active (typically remains enabled during the simulation).
Simulation Environment
The dataset encapsulates a realistic simulation of vehicular movement and behavior in urban settings. Key aspects include:
Dynamic Speed Adjustment: Vehicles adjust speeds based on time-of-day (e.g., rush hour), weather impacts, and traffic density.
Resource Management: The simulation models the gradual depletion of battery, fuel, and other vehicular maintenance parameters.
Sensor and Communication Systems: Real-time statuses of sensor suites and communication modules are recorded, emulating modern connected vehicles.
Route Progression: Each vehicle's journey is tracked from a starting city to a destination, with geospatial progression computed using a distance estimation formula.
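The route progression above relies on a distance estimation formula; the exact formula used by the simulation is not specified here, but a common choice for straight-line distance between latitude/longitude points is the haversine great-circle formula, sketched below as an assumption.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Example: approximate distance between two major US cities (New York -> Chicago).
print(round(haversine_km(40.7128, -74.0060, 41.8781, -87.6298)))  # ~1144 km
```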
Potential Applications
Researchers and practitioners can utilize thi...