CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section.
Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.
The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.
The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.
When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.
The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.
Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.
Explanation of Symbols:
- The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X) The estimate or margin of error is not applicable or not available.
median- The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+ The median falls in the highest interval of an open-ended distribution (for example "250,000+").
** The margin of error could not be computed because there were an insufficient number of sample observations.
*** The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
***** A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
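The dependency ratios and the 90 percent confidence bounds described in this entry reduce to simple arithmetic. The following is a minimal sketch of that arithmetic; the function and parameter names are hypothetical and are not part of any Census Bureau tool.

```python
def dependency_ratios(under_18, age_18_to_64, age_65_plus):
    """Return the three dependency ratios, each expressed per 100 people aged 18 to 64."""
    if age_18_to_64 <= 0:
        raise ValueError("the 18-to-64 population must be positive")
    return {
        # (under 18 + 65 and over) / 18-to-64, times 100
        "age": (under_18 + age_65_plus) / age_18_to_64 * 100,
        # 65 and over / 18-to-64, times 100
        "old_age": age_65_plus / age_18_to_64 * 100,
        # under 18 / 18-to-64, times 100
        "child": under_18 / age_18_to_64 * 100,
    }


def confidence_bounds(estimate, margin_of_error):
    """Lower and upper bounds: estimate minus and plus the published margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Illustrative (made-up) population counts, not ACS figures.
ratios = dependency_ratios(under_18=220, age_18_to_64=600, age_65_plus=160)
bounds = confidence_bounds(estimate=1000, margin_of_error=50)
```

By construction, the age dependency ratio equals the sum of the child and old-age ratios, which is a quick consistency check when reading the published tables.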
TABLE III. Deaths in 122 U.S. cities – 2016. 122 Cities Mortality Reporting System – Each week, the vital statistics offices of 122 cities across the United States report the total number of death certificates processed and the number of those for which pneumonia or influenza was listed as the underlying or contributing cause of death by age group (Under 28 days, 28 days–1 year, 1-14 years, 15-24 years, 25-44 years, 45-64 years, 65-74 years, 75-84 years, and ≥ 85 years).
FOOTNOTE: U: Unavailable. -: No reported cases. * Mortality data in this table are voluntarily reported from 122 cities in the United States, most of which have populations of 100,000 or more. A death is reported by the place of its occurrence and by the week that the death certificate was filed. Fetal deaths are not included. † Pneumonia and influenza. § Total includes unknown ages.
VITAL SIGNS INDICATOR Housing Permits (LU3)
FULL MEASURE NAME Permitted housing units
LAST UPDATED October 2019
DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.
DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available
California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/
Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov
CONTACT INFORMATION vitalsigns.info@bayareametro.gov
METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.
Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.
Each multi-family unit is counted separately even though they may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county's unincorporated total. Permit data is not available for years when the city or town was not incorporated.
Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
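The income bands above can be expressed as a small lookup on a household's income as a percentage of area median income (AMI). This is only a sketch: the function names are hypothetical, and the treatment of the exact boundary values (50%, 80%, 120%) is an assumption, since the text does not say which band they fall into.

```python
def income_category(household_income, area_median_income):
    """Map a household income to one of the four bands described in the text."""
    if area_median_income <= 0:
        raise ValueError("area median income must be positive")
    pct_of_ami = household_income / area_median_income * 100
    if pct_of_ami < 50:
        return "very low"
    if pct_of_ami < 80:        # assumption: exactly 80% counts as moderate
        return "low"
    if pct_of_ami <= 120:      # assumption: exactly 120% counts as moderate
        return "moderate"
    return "above moderate"


def counts_as_affordable(category):
    """Affordable housing here covers units for low and very low income households."""
    return category in ("very low", "low")
```

For example, with an AMI of 100,000, a household earning 60,000 falls in the low band and a unit affordable to it counts toward the affordable housing total, while a household earning 100,000 falls in the moderate band and does not.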
Permit data is missing for the following cities and years: Clayton, 1990-2007; Lafayette, 1990-2007; Moraga, 1990-2007; Orinda, 1990-2007; San Ramon, 1990.
Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.
Permit values reflect the number of units permitted in each respective year.
https://www.usa.gov/government-works/
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. The dataset contains 2,083,227 rows and 29 columns.
| Column name | Description |
|---|---|
| DR_NO | Division of Records Number: Official file number made up of a 2 digit year, area ID, and 5 digits |
| Date Rptd | MM/DD/YYYY |
| DATE OCC | MM/DD/YYYY |
| TIME OCC | In 24 hour military time. |
| AREA | The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21. |
| AREA NAME | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. |
| Crm Cd | Indicates the crime committed. (Same as Crime Code 1) |
| Crm Cd Desc | Defines the Crime Code provided. |
| Mocodes | Modus Operandi: Activities associated with the suspect in commission of the crime |
| Vict Age | Victim age |
| Vict Sex | F - Female M - Male X - Unknown |
| Vict Descent | Descent Code: A - Other Asian B - Black C - Chinese D - Cambodian F - Filipino G - Guamanian H - Hispanic/Latin/Mexican I - American Indian/Alaskan Native J - Japanese K - Korean L - Laotian O - Other P - Pacific Islander S - Samoan U - Hawaiian V - Vietnamese W - White X - Unknown Z - Asian Indian |
| Premis Cd | The type of structure, vehicle, or location where the crime took place. |
| Premis Desc | Defines the Premise Code provided |
| Weapon Used Cd | The type of weapon used in the crime. |
| Weapon Desc | Defines the Weapon Used Code provided. |
| Status | Status of the case. (IC is the default) |
| Status Desc | Defines the Status Code provided. |
| Crm Cd 1 | Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious. |
| Crm Cd 2 | May contain a code for an additional crime, less serious than Crime Code 1. |
| Crm Cd 3 | May contain a code for an additional crime, less serious than Crime Code 1. |
| Crm Cd 4 | May contain a code for an additional crime, less serious than Crime Code 1. |
| LOCATION | Street address of crime incident rounded to the nearest hundred block to maintain anonymity. |
| Cross Street | Cross Street of rounded Address. |
| LAT | Latitude |
| LON | Longitude |
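Two of the conventions in this table lend themselves to a short sketch: splitting DR_NO into its parts and detecting the (0°, 0°) placeholder for missing locations. The code below assumes a 9-digit DR_NO with a zero-padded 2-digit area ID and a year in the 2000s; those details, the helper names, and the sample value are illustrative assumptions rather than documented LAPD specifications.

```python
from typing import NamedTuple


class DrNo(NamedTuple):
    year: int       # 4-digit year reconstructed from the 2-digit prefix
    area: int       # Geographic Area ID (1-21)
    sequence: int   # trailing 5-digit number


def parse_dr_no(dr_no) -> DrNo:
    """Split a DR_NO into year, area ID, and 5-digit sequence (assumed 9-digit layout)."""
    digits = str(dr_no).strip()
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected DR_NO format: {dr_no!r}")
    return DrNo(2000 + int(digits[:2]), int(digits[2:4]), int(digits[4:]))


def has_location(lat: float, lon: float) -> bool:
    """Rows with missing location data are recorded as (0, 0)."""
    return not (lat == 0 and lon == 0)


# Hypothetical DR_NO value, used only to show the split.
example = parse_dr_no("200312345")
```

Filtering with `has_location` before mapping keeps placeholder rows from being plotted at the (0°, 0°) coordinate.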
This dataset contains aggregate data on violent index victimizations at the quarter level of each year (i.e., January–March, April–June, July–September, October–December), from 2001 to the present (1991 to present for Homicides), with a focus on those related to gun violence. Index crimes are 10 crime types selected by the FBI (codes 1-4) for special focus due to their seriousness and frequency. This dataset includes only those index crimes that involve bodily harm or the threat of bodily harm and are reported to the Chicago Police Department (CPD). Each row is aggregated up to victimization type, age group, sex, race, and whether the victimization was domestic-related. Aggregating at the quarter level provides large enough blocks of incidents to protect anonymity while allowing the end user to observe inter-year and intra-year variation. Any row where there were fewer than three incidents during a given quarter has been deleted to help prevent re-identification of victims. For example, if there were three domestic criminal sexual assaults during January to March 2020, all victims associated with those incidents have been removed from this dataset. Human trafficking victimizations have been aggregated separately due to the extremely small number of victimizations.
This dataset includes a "GUNSHOT_INJURY_I" column to indicate whether the victimization involved a shooting, showing either Yes ("Y"), No ("N"), or Unknown ("UNKNOWN"). For homicides, injury descriptions are available dating back to 1991, so the "shooting" column will read either "Y" or "N" to indicate whether the homicide was a fatal shooting or not. For non-fatal shootings, data is only available as of 2010. As a result, for any non-fatal shootings that occurred from 2010 to the present, the shooting column will read as "Y." Non-fatal shooting victims will not be included in this dataset prior to 2010; they will be included in the authorized dataset, but with "UNKNOWN" in the shooting column.
The dataset is refreshed daily, but excludes the most recent complete day to allow CPD time to gather the best available information. Each time the dataset is refreshed, records can change as CPD learns more about each victimization, especially those victimizations that are most recent. The data on the Mayor's Office Violence Reduction Dashboard is updated daily with an approximately 48-hour lag. As cases are passed from the initial reporting officer to the investigating detectives, some recorded data about incidents and victimizations may change once additional information arises. Regularly updated datasets on the City's public portal may change to reflect new or corrected information.
How does this dataset classify victims?
The methodology by which this dataset classifies victims of violent crime differs by victimization type:
Homicide and non-fatal shooting victims: A victimization is considered a homicide victimization or non-fatal shooting victimization depending on its presence in CPD's homicide victims data table or its shooting victims data table. A victimization is considered a homicide only if it is present in CPD's homicide data table, while a victimization is considered a non-fatal shooting only if it is present in CPD's shooting data tables and absent from CPD's homicide data table.
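The membership rule above can be sketched as a lookup against two sets of identifiers. The function name, the identifier values, and the use of plain sets in place of CPD's data tables are all hypothetical simplifications.

```python
def classify_victimization(victim_id, homicide_ids, shooting_ids):
    """Apply the table-membership rule described above.

    homicide_ids: identifiers present in the homicide victims table
    shooting_ids: identifiers present in the shooting victims tables
    """
    if victim_id in homicide_ids:
        # Presence in the homicide table takes precedence over the shooting tables.
        return "homicide"
    if victim_id in shooting_ids:
        # In the shooting tables and absent from the homicide table.
        return "non-fatal shooting"
    return "other"


# Hypothetical identifiers for illustration only.
homicide_ids = {"v1", "v2"}
shooting_ids = {"v2", "v3"}
labels = {v: classify_victimization(v, homicide_ids, shooting_ids) for v in ("v1", "v2", "v3", "v4")}
```

A victim who appears in both tables ("v2" here) is a fatal shooting and is therefore classified as a homicide, not as a non-fatal shooting.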
To determine the IUCR code of homicide and non-fatal shooting victimizations, we defer to the incident IUCR code available in CPD's Crimes, 2001-present dataset (available on the City's open data portal). If the IUCR code in CPD's Crimes dataset is inconsistent with the homicide/non-fatal shooting categorization, we defer to CPD's Victims dataset.
For a criminal homicide, the only sensible IUCR codes are 0110 (first-degree murder) or 0130 (second-degree murder). For a non-fatal shooting, a sensible IUCR code must signify a criminal sexual assault, a robbery, or, most commonly, an aggravated battery. In rare instances, the IUCR codes in CPD's Crimes and Victims datasets do not align with the homicide/non-fatal shooting categorization:
Other violent crime victims: For other violent crime types, we refer to the IUCR classification that exists in CPD's victim table, with only one exception:
Note: All businesses identified as victims in CPD data have been removed from this dataset.
Note: The definition of "homicide" (shooting or otherwise) does not include justifiable homicide or involuntary manslaughter. This dataset also excludes any cases that CPD considers to be "unfounded" or "noncriminal."
Note: In some instances, the police department's raw incident-level data and victim-level data that were inputs into this dataset do not align on the type of crime that occurred. In those instances, this dataset attempts to correct mismatches between incident and victim specific crime types. When it is not possible to determine which victims are associated with the most recent crime determination, the dataset will show empty cells in the respective demographic fields (age, sex, race, etc.).
Note: The initial reporting officer usually asks victims to report demographic data. If victims are unable to recall, the reporting officer will use their best judgment. "Unknown" can be reported if it is truly unknown.
This table contains 135864 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-05-24. This table contains data described by the following dimensions (Not all combinations are available): Geography (148 items: Canada; Newfoundland and Labrador; Eastern Regional Integrated Health Authority, Newfoundland and Labrador; Central Regional Integrated Health Authority, Newfoundland and Labrador; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
This table contains data from the 2016 City Property Tax Report--Appendix A. Data is from the Department of Revenue Property Tax Statistics Supplemental Report. An empty cell means missing information. 13 cities do not have a permanent property tax rate. *Denotes rates for urban renewal in "Other" category. ** Portland is the only city with GAP bond--$2.6671/thousand which is not in the table, but included in the "Total City Rate" column. ***In some instances cities were contacted to verify data.
This dataset shows new sidewalk added to the City of Austin's network by calendar year. Data in this table is limited to the full purpose jurisdiction of the City of Austin as of publication date. Sidewalk construction comes from many sources, including but not limited to the City of Austin, counties, state agencies, and private developers. Existing data does not support separating out city and non-city construction. This dataset supports the SD23 performance measure M.C.6a: Percent of missing sidewalks completed. Detailed sidewalk segment data is available in the dataset Strategic Measure_Sidewalk Segment Data. View more details and insights related to this data set on the story page: https://data.austintexas.gov/stories/s/Percentage-of-Missing-Sidewalk-Network-Completed/ffkw-wkiv/
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset provides daily historical weather data for major cities around the world, including all national capitals and large population centers, from 1995 to 2024.
Data is sourced from the Open-Meteo Historical Weather API and Wikidata, processed and harmonized for easy analysis and visualization.
| File | Description |
|---|---|
| cities_clean.parquet | Metadata of all selected cities (country, coordinates, population, capital flag). |
| history.parquet | Full daily dataset (one row per city × day). |
| history_latest.csv | Snapshot of the most recent day available. |
"Major cities" are defined as:
Coordinates (lat, lon) and country ISO codes come from Wikidata's structured data.
Population values are used only for ranking and filtering.
Daily values from Open-Meteo's ERA5-based reanalysis:
| Variable | Unit | Description |
|---|---|---|
| temp_max_c, temp_min_c | °C | Maximum / minimum 2 m air temperature |
| temp_mean_c_approx | °C | Approximate daily mean ((max+min)/2) |
| app_temp_max_c, app_temp_min_c | °C | Apparent (feels-like) temperature |
| precip_mm, rain_mm, snow_mm | mm | Total precipitation, rain, snowfall |
| windspeed_10m_max_kmh, windgusts_10m_max_kmh | km/h | Maximum daily windspeed / gusts |
| wind_dir_dom_deg | ° | Dominant wind direction |
| sunshine_duration_s, daylight_duration_s | s | Total sunshine / daylight duration |
| shortwave_radiation_MJ_m2 | MJ/m² | Daily sum of incoming shortwave radiation |
All timestamps are daily aggregates in UTC.
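As a small illustration of how the variables in the table relate, the sketch below recomputes the approximate daily mean from the maximum and minimum, and derives a sunshine fraction from the two duration columns. The sunshine fraction is not a column in the dataset; it and both function names are hypothetical additions for illustration.

```python
def approx_daily_mean(temp_max_c, temp_min_c):
    """Midpoint of the daily maximum and minimum, as in temp_mean_c_approx."""
    return (temp_max_c + temp_min_c) / 2


def sunshine_fraction(sunshine_duration_s, daylight_duration_s):
    """Share of daylight that was sunny; None when there is no daylight (e.g. polar night)."""
    if daylight_duration_s <= 0:
        return None
    return sunshine_duration_s / daylight_duration_s


# Made-up values for one city-day.
mean_c = approx_daily_mean(temp_max_c=30.0, temp_min_c=20.0)
sunny = sunshine_fraction(sunshine_duration_s=21_600, daylight_duration_s=43_200)
```

The midpoint is only an approximation of the true daily mean, which is why the column name carries the `_approx` suffix.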
Data is available under CC BY 4.0.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Range |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
- Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
- Regional Diversity: Each region has distinct economic and environmental characteristics
- Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
- Beginner-Friendly: No data cleaning required, includes example code
- Documented: Comprehensive README with methodology and use cases
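import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load the data and drop the identifier columns, leaving the feature columns
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
# Standardize so that no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)
# Cluster; n_init is set explicitly so results do not depend on the scikit-learn default
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)
# Analyze: average only the numeric columns (city_name and country are strings)
print(df.groupby('cluster').mean(numeric_only=True))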
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with:
- Realistic correlation structures based on urban research
- Regional characteristics matching real-world patterns
- Optimal cluster separability (validated via silhouette scores)
- Comprehensive documentation and starter code
- Learn clustering without data cleaning hassles
- Practice PCA and dimensionality reduction
- Create beautiful geographic visualizations
- Understand feature correlation in real-world contexts
- Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering!
https://creativecommons.org/publicdomain/zero/1.0/
Overview
This dataset provides hourly air-quality measurements for 50 major global cities over a continuous 15-day period, including pollutant concentrations, meteorological conditions, geographical metadata, and an engineered AQI index.
All values are synthetically generated using historically consistent pollutant patterns and statistical ranges, allowing researchers and ML practitioners to work with realistic air-quality trends without licensing restrictions or data-collection barriers.
This dataset is ideal for time-series modeling, forecasting, environmental analytics, and machine-learning experimentation.
Cities Included
Covers all major regions:
North America: New York, Los Angeles, Toronto
Europe: London, Paris, Berlin, Zurich
Asia: Delhi, Tokyo, Seoul, Beijing, Singapore
Middle East: Dubai, Riyadh, Doha
Africa: Lagos, Cairo, Nairobi
Oceania: Sydney, Melbourne, Auckland
South America: São Paulo, Buenos Aires
Dataset Structure
Each hourly record includes:
Air Pollutants
PM2.5 (µg/m³)
PM10 (µg/m³)
NO₂ (ppb)
SO₂ (ppb)
O₃ (ppb)
CO (ppm)
Weather Features
Temperature (°C)
Humidity (%)
Wind Speed (m/s)
Location Metadata
City
Country
Latitude
Longitude
Other
Timestamp (ISO-8601)
AQI (Computed index)
Data Quality & Formatting
No missing values: 100% complete
Numeric values rounded to 3 decimals
Clean column names (snake_case)
Consistent hourly frequency
Fully ML-ready
Example Use Cases
- AQI forecasting (LSTM, GRU, Transformers)
- Multivariate time-series modeling
- Clustering cities by pollution patterns
- Environmental trend visualization
- Weather-pollution correlation studies
- Anomaly detection (peak pollution events)
| Column | Description | Unit | Type |
|---|---|---|---|
| timestamp | Hourly timestamp (UTC) | – | datetime |
| city | City name | – | string |
| country | Country name | – | string |
| latitude | City latitude | ° | float |
| longitude | City longitude | ° | float |
| pm25 | Fine particulate matter | µg/m³ | float |
| pm10 | Coarse particulate matter | µg/m³ | float |
| no2 | Nitrogen dioxide | ppb | float |
| so2 | Sulfur dioxide | ppb | float |
| o3 | Ozone | ppb | float |
| co | Carbon monoxide | ppm | float |
| temperature | Ambient temperature | °C | float |
| humidity | Relative humidity | % | float |
| wind_speed | Wind speed | m/s | float |
| aqi | Derived Air Quality Index | – | int |
Data Generation Method (Provenance)
This dataset is synthetically generated using realistic pollutant behavior patterns based on historical studies and open-source environmental datasets.
Modeling steps included:
City-specific pollutant baseline ranges
Randomized variation using Gaussian noise
Temporal patterns using sinusoidal diurnal cycles (morning & evening peaks)
Weather-pollution correlation rules (e.g., low wind → higher PM)
AQI computed using standard US-EPA breakpoints
All numeric values standardized to 3-decimal precision
This ensures that although synthetic, the dataset follows realistic environmental dynamics.
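The modeling steps listed above can be sketched roughly as follows. This is not the generator used to build the dataset: the peak hours, noise scale, wind threshold, and boost factor are illustrative assumptions, and the breakpoint table is the US-EPA PM2.5 table that was in use before the 2024 revision (which version the dataset used is not stated).

```python
import math
import random

# (concentration low, concentration high, index low, index high); PM2.5 in µg/m³.
# Assumption: pre-2024 US-EPA PM2.5 breakpoints.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def synthetic_pm25(hour, baseline, amplitude, wind_speed, rng):
    """One hourly PM2.5 value: city baseline + diurnal cycle + Gaussian noise + wind rule."""
    # A 12-hour cosine gives two daily peaks, here at 08:00 and 20:00 (assumed hours).
    cycle = amplitude * math.cos(2 * math.pi * (hour - 8) / 12)
    noise = rng.gauss(0.0, 0.1 * baseline)
    # Low wind -> higher PM; the 2 m/s threshold and 25% boost are assumptions.
    wind_factor = 1.25 if wind_speed < 2.0 else 1.0
    return round(max(0.0, (baseline + cycle + noise) * wind_factor), 3)


def pm25_to_aqi(concentration):
    """Linear interpolation within the matching breakpoint interval, capped at 500."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if concentration <= c_hi:
            c = max(concentration, c_lo)  # values in a gap between intervals snap upward
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return 500


rng = random.Random(0)  # seeded so the sketch is reproducible
series = [synthetic_pm25(h, baseline=40.0, amplitude=10.0, wind_speed=3.0, rng=rng) for h in range(24)]
aqi_series = [pm25_to_aqi(v) for v in series]
```

A full AQI takes the maximum sub-index across all pollutants; the sketch covers only PM2.5 to keep the interpolation step visible.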
File Information
global_air_quality_50_cities.csv
Rows: 18,000+
Columns: 16
Format: UTF-8 CSV
This table contains 33048 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-03-16. This table contains data described by the following dimensions (Not all combinations are available): Geography (36 items: Total, census metropolitan areas; St. John's, Newfoundland and Labrador; Halifax, Nova Scotia; Moncton, New Brunswick; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
Foreign born excludes people born outside the United States to a parent who is a U.S. citizen.
Tables for Workplace Geography are only available for States; Counties; Places; County Subdivisions in selected states (CT, ME, MA, MI, MN, NH, NJ, NY, PA, RI, VT, WI); Combined Statistical Areas; Metropolitan and Micropolitan Statistical Areas, and their associated Metropolitan Divisions and Principal Cities; Combined New England City and Town Areas; New England City and Town Areas, and their associated Divisions and Principal Cities. Tables B08601, B08602, B08603, and B08604 are also available for Place parts and County Subdivision parts for the 5-year ACS datasets.
Workers include members of the Armed Forces and civilians who were at work last week.
Industry titles and their 4-digit codes are based on the North American Industry Classification System (NAICS). The Census industry codes for 2018 and later years are based on the 2017 revision of the NAICS. To allow for the creation of multiyear tables, industry data in the multiyear files (prior to data year 2018) were recoded to the 2017 Census industry codes. We recommend using caution when comparing data coded using 2017 Census industry codes with data coded using Census industry codes prior to data year 2018. For more information on the Census industry code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.
When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.
2019 ACS data products include updates to several categories of the existing means of transportation question. For more information, see: Change to Means of Transportation.
Occupation titles and their 4-digit codes are based on the Standard Occupational Classification (SOC). The Census occupation codes for 2018 and later years are based on the 2018 revision of the SOC. To allow for the creation of the multiyear tables, occupation data in the multiyear files (prior to data year 2018) were recoded to the 2018 Census occupation codes. We recommend using caution when comparing data coded using 2018 Census occupation codes with data coded using Census occupation codes prior to data year 2018. For more information on the Census occupation code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.
In 2019, methodological changes were made to the class of worker question. These changes involved modifications to the question wording, the category wording, and the visual format of the categories on the questionnaire. The class of worker categories are now listed under the headings "Private Sector Employee," "Government Employee," and "Self-Employed or Other." Additionally, the category of Active Duty was added as one of the response categories under the "Government Employee" section for the mail questionnaire. For more detailed info...
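The margin-of-error note in the ACS entry above can be illustrated with a short sketch. It assumes the published value is the 90 percent margin of error, as the note states, and uses the 1.645 factor that corresponds to a 90 percent confidence level to recover an approximate standard error. The function names and the sample figures are illustrative only and do not come from any ACS table.

```python
def confidence_bounds(estimate, moe):
    """Return the (lower, upper) 90 percent confidence bounds.

    `moe` is assumed to be the 90 percent margin of error published
    alongside the estimate.
    """
    return estimate - moe, estimate + moe


def standard_error(moe, z=1.645):
    """Approximate standard error implied by a 90 percent margin of error."""
    return moe / z


# Made-up example values, not taken from a real table.
lower, upper = confidence_bounds(1200, 150)
se = standard_error(164.5)
```

These bounds reflect sampling variability only; as the note says, nonsampling error is not represented.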
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CitiesGOER is a database that provides environmental data for 52,602 cities and 48 environmental variables, including 38 bioclimatic variables, 8 soil variables and 2 topographic variables. Data were extracted from the same 30 arc-seconds global grid layers that were prepared when making the TreeGOER (Tree Globally Observed Environmental Ranges) database that is available from https://doi.org/10.5281/zenodo.7922927. Details on the preparations of these layers are provided by Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914. CitiesGOER was designed to be used together with TreeGOER and possibly also with the GlobalUsefulNativeTrees database (Kindt et al. 2023) to allow users to filter suitable tree species based on environmental conditions of the planting site.
The identities and coordinates of cities were sourced from a data set with information for cities with a population size larger than 1000 that was created by Opendatasoft and made available from https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/table/?disjunctive.cou_name_en&sort=name. The data was downloaded on 22-JULY-2023 and afterwards filtered for cities with a population of 5000 or above. Cities where information on the country was missing were removed. The coordinates of cities were used to extract the environmental data via the terra package (Hijmans et al. 2022, version 1.6-47) in the R 4.2.1 environment.
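The filtering step described above (keep cities with a population of 5000 or above, drop cities with no country) can be sketched as follows. The original work was done in R with the terra package; this Python version is only an illustration, and the column names (`name`, `country`, `population`), the semicolon delimiter and the sample rows are assumptions rather than the actual layout of the Opendatasoft export.

```python
import csv
import io


def filter_cities(rows, min_population=5000):
    """Keep rows with a known country and population >= min_population."""
    kept = []
    for row in rows:
        country = (row.get("country") or "").strip()
        try:
            population = int(row.get("population") or 0)
        except ValueError:
            continue  # skip rows whose population is not a whole number
        if country and population >= min_population:
            kept.append(row)
    return kept


# Tiny made-up sample; the real export has many more columns.
SAMPLE = """name;country;population
Alpha;Kenya;12000
Beta;;8000
Gamma;Rwanda;3000
Delta;Ethiopia;5000
"""

cities = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
kept_names = [row["name"] for row in filter_cities(cities)]
```

In this sample, "Beta" is dropped for its missing country and "Gamma" for falling below the population threshold.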
Version 2023.08 provided median values from 23 Global Climate Models (GCMs) for Shared Socio-Economic Pathway (SSP) 1-2.6 and from 18 GCMs for SSP 3-7.0, both for the 2050s (2041-2060). Similar methods were used to calculate these median values as in the case studies for the TreeGOER manuscript (calculations were partially done via the BiodiversityR::ensemble.envirem.run function and with downscaled bioclimatic and monthly climate 2.5 arc-minutes future grid layers available from WorldClim 2.1).
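As a rough illustration of the ensemble step, the sketch below takes one projected value per GCM for each variable at a single location and reports the median across GCMs. It is not the BiodiversityR workflow itself; the GCM labels, variable names and numbers are placeholders.

```python
from statistics import median


def ensemble_median(projections):
    """Median across GCMs for each variable.

    `projections` maps a GCM label to a {variable: value} dict for one
    location; variables missing from a GCM are simply left out of that
    variable's median.
    """
    variables = set()
    for values in projections.values():
        variables.update(values)
    return {
        var: median(p[var] for p in projections.values() if var in p)
        for var in sorted(variables)
    }


# Placeholder GCM labels and values for one hypothetical city.
example = {
    "gcm_a": {"bio01": 20.1, "bio12": 900},
    "gcm_b": {"bio01": 21.4, "bio12": 1100},
    "gcm_c": {"bio01": 20.8, "bio12": 1000},
}
result = ensemble_median(example)
```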
Version 2023.09 used similar methods as for previous versions to provide median values from 13 GCMs for the 2090s (2081-2100) for SSP 5-8.5.
The locations of the 52,602 cities are mapped in one of the series available from the TreeGOER Global Zones atlas that can be obtained from https://doi.org/10.5281/zenodo.8252756.
Version 2024.10 includes a new data set that documents the Holdridge Life Zone of each city location. Information is given for historical (1901-1920), contemporary (1979-2013) and future (2061-2080; separately for RCP 4.5 and RCP 8.5) climates inferred from global raster layers that are available for download from DRYAD and were created for the following article: Elsen et al. 2022. Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962. Version 2024.10 further includes Holdridge Life Zones for the climates that were available from the previous versions, calculating biotemperatures and life zones with similar methods as used by Holdridge (1947; 1967) and Elsen et al. (2022) (for future climates, median values were determined first for monthly maximum and minimum temperatures across GCMs). The distributions of the 48,129 species documented in TreeGOER across the Holdridge Life Zones are given in this Zenodo archive: https://zenodo.org/records/14020914.
Version 2024.11 includes a new data set that documents the Köppen-Geiger climate zone of each city location. Information is given for historical (1901-1930, 1931-1960, 1961-1990) and future (2041-2070 and 2071-2099) climates, with seven scenarios for each future period (SSP 1-1.9, SSP 1-2.6, SSP 2-4.5, SSP 3-7.0, SSP 4-3.4, SSP 4-6.0 and SSP 5-8.5). This data set was created from 30 arc-second raster layers available via: Beck, H.E., McVicar, T.R., Vergopolan, N. et al. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724 (2023). https://doi.org/10.1038/s41597-023-02549-6
Version 2025.03 includes extra columns for the baseline, 2050s and 2090s datasets that partially correspond to climate zones used in the GlobalUsefulNativeTrees database. One of these zonations is the Whittaker biome types, available as a polygon from the plotbiomes package (see also here). Whittaker biome types were extracted with similar R scripts as described by Kindt 2025 (these were also used to calculate environmental ranges of TreeGOER species, as archived here).
Version 2025.03 further includes information for the baseline climate on the steady state water table depth, obtained from a 30 arc-seconds raster layer calculated by the GLOBGM v1.0 model (Verkaik et al. 2024). Also included was the elevation, obtained from the same WorldClim 2.1 raster layer used to prepare TreeGOER.
As an alternative to CitiesGOER, the ClimateForecasts database (https://zenodo.org/records/10776414) documents the environmental conditions at the locations of 15,504 weather stations. ClimateForecasts was integrated in the GlobalUsefulNativeTrees database (see Kindt et al. 2023).
When using CitiesGOER in your work, cite this depository and the following:
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315. https://doi.org/10.1002/joc.5086
Title, P. O., & Bemmels, J. B. (2018). ENVIREM: An expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography, 41(2), 291–307. https://doi.org/10.1111/ecog.02880
Poggio, L., de Sousa, L. M., Batjes, N. H., Heuvelink, G. B. M., Kempen, B., Ribeiro, E., & Rossiter, D. (2021). SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. SOIL, 7(1), 217–240. https://doi.org/10.5194/soil-7-217-2021
Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914.
Opendatasoft (2023) Geonames - All Cities with a population > 1000. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/information/?disjunctive.cou_name_en&sort=name (accessed 22-JULY-2023)
When using information from the Holdridge Life Zones, also cite:
Elsen, P. R., Saxon, E. C., Simmons, B. A., Ward, M., Williams, B. A., Grantham, H. S., Kark, S., Levin, N., Perez-Hammerle, K.-V., Reside, A. E., & Watson, J. E. M. (2022). Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962
When using information from Köppen-Geiger climate zones, also cite:
Beck, H.E., McVicar, T.R., Vergopolan, N., Berg, A., Lutsko, N.J., Dufour, A., Zeng, Z., Jiang, X., van Dijk, A.I. and Miralles, D.G. 2023. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724. https://doi.org/10.1038/s41597-023-02549-6
When using information on the Whittaker biome types, also cite:
Ricklefs, R. E., Relyea, R. (2018). Ecology: The Economy of Nature. United States: W.H. Freeman.
Whittaker, R. H. (1970). Communities and ecosystems.
Valentin Ștefan, & Sam Levin. (2018). plotbiomes: R package for plotting Whittaker biomes with ggplot2 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7145245
When using information on the steady state water table depth, also cite:
Verkaik, J., Sutanudjaja, E. H., Oude Essink, G. H., Lin, H. X., & Bierkens, M. F. (2024). GLOBGM v1.0: a parallel implementation of a 30 arcsec PCR-GLOBWB-MODFLOW global-scale groundwater model. Geoscientific Model Development, 17(1), 275-300. https://gmd.copernicus.org/articles/17/275/2024/
The development of CitiesGOER was supported by the Darwin Initiative to project DAREX001 of Developing a Global Biodiversity Standard certification for tree-planting and restoration, by Norway's International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Green Climate Fund through the IUCN-led Transforming the Eastern Province of Rwanda through Adaptation project. Development of version 2024.10 was further supported by the Green Climate Fund through the Readiness proposal on Climate Appropriate Portfolios of Tree Diversity for Burkina Faso project, by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project and by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa.
VITAL SIGNS INDICATOR Housing Permits (LU3)
FULL MEASURE NAME Permitted housing units
LAST UPDATED October 2019
DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.
DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available
California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/
Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov
CONTACT INFORMATION vitalsigns.info@bayareametro.gov
METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.
Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.
Each multi-family unit is counted separately even though several units may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county's unincorporated total. Permit data is not available for years when the city or town was not incorporated.
Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
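The income bands above can be expressed as a small classification function. This is a minimal sketch under stated assumptions: the text does not say which band a household exactly at 80% or 120% of the area median income belongs to, so the boundary handling here (upper bounds inclusive) is an assumption, and the function names and sample figures are hypothetical.

```python
def income_category(household_income, area_median_income):
    """Classify a household by its income as a share of area median income.

    Boundary handling at exactly 50%, 80% and 120% is an assumption;
    the source text does not specify it.
    """
    share = household_income / area_median_income
    if share < 0.5:
        return "very low"
    if share <= 0.8:
        return "low"
    if share <= 1.2:
        return "moderate"
    return "above moderate"


def affordable_units(units_by_category):
    """Affordable housing = units for very low plus low income households."""
    return units_by_category.get("very low", 0) + units_by_category.get("low", 0)


# Made-up permit counts for one jurisdiction and year.
permits = {"very low": 10, "low": 5, "moderate": 7, "above moderate": 40}
affordable = affordable_units(permits)
```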
Permit data is missing for the following cities and years: Clayton, 1990-2007; Lafayette, 1990-2007; Moraga, 1990-2007; Orinda, 1990-2007; San Ramon, 1990.
Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.
Permit values reflect the number of units permitted in each respective year.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Police recorded crime figures by Police Force Area and Community Safety Partnership areas (which equate, in the majority of instances, to local authorities).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The jobs.csv dataset contains 85k job postings from various cities and industries in Vietnam, providing insights into the job market as of April 22, 2025. The data includes details about job titles, locations, salaries, experience requirements, and job fields, making it a valuable resource for analyzing salary trends, regional job distribution, and industry demands.
jobs.csv
The dataset includes the following columns:
| Column Name | Description | Data Type | Example Value |
|---|---|---|---|
| job_title | Title of the job posting | String | "Sales Executive" |
| job_type | Type of employment (e.g., full-time, part-time) | String | "Full-time" |
| position_level | Level of the position (e.g., Employee, Manager, Intern) | String | "Nhân viên" (Employee) |
| city | City where the job is located | String | "Hồ Chí Minh" |
| experience | Required years of experience (e.g., "không yêu cầu", "2 - 5 năm") | String | "trên 1 năm" |
| skills | Required skills for the job (comma-separated) | String | "English, Sales, Communication" |
| job_fields | Industry or field of the job (comma-separated) | String | "Sales, Marketing, Retail" |
| salary | General salary description (may be vague or blank) | String | "Thỏa thuận" (Negotiable) |
| salary_min | Minimum salary offered (in VND or USD) | Float | 8000000 |
| salary_max | Maximum salary offered (in VND or USD) | Float | 15000000 |
| unit | Currency unit for salary (VND or USD) | String | "VND" |
The experience field uses Vietnamese phrases like "không yêu cầu" (no experience required), "trên 1 năm" (over 1 year), or ranges like "2 - 5 năm". Parsing into numerical years is recommended for analysis.
The job_fields column contains comma-separated values, allowing a single job to belong to multiple industries (e.g., "Sales, Marketing").
salary_min and salary_max may be missing or zero. Filter out invalid entries for accurate analysis.
This dataset can be used for:
To prepare the dataset for analysis, consider the following steps:
salary_min and salary_max are zero or missing, or impute using industry avera...
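The preparation notes for jobs.csv (parse the experience text into years, split the comma-separated job_fields, and filter out zero or missing salaries) can be sketched as below. The helper names are hypothetical and the parsing rule is an assumption: it reads only the digits in the phrase, so no digits means no requirement, one number is treated as a minimum, and two numbers as a range. Phrases with other meanings would need extra handling.

```python
import re


def parse_experience(text):
    """Return (min_years, max_years) from an experience phrase.

    Assumption: only the digits are read, so a phrase without digits is
    treated as no requirement and a single number as a minimum.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text or "")]
    if not numbers:
        return (0, None)
    if len(numbers) == 1:
        return (numbers[0], None)
    return (numbers[0], numbers[1])


def split_fields(text):
    """Split a comma-separated job_fields value into a clean list."""
    return [part.strip() for part in (text or "").split(",") if part.strip()]


def has_valid_salary(row):
    """True when both salary_min and salary_max are present and above zero."""
    try:
        low = float(row.get("salary_min") or 0)
        high = float(row.get("salary_max") or 0)
    except (TypeError, ValueError):
        return False
    return low > 0 and high > 0


# Made-up rows using the column names from the table above.
rows = [
    {"experience": "2 - 5 năm", "job_fields": "Sales, Marketing",
     "salary_min": 8000000, "salary_max": 15000000},
    {"experience": "không yêu cầu", "job_fields": "Retail",
     "salary_min": 0, "salary_max": None},
]
clean = [row for row in rows if has_valid_salary(row)]
```

Dropping rows is only one option; the notes also mention imputation, which is not shown here.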
By joining the German Zollverein (Customs Union) in 1834, the Kingdom of Württemberg committed itself to conduct a census in a fixed three-year rhythm according to uniform criteria and with a recording scheme that was as precise as possible. The data obtained in the process formed the basis for the distribution of the common revenues of the German Customs Union. The Kingdom of Württemberg conducted the first census as part of the Zollverein on 15 December 1834. The basis of the censuses was the 'resident' population, which according to the contemporary definition included all people who were present in the place on the reference date. Residents who were currently absent due to a journey were also taken into account. Men and women who were in transit in the census municipality were not included. Until 1858, the 'local' population, i.e. the population living permanently in the village, was also counted. The data material of the Zollverein and Reich statistics was collected on the basis of the Oberamtslisten, which have survived in handwritten form (Landesarchiv Baden-Württemberg, Staatsarchiv Ludwigsburg, Bestand E 258 VIII). The data is available at the municipal, Oberamts and district level. The figures reflect the territorial status valid at the time of the census as well as the contemporary administrative division. Four Excel tables are available for each census, in which the data for the municipalities and head offices of a district are summarised. A crossed-out place name indicates that the municipality in question belonged to another Oberamt at the time of the census. Municipalities that were newly assigned to an Oberamt between 1834 and 1925 are usually added at the end of the Oberamt list. Information on the change of office affiliation can be found in the comment field. An asterisk after a place name (name of the city or village) indicates such supplementary information. The comment field opens as soon as the cursor is placed on the field of the place (city or village) concerned. The primary researchers supplemented the data material with historical maps. The maps of the four Württemberg districts are taken from the publication 'Das Königreich Württemberg' (The Kingdom of Württemberg), which was published by the State Statistical Office in four volumes between 1904 and 1907.
Explanation of symbols
0 = Less than half of 1 in the last filled position, but more than nothing
- = Nothing present (exactly zero)
. = Numerical value unknown or to be kept secret
x = Table compartment locked because statement does not make sense
... = Statement to be made later
/ = No statement, as the numerical value is not certain enough
() = Statement value limited, as the numerical value may contain errors
Discrepancies in the totals can be explained by rounding the numbers. Place names that have been crossed out indicate that the municipality in question belonged to a different Oberamt at the time of the census.
* An asterisk after a place name indicates information about the records in the comment field.
Publication: CD-ROM: »Königreich Württemberg« Volkszählungen 1834 bis 1925. Statistisches Landesamt Baden-Württemberg. Available to order at: https://www.statistik-bw.de/Service/Veroeff/Statistische_Daten/900208001.bs E-mail: vertrieb@stala.bwl.de
No description is available. Visit https://dataone.org/datasets/a0d9075d9dabafe15910b1e96b2e9a52 for complete metadata about this dataset.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset is about: (Table 5) Chemical composition of aragonite from the Lost City hydrothermal field according to electron microprobe analysis. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.765175 for more information.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.
The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.
The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.
When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.
The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.
Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.
Explanation of Symbols:
- The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X) The estimate or margin of error is not applicable or not available.
median- The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+ The median falls in the highest interval of an open-ended distribution (for example "250,000+").
** The margin of error could not be computed because there were an insufficient number of sample observations.
*** The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
***** A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
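The three dependency ratios defined in the notes above follow directly from the population counts of the three age groups. The sketch below is a plain restatement of those definitions; the function name, argument names and sample counts are illustrative only.

```python
def dependency_ratios(under_18, age_18_to_64, age_65_plus):
    """Return the age, old-age and child dependency ratios (per 100).

    Each ratio divides the dependent population by the 18-to-64
    population and multiplies by 100, as in the table notes.
    """
    if age_18_to_64 <= 0:
        raise ValueError("18-to-64 population must be positive")
    return {
        "age": (under_18 + age_65_plus) / age_18_to_64 * 100,
        "old_age": age_65_plus / age_18_to_64 * 100,
        "child": under_18 / age_18_to_64 * 100,
    }


# Made-up population counts for a hypothetical area.
ratios = dependency_ratios(under_18=200, age_18_to_64=500, age_65_plus=100)
```

By construction, the age dependency ratio equals the sum of the old-age and child ratios.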