31 datasets found

2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates...
data.census.gov
Cite
ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX
Explore at:
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2021
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.

The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.

The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
-: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X): The estimate or margin of error is not applicable or not available.
median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
**: The margin of error could not be computed because there were an insufficient number of sample observations.
***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
*****: A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
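The dependency ratios and the margin-of-error interval described above reduce to a few lines of arithmetic. The following is a minimal sketch, not Census Bureau code: the function names and the population counts are made up for illustration.

```python
def dependency_ratios(under_18, age_18_to_64, age_65_plus):
    """Return the three ratios, each expressed per 100 people aged 18 to 64."""
    if age_18_to_64 <= 0:
        raise ValueError("the 18-to-64 population must be positive")
    return {
        "age": (under_18 + age_65_plus) / age_18_to_64 * 100,
        "old_age": age_65_plus / age_18_to_64 * 100,
        "child": under_18 / age_18_to_64 * 100,
    }


def confidence_bounds(estimate, margin_of_error):
    """Lower and upper bounds: estimate minus and plus the published margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Illustrative (made-up) counts, not values taken from table S0101.
ratios = dependency_ratios(under_18=220, age_18_to_64=600, age_65_plus=180)
lower, upper = confidence_bounds(estimate=1000, margin_of_error=50)
print(ratios, lower, upper)
```

By construction, the age dependency ratio equals the sum of the child and old-age ratios, which is a quick consistency check when reading the table.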
TABLE III. Deaths in 122 U.S. cities
catalog.data.gov
healthdata.gov
+6more
Updated Jul 11, 2025
Cite
Centers for Disease Control and Prevention (2025). TABLE III. Deaths in 122 U.S. cities [Dataset]. https://catalog.data.gov/dataset/table-iii-deaths-in-122-u-s-cities
Explore at:
Dataset updated
Jul 11, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Area covered
United States
Description
TABLE III. Deaths in 122 U.S. cities – 2016. 122 Cities Mortality Reporting System — Each week, the vital statistics offices of 122 cities across the United States report the total number of death certificates processed and the number of those for which pneumonia or influenza was listed as the underlying or contributing cause of death by age group (Under 28 days, 28 days –1 year, 1-14 years, 15-24 years, 25-44 years, 45-64 years, 65-74 years, 75-84 years, and ≥ 85 years). FOOTNOTE: U: Unavailable. —: No reported cases. * Mortality data in this table are voluntarily reported from 122 cities in the United States, most of which have populations of 100,000 or more. A death is reported by the place of its occurrence and by the week that the death certificate was filed. Fetal deaths are not included. † Pneumonia and influenza. § Total includes unknown ages.
Vital Signs: Housing Permits - by metro area
data.bayareametro.gov
csv, xlsx, xml
Updated Oct 31, 2019
Cite
(2019). Vital Signs: Housing Permits - by metro area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Housing-Permits-by-metro-area/9muq-ubre
Explore at:
Available download formats: xlsx, xml, csv
Dataset updated
Oct 31, 2019
Description
VITAL SIGNS INDICATOR Housing Permits (LU3)

FULL MEASURE NAME Permitted housing units

LAST UPDATED October 2019

DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.

DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available

California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/

Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov

CONTACT INFORMATION vitalsigns.info@bayareametro.gov

METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.

Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.

Each multi-family unit is counted separately even though they may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county’s unincorporated total. Permit data is not available for years when the city or town was not incorporated.

Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
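The income tiers above can be expressed as a small classifier. This is a sketch under stated assumptions: the function names are hypothetical, and the notes do not say which tier a household at exactly 80% of area median income belongs to, so the boundary handling here is a guess.

```python
def income_category(household_income, area_median_income):
    """Classify a household by its income as a share of area median income (AMI)."""
    if area_median_income <= 0:
        raise ValueError("area median income must be positive")
    share = household_income / area_median_income
    if share < 0.50:
        return "very low"
    if share < 0.80:  # assumption: exactly 80% of AMI is treated as moderate
        return "low"
    if share <= 1.20:  # "above moderate" is described as above 120%
        return "moderate"
    return "above moderate"


def counts_as_affordable(category):
    """Per the notes, affordable units are those for low and very low income households."""
    return category in {"very low", "low"}


print(income_category(45_000, 100_000), counts_as_affordable("low"))
```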

Permit data is missing for the following cities and years: Clayton, 1990-2007; Lafayette, 1990-2007; Moraga, 1990-2007; Orinda, 1990-2007; San Ramon, 1990.

Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.

Permit values reflect the number of units permitted in each respective year.

City of Los Angeles Crime data

kaggle.com

zip

Updated Apr 29, 2024

Cite

Ramin Huseyn (2024). City of Los Angeles Crime data [Dataset]. https://www.kaggle.com/datasets/raminhuseyn/crime-data-from-2020-to-present

Explore at:

Available download formats: zip (48433749 bytes)

Dataset updated

Apr 29, 2024

Authors

Ramin Huseyn

License

https://www.usa.gov/government-works/

Area covered

Los Angeles

Description

This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. The dataset contains 2,083,227 rows and 29 columns.

Column name	Description
DR_NO	Division of Records Number: Official file number made up of a 2 digit year, area ID, and 5 digits
Date Rptd	MM/DD/YYYY
DATE OCC	MM/DD/YYYY
TIME OCC	In 24 hour military time.
AREA	The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21.
AREA NAME	The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for.
Crm Cd	Indicates the crime committed. (Same as Crime Code 1)
Crm Cd Desc	Defines the Crime Code provided.
Mocodes	Modus Operandi: Activities associated with the suspect in commission of the crime
Vict Age	Victim age
Vict Sex	F - Female M - Male X - Unknown
Vict Descent	Descent Code: A - Other Asian B - Black C - Chinese D - Cambodian F - Filipino G - Guamanian H - Hispanic/Latin/Mexican I - American Indian/Alaskan Native J - Japanese K - Korean L - Laotian O - Other P - Pacific Islander S - Samoan U - Hawaiian V - Vietnamese W - White X - Unknown Z - Asian Indian
Premis Cd	The type of structure, vehicle, or location where the crime took place.
Premis Desc	Defines the Premise Code provided
Weapon Used Cd	The type of weapon used in the crime.
Weapon Desc	Defines the Weapon Used Code provided.
Status	Status of the case. (IC is the default)
Status Desc	Defines the Status Code provided.
Crm Cd 1	Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious.
Crm Cd 2	May contain a code for an additional crime, less serious than Crime Code 1.
Crm Cd 3	May contain a code for an additional crime, less serious than Crime Code 1.
Crm Cd 4	May contain a code for an additional crime, less serious than Crime Code 1.
LOCATION	Street address of crime incident rounded to the nearest hundred block to maintain anonymity.
Cross Street	Cross Street of rounded Address.
LAT	Latitude
LON	Longitude
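A few of the columns above need light parsing before analysis. The sketch below uses only the standard library and rests on assumptions: the helper names are hypothetical, TIME OCC is assumed to be an HHMM value that may have lost its leading zeros, and dates are assumed to match the MM/DD/YYYY format stated in the table exactly.

```python
from datetime import datetime


def parse_time_occ(raw):
    """Turn a 24-hour value such as 930 or '2145' into a datetime.time."""
    text = str(raw).strip().zfill(4)  # restore leading zeros, e.g. 930 -> '0930'
    return datetime.strptime(text, "%H%M").time()


def parse_report_date(raw):
    """Parse an MM/DD/YYYY string into a datetime.date."""
    return datetime.strptime(raw.strip(), "%m/%d/%Y").date()


def clean_location(lat, lon):
    """Return (lat, lon) as floats, or None for the (0, 0) missing-data marker."""
    lat, lon = float(lat), float(lon)
    if lat == 0.0 and lon == 0.0:
        return None
    return lat, lon


print(parse_time_occ(930), parse_report_date("04/29/2024"), clean_location(0, 0))
```

Treating (0°, 0°) as missing rather than as a real coordinate keeps those rows from distorting maps and distance calculations.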

Violence Reduction - Victim Demographics - Aggregated
data.cityofchicago.org
s.cnmilf.com
+1more
csv, xlsx, xml
Updated Dec 2, 2025
Cite
City of Chicago (2025). Violence Reduction - Victim Demographics - Aggregated [Dataset]. https://data.cityofchicago.org/Public-Safety/Violence-Reduction-Victim-Demographics-Aggregated/gj7a-742p
Explore at:
Available download formats: xml, xlsx, csv
Dataset updated
Dec 2, 2025
Dataset authored and provided by
City of Chicago
Description
This dataset contains aggregate data on violent index victimizations at the quarter level of each year (i.e., January – March, April – June, July – September, October – December), from 2001 to the present (1991 to present for Homicides), with a focus on those related to gun violence. Index crimes are 10 crime types selected by the FBI (codes 1-4) for special focus due to their seriousness and frequency. This dataset includes only those index crimes that involve bodily harm or the threat of bodily harm and are reported to the Chicago Police Department (CPD). Each row is aggregated up to victimization type, age group, sex, race, and whether the victimization was domestic-related. Aggregating at the quarter level provides large enough blocks of incidents to protect anonymity while allowing the end user to observe inter-year and intra-year variation. Any row where there were fewer than three incidents during a given quarter has been deleted to help prevent re-identification of victims. For example, if there were three domestic criminal sexual assaults during January to March 2020, all victims associated with those incidents have been removed from this dataset. Human trafficking victimizations have been aggregated separately due to the extremely small number of victimizations.
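The aggregate-then-suppress step described above can be sketched as follows. This is an illustration, not the City's pipeline: the function name, the tuple layout, and the sample values are hypothetical. The threshold follows the stated rule (rows with fewer than three incidents are dropped); the description's own example involves exactly three incidents, so the precise boundary is an assumption.

```python
from collections import Counter


def aggregate_with_suppression(incidents, min_incidents=3):
    """Count incidents per group and drop groups below the threshold.

    Each incident is a tuple of grouping keys (quarter, type, age group, sex).
    """
    counts = Counter(incidents)
    return {group: n for group, n in counts.items() if n >= min_incidents}


# Made-up incidents: one group with four incidents, one with two.
incidents = (
    [("2020-Q1", "ROBBERY", "20-29", "M")] * 4
    + [("2020-Q1", "BATTERY", "30-39", "F")] * 2
)
published = aggregate_with_suppression(incidents)
print(published)
```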

This dataset includes a "GUNSHOT_INJURY_I" column to indicate whether the victimization involved a shooting, showing either Yes ("Y"), No ("N"), or Unknown ("UNKNOWN"). For homicides, injury descriptions are available dating back to 1991, so the "shooting" column will read either "Y" or "N" to indicate whether the homicide was a fatal shooting or not. For non-fatal shootings, data is only available as of 2010. As a result, for any non-fatal shootings that occurred from 2010 to the present, the shooting column will read as “Y.” Non-fatal shooting victims will not be included in this dataset prior to 2010; they will be included in the authorized dataset, but with "UNKNOWN" in the shooting column.

The dataset is refreshed daily, but excludes the most recent complete day to allow CPD time to gather the best available information. Each time the dataset is refreshed, records can change as CPD learns more about each victimization, especially those victimizations that are most recent. The data on the Mayor's Office Violence Reduction Dashboard is updated daily with an approximately 48-hour lag. As cases are passed from the initial reporting officer to the investigating detectives, some recorded data about incidents and victimizations may change once additional information arises. Regularly updated datasets on the City's public portal may change to reflect new or corrected information.

How does this dataset classify victims?

The methodology by which this dataset classifies victims of violent crime differs by victimization type:

Homicide and non-fatal shooting victims: A victimization is considered a homicide victimization or non-fatal shooting victimization depending on its presence in CPD's homicide victims data table or its shooting victims data table. A victimization is considered a homicide only if it is present in CPD's homicide data table, while a victimization is considered a non-fatal shooting only if it is present in CPD's shooting data tables and absent from CPD's homicide data table.

To determine the IUCR code of homicide and non-fatal shooting victimizations, we defer to the incident IUCR code available in CPD's Crimes, 2001-present dataset (available on the City's open data portal). If the IUCR code in CPD's Crimes dataset is inconsistent with the homicide/non-fatal shooting categorization, we defer to CPD's Victims dataset.

For a criminal homicide, the only sensible IUCR codes are 0110 (first-degree murder) or 0130 (second-degree murder). For a non-fatal shooting, a sensible IUCR code must signify a criminal sexual assault, a robbery, or, most commonly, an aggravated battery. In rare instances, the IUCR codes in CPD's Crimes and Victims datasets do not align with the homicide/non-fatal shooting categorization:

In instances where a homicide victimization does not correspond to an IUCR code 0110 or 0130, we set the IUCR code to "01XX" to indicate that the victimization was a homicide but we do not know whether it was a first-degree murder (IUCR code = 0110) or a second-degree murder (IUCR code = 0130).

When a non-fatal shooting victimization does not correspond to an IUCR code that signifies a criminal sexual assault, robbery, or aggravated battery, we enter “UNK” in the IUCR column, “YES” in the GUNSHOT_I column, and “NON-FATAL” in the PRIMARY column to indicate that the victim was non-fatally shot, but the precise IUCR code is unknown.

Other violent crime victims: For other violent crime types, we refer to the IUCR classification that exists in CPD's victim table, with only one exception:

When there is an incident that is associated with no victim with a matching IUCR code, we assume that this is an error. Every crime should have at least 1 victim with a matching IUCR code. In these cases, we change the IUCR code to reflect the incident IUCR code because CPD's incident table is considered to be more reliable than the victim table.
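The fallback rules for homicide and non-fatal shooting IUCR codes can be sketched as a small function. The function name and category labels are hypothetical. The description does not list which codes signify a criminal sexual assault, robbery, or aggravated battery, so that set is passed in by the caller and the value used below is only a placeholder.

```python
HOMICIDE_CODES = {"0110", "0130"}  # first-degree and second-degree murder


def resolve_iucr(category, iucr, sensible_nonfatal_codes):
    """Apply the fallback codes described above for inconsistent IUCR values."""
    if category == "homicide":
        # Unknown degree of murder is recorded as "01XX".
        return iucr if iucr in HOMICIDE_CODES else "01XX"
    if category == "non-fatal shooting":
        # A code outside the caller-supplied set is recorded as "UNK".
        return iucr if iucr in sensible_nonfatal_codes else "UNK"
    return iucr  # other violent crimes keep the code they already have


# Placeholder set for illustration only, not a verified list of IUCR codes.
example_codes = {"041A"}
print(resolve_iucr("homicide", "0141", example_codes))
print(resolve_iucr("non-fatal shooting", "9999", example_codes))
```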

Note: All businesses identified as victims in CPD data have been removed from this dataset.

Note: The definition of “homicide” (shooting or otherwise) does not include justifiable homicide or involuntary manslaughter. This dataset also excludes any cases that CPD considers to be “unfounded” or “noncriminal.”

Note: In some instances, the police department's raw incident-level data and victim-level data that were inputs into this dataset do not align on the type of crime that occurred. In those instances, this dataset attempts to correct mismatches between incident and victim specific crime types. When it is not possible to determine which victims are associated with the most recent crime determination, the dataset will show empty cells in the respective demographic fields (age, sex, race, etc.).

Note: The initial reporting officer usually asks victims to report demographic data. If victims are unable to recall, the reporting officer will use their best judgment. “Unknown” can be reported if it is truly unknown.
Mortality and potential years of life lost, by selected causes of death and...
www150.statcan.gc.ca
open.canada.ca
+1more
Updated May 24, 2016
Cite
Government of Canada, Statistics Canada (2016). Mortality and potential years of life lost, by selected causes of death and sex, three-year average, Canada, provinces, territories, health regions and peer groups occasional (number) [Dataset]. http://doi.org/10.25318/1310074201-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310074201-eng
Dataset updated
May 24, 2016
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
This table contains 135864 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-05-24. This table contains data described by the following dimensions (Not all combinations are available): Geography (148 items: Canada; Newfoundland and Labrador; Eastern Regional Integrated Health Authority, Newfoundland and Labrador; Central Regional Integrated Health Authority, Newfoundland and Labrador; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
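The series count quoted above is the product of the dimension sizes, which can be checked directly. A minimal sketch; the dictionary only restates the item counts given in the description.

```python
from math import prod

# Item counts per dimension, as listed in the table description.
dimension_sizes = {
    "Geography": 148,
    "Sex": 3,
    "Indicators": 2,
    "Selected causes of death (ICD-10)": 17,
    "Characteristics": 9,
}

total_series = prod(dimension_sizes.values())
print(total_series)  # 135864
```

Because not all combinations have data, the number of populated series can be lower than this upper bound.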
City Property Tax Data Appendix A
data.orcities.org
splitgraph.com
csv, xlsx, xml
Updated Apr 19, 2016
Cite
(2016). City Property Tax Data Appendix A [Dataset]. https://data.orcities.org/w/gqi8-s84n/default?cur=HzxCeVxIMMr&from=CsljHS0hqWg
Explore at:
Available download formats: csv, xlsx, xml
Dataset updated
Apr 19, 2016
Description
This table contains data from the 2016 City Property Tax Report, Appendix A. Data is from the Department of Revenue Property Tax Statistics Supplemental Report. An empty cell means missing information. Thirteen cities do not have a permanent property tax rate. * Denotes rates for urban renewal in the "Other" category. ** Portland is the only city with a GAP bond ($2.6671/thousand), which is not in the table but is included in the "Total City Rate" column. *** In some instances, cities were contacted to verify data.
Strategic Measure_Aggregated Sidewalk Construction Data
catalog.data.gov
Updated Apr 28, 2022
Cite
data.austintexas.gov (2022). Strategic Measure_Aggregated Sidewalk Construction Data [Dataset]. https://catalog.data.gov/no/dataset/strategic-measure-aggregated-sidewalk-construction-data
Explore at:
Dataset updated
Apr 28, 2022
Dataset provided by
data.austintexas.gov
Description
This dataset shows new sidewalk added to the City of Austin's network by calendar year. Data in this table is limited to the full purpose jurisdiction of the City of Austin as of publication date. Sidewalk construction comes from many sources, including but not limited to the City of Austin, counties, state agencies, and private developers. Existing data does not support separating out city and non-city construction. This dataset supports the SD23 performance measure M.C.6a: Percent of missing sidewalks completed. Detailed sidewalk segment data is available in the dataset Strategic Measure_Sidewalk Segment Data. View more details and insights related to this data set on the story page: https://data.austintexas.gov/stories/s/Percentage-of-Missing-Sidewalk-Network-Completed/ffkw-wkiv/

Major Cities Weather Data 1995-present

kaggle.com

zip

Updated Nov 10, 2025

Cite

Wafaa EL HUSSEINI (2025). Major Cities Weather Data 1995-present [Dataset]. https://www.kaggle.com/datasets/wafaaelhusseini/major-cities-weather-data

Explore at:

Available download formats: zip (5099166 bytes)

Dataset updated

Nov 10, 2025

Authors

Wafaa EL HUSSEINI

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically

Description

🌍 Major Cities Daily Weather (1995 – present) [CURRENTLY UNDER CONSTRUCTION]

This dataset provides daily historical weather data for major cities around the world, including all national capitals and large population centers, from 1995 to 2024.
Data is sourced from the Open-Meteo Historical Weather API and Wikidata, processed and harmonized for easy analysis and visualization.

📦 Contents

File	Description
`cities_clean.parquet`	Metadata of all selected cities (country, coordinates, population, capital flag).
`history.parquet`	Full daily dataset (one row per city × day).
`history_latest.csv`	Snapshot of the most recent day available.

🌆 City Selection Methodology

“Major cities” are defined as:

All national capitals, plus
Cities with population ≥ 300 000, and
The top 10 most populated cities for each country (based on Wikidata).

Coordinates (lat, lon) and country ISO codes come from Wikidata’s structured data.
Population values are used only for ranking and filtering.
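The selection rule above can be sketched as a filter over city records. This is an illustration, not the author's pipeline: the function name and the record keys (`name`, `country`, `population`, `is_capital`) are hypothetical, and the three criteria are assumed to combine as a union (a city qualifies if it meets any one of them).

```python
def select_major_cities(cities, min_population=300_000, top_n=10):
    """Keep capitals, cities at or above min_population, and each country's top_n."""
    by_country = {}
    for city in cities:
        by_country.setdefault(city["country"], []).append(city)

    selected = []
    for group in by_country.values():
        # Rank cities within a country from most to least populated.
        ranked = sorted(group, key=lambda c: c["population"], reverse=True)
        for rank, city in enumerate(ranked):
            if city["is_capital"] or city["population"] >= min_population or rank < top_n:
                selected.append(city)
    return selected


# Made-up country "A": one large city, eleven towns, and a small capital.
sample = [{"name": "Big", "country": "A", "population": 500_000, "is_capital": False}]
sample += [
    {"name": f"T{i}", "country": "A", "population": 50_000 - i * 1_000, "is_capital": False}
    for i in range(11)
]
sample.append({"name": "Cap", "country": "A", "population": 1_000, "is_capital": True})

names = {c["name"] for c in select_major_cities(sample)}
print(sorted(names))
```

In the sample, the two smallest towns fall outside the top ten and are dropped, while the small capital is kept because capitals always qualify.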

☀️ Weather Variables

Daily values from Open-Meteo’s ERA5-based reanalysis:

Variable	Unit	Description
`temp_max_c`, `temp_min_c`	°C	Maximum / minimum 2 m air temperature
`temp_mean_c_approx`	°C	Approximate daily mean ((max+min)/2)
`app_temp_max_c`, `app_temp_min_c`	°C	Apparent (feels-like) temperature
`precip_mm`, `rain_mm`, `snow_mm`	mm	Total precipitation, rain, snowfall
`windspeed_10m_max_kmh`, `windgusts_10m_max_kmh`	km/h	Maximum daily windspeed / gusts
`wind_dir_dom_deg`	°	Dominant wind direction
`sunshine_duration_s`, `daylight_duration_s`	s	Total sunshine / daylight duration
`shortwave_radiation_MJ_m2`	MJ/m²	Daily sum of incoming shortwave radiation

All timestamps are daily aggregates in UTC.
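The approximate daily mean in the table is the midpoint of the daily maximum and minimum. A minimal sketch, with a hypothetical function name, that returns null when either input is missing (an assumption about how a derived column would treat nulls):

```python
from typing import Optional


def approx_daily_mean(temp_max_c: Optional[float], temp_min_c: Optional[float]) -> Optional[float]:
    """Midpoint of daily max and min temperature in °C, or None if either is missing."""
    if temp_max_c is None or temp_min_c is None:
        return None
    return (temp_max_c + temp_min_c) / 2


print(approx_daily_mean(30.0, 20.0))
print(approx_daily_mean(None, 5.0))
```

The midpoint is only an approximation of the true daily mean, which is why the column name carries the `_approx` suffix.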

🧠 Notes

The dataset merges 29 years of global reanalysis data (1995 – 2024).
Missing or obviously invalid values are left as null.
Each record is uniquely identified by (date, country, city).
Weather data are physically modelled, not observed station data.

⚖️ License & Attribution

Data is available under CC BY 4.0.

🌆 City Lifestyle Segmentation Dataset

kaggle.com

zip

Updated Nov 15, 2025

Cite

UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset

Explore at:

Available download formats: zip (11274 bytes)

Dataset updated

Nov 15, 2025

Authors

UmutUygurr

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

🌆 About This Dataset

This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

🎯 Perfect For:

📊 K-Means, DBSCAN, Agglomerative Clustering
🔬 PCA & t-SNE Dimensionality Reduction
🗺️ Geospatial Visualization (Plotly, Folium)
📈 Correlation Analysis & Feature Engineering
🎓 Educational Projects (Beginner to Intermediate)

📦 What's Inside?

Feature	Description	Range
10 Features	Economic, environmental & social indicators	Realistically scaled
300 Cities	Europe, Asia, Americas, Africa, Oceania	Diverse distributions
Strong Correlations	Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6)	ML-ready
No Missing Values	Clean, preprocessed data	Ready for analysis
4-5 Natural Clusters	Metropolitan hubs, eco-towns, developing centers	Pre-validated

🔥 Key Features

✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases

🚀 Quick Start Example

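import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the data and keep only the numeric features
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
# Standardize so that no single feature dominates the distance metric
X_scaled = StandardScaler().fit_transform(X)

# Cluster (n_init is set explicitly so results do not depend on the library default)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)

# Analyze: average the numeric columns per cluster (text columns are skipped)
print(df.groupby('cluster').mean(numeric_only=True))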

🎓 Learning Outcomes

After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics

📚 Ideal For These Projects

🏆 Kaggle Competitions: Practice clustering techniques
📝 Academic Projects: Urban planning, sociology, environmental science
💼 Portfolio Work: Showcase ML skills to employers
🎓 Learning: Hands-on practice with unsupervised learning
🔬 Research: Urban lifestyle segmentation studies

🌍 Expected Clusters

Cluster	Characteristics	Example Cities
Metropolitan Tech Hubs	High income, density, rent	Silicon Valley, Singapore
Eco-Friendly Towns	Low density, clean air, high happiness	Nordic cities
Developing Centers	Mid income, high density, poor air	Emerging markets
Low-Income Suburban	Low infrastructure, income	Rural areas
Industrial Mega-Cities	Very high density, pollution	Manufacturing hubs

🛠️ Technical Details

Format: CSV (UTF-8)
Size: ~300 rows × 10 columns
Missing Values: 0%
Data Types: 2 categorical, 8 numerical
Target Variable: None (unsupervised)
Correlation Strength: Pre-validated (r: 0.4 to 0.8)

📖 What Makes This Dataset Special?

Unlike random synthetic data, this dataset was carefully engineered with:
- ✨ Realistic correlation structures based on urban research
- 🌍 Regional characteristics matching real-world patterns
- 🎯 Optimal cluster separability (validated via silhouette scores)
- 📚 Comprehensive documentation and starter code

🏅 Use This Dataset If You Want To:

✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights

📊 Acknowledgments

This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

Happy Clustering! 🎉

Global Air Quality Data(15 Days Hourly, 50 Cities)

kaggle.com

zip

Updated Nov 19, 2025

Cite

Smeet Raichura (2025). Global Air Quality Data(15 Days Hourly, 50 Cities) [Dataset]. https://www.kaggle.com/datasets/smeet888/global-air-quality-data15-days-hourly-50-cities

Explore at:

Available download formats: zip (598546 bytes)

Dataset updated

Nov 19, 2025

Authors

Smeet Raichura

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

📘 Overview

This dataset provides hourly air-quality measurements for 50 major global cities over a continuous 15-day period, including pollutant concentrations, meteorological conditions, geographical metadata, and an engineered AQI index.

All values are synthetically generated using historically consistent pollutant patterns and statistical ranges, allowing researchers and ML practitioners to work with realistic air-quality trends without licensing restrictions or data-collection barriers.

This dataset is ideal for time-series modeling, forecasting, environmental analytics, and machine-learning experimentation.

🧭 Cities Included

Covers all major regions:

North America — New York, Los Angeles, Toronto

Europe — London, Paris, Berlin, Zurich

Asia — Delhi, Tokyo, Seoul, Beijing, Singapore

Middle East — Dubai, Riyadh, Doha

Africa — Lagos, Cairo, Nairobi

Oceania — Sydney, Melbourne, Auckland

South America — São Paulo, Buenos Aires

🧱 Dataset Structure

Each hourly record includes:

Air Pollutants

PM2.5 (µg/m³)

PM10 (µg/m³)

NO₂ (ppb)

SO₂ (ppb)

O₃ (ppb)

CO (ppm)

Weather Features

Temperature (°C)

Humidity (%)

Wind Speed (m/s)

Location Metadata

City

Country

Latitude

Longitude

Other

Timestamp (ISO-8601)

AQI (Computed index)

🧹 Data Quality & Formatting

No missing values — 100% complete

Numeric values rounded to 3 decimals

Clean column names (snake_case)

Consistent hourly frequency

Fully ML-ready
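
The completeness and hourly-frequency claims above can be checked after loading the file. The following is a minimal sketch, not the author's validation code: it assumes rows are already read into dictionaries keyed by the documented column names, that rows are sorted by time within each city, and that timestamps parse with `datetime.fromisoformat` (no trailing `Z`). The function name `check_quality` is hypothetical.

```python
from datetime import datetime, timedelta


def check_quality(rows):
    """Return a list of problems found in rows (dicts keyed by column name)."""
    problems = []
    last_seen = {}  # city -> timestamp of the previous row for that city
    for i, row in enumerate(rows):
        # A value counts as missing if it is None or an empty string.
        for col, value in row.items():
            if value is None or value == "":
                problems.append(f"row {i}: missing {col}")
        if not row.get("timestamp") or not row.get("city"):
            continue  # cannot check frequency without both fields
        ts = datetime.fromisoformat(row["timestamp"])
        prev = last_seen.get(row["city"])
        # Consecutive rows for one city should be exactly one hour apart.
        if prev is not None and ts - prev != timedelta(hours=1):
            problems.append(
                f"row {i}: gap after {prev.isoformat()} for {row['city']}"
            )
        last_seen[row["city"]] = ts
    return problems


sample = [
    {"timestamp": "2025-11-01T00:00:00", "city": "Paris", "pm25": "12.5"},
    {"timestamp": "2025-11-01T01:00:00", "city": "Paris", "pm25": "13.1"},
    {"timestamp": "2025-11-01T03:00:00", "city": "Paris", "pm25": ""},
]
for problem in check_quality(sample):
    print(problem)
```

An empty result list is consistent with the "no missing values" and "consistent hourly frequency" claims; it does not prove them for fields this sketch does not inspect.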

📊 Example Use Cases

✔ AQI forecasting (LSTM, GRU, Transformers)

✔ Multivariate time-series modeling

✔ Clustering cities by pollution patterns

✔ Environmental trend visualization

✔ Weather–pollution correlation studies

✔ Anomaly detection (peak pollution events)

Column	Description	Unit	Type
timestamp	Hourly timestamp (UTC)	—	datetime
city	City name	—	string
country	Country name	—	string
latitude	City latitude	°	float
longitude	City longitude	°	float
pm25	Fine particulate matter	µg/m³	float
pm10	Coarse particulate matter	µg/m³	float
no2	Nitrogen dioxide	ppb	float
so2	Sulfur dioxide	ppb	float
o3	Ozone	ppb	float
co	Carbon monoxide	ppm	float
temperature	Ambient temperature	°C	float
humidity	Relative humidity	%	float
wind_speed	Wind speed	m/s	float
aqi	Derived Air Quality Index	—	int
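
The `aqi` column is described only as a derived index; the provenance notes in this listing say it uses US-EPA breakpoints. The sketch below shows the standard EPA piecewise-linear interpolation for a single pollutant (PM2.5). It is an illustration, not the author's code: the breakpoint table is the pre-2024 EPA PM2.5 table, and which table version, which averaging period, and which pollutants the dataset actually used are not stated. The overall AQI is conventionally the maximum of the per-pollutant sub-indices.

```python
import math

# Pre-2024 US-EPA PM2.5 breakpoints: (C_lo, C_hi, I_lo, I_hi).
# Assumption: the dataset may have used a different table version.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_sub_index(conc):
    """Interpolate a PM2.5 concentration (µg/m³) to an AQI sub-index."""
    # Truncate to one decimal so values fall inside a breakpoint row;
    # the small epsilon guards against floating-point representation error.
    c = math.floor(conc * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return 500  # above the top breakpoint: capped (a simplifying choice)


print(pm25_sub_index(40.0))
```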

🧪 Data Generation Method (Provenance)

This dataset is synthetically generated using realistic pollutant behavior patterns based on historical studies and open-source environmental datasets.

Modeling steps included:

City-specific pollutant baseline ranges

Randomized variation using Gaussian noise

Temporal patterns using sinusoidal diurnal cycles (morning & evening peaks)

Weather-pollution correlation rules (e.g., low wind → higher PM)

AQI computed using standard US-EPA breakpoints

All numeric values standardized to 3-decimal precision

This ensures that although synthetic, the dataset follows realistic environmental dynamics.
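
To make the modeling steps above concrete, here is a rough sketch of how one synthetic PM2.5 value could be produced from a city baseline, a two-peak diurnal cycle, a low-wind rule, and Gaussian noise. Every constant (peak hours, 30% amplitude, 2 m/s wind threshold, 5% noise) and the function name `synth_pm25` are assumptions chosen for illustration; the listing does not publish the actual generator.

```python
import math
import random


def synth_pm25(hour, baseline, wind_speed, rng):
    """Return one synthetic PM2.5 value (µg/m³) for a given hour of day."""
    # 12-hour cosine gives two daily peaks, here at 08:00 and 20:00.
    diurnal = 0.3 * math.cos(2 * math.pi * (hour - 8) / 12)
    # Weather rule: low wind lets particulates accumulate.
    wind_factor = 1.25 if wind_speed < 2.0 else 1.0
    # Gaussian noise scaled to the city baseline.
    noise = rng.gauss(0.0, 0.05 * baseline)
    value = baseline * (1 + diurnal) * wind_factor + noise
    # Concentrations cannot be negative; round to 3 decimals as documented.
    return round(max(value, 0.0), 3)


rng = random.Random(42)  # seeded so a run is reproducible
day = [synth_pm25(h, baseline=20.0, wind_speed=3.0, rng=rng) for h in range(24)]
print(day[:6])
```

Passing the random generator in as a parameter keeps the function deterministic under a fixed seed, which makes the synthetic series reproducible.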

📁 File Information

global_air_quality_50_cities.csv

Rows: 18,000+

Columns: 16

Format: UTF-8 CSV

Mortality and potential years of life lost, by selected causes of death and...
www150.statcan.gc.ca
data.urbandatacentre.ca
Updated Mar 16, 2016
Cite
Government of Canada, Statistics Canada (2016). Mortality and potential years of life lost, by selected causes of death and sex, three-year average, census metropolitan areas occasional (number) [Dataset]. http://doi.org/10.25318/1310074101-eng
Explore at:
Unique identifier
https://doi.org/10.25318/1310074101-eng
Dataset updated
Mar 16, 2016
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
This table contains 33048 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-03-16. This table contains data described by the following dimensions (not all combinations are available): Geography (36 items: Total, census metropolitan areas; St. John's, Newfoundland and Labrador; Halifax, Nova Scotia; Moncton, New Brunswick; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
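The series count follows directly from the dimension sizes listed in the description: it is the product of the item counts. A short check, using only the numbers quoted above:

```python
from math import prod

# Item counts per dimension, as stated in the table description.
dimensions = {
    "Geography": 36,
    "Sex": 3,
    "Indicators": 2,
    "Selected causes of death (ICD-10)": 17,
    "Characteristics": 9,
}

series_count = prod(dimensions.values())
print(series_count)  # 33048
```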
2020 American Community Survey: S0804 | MEANS OF TRANSPORTATION TO WORK BY...
data.census.gov
Cite
ACS, 2020 American Community Survey: S0804 | MEANS OF TRANSPORTATION TO WORK BY SELECTED CHARACTERISTICS FOR WORKPLACE GEOGRAPHY (ACS 5-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST5Y2020.S0804?q=Cimarron+city,+Kansas+Employment&y=2020
Explore at:
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2020
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.
Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.
Foreign born excludes people born outside the United States to a parent who is a U.S. citizen.
Tables for Workplace Geography are only available for States; Counties; Places; County Subdivisions in selected states (CT, ME, MA, MI, MN, NH, NJ, NY, PA, RI, VT, WI); Combined Statistical Areas; Metropolitan and Micropolitan Statistical Areas, and their associated Metropolitan Divisions and Principal Cities; Combined New England City and Town Areas; New England City and Town Areas, and their associated Divisions and Principal Cities. Tables B08601, B08602, B08603, and B08604 are also available for Place parts and County Subdivision parts for the 5-year ACS datasets.
Workers include members of the Armed Forces and civilians who were at work last week.
Industry titles and their 4-digit codes are based on the North American Industry Classification System (NAICS). The Census industry codes for 2018 and later years are based on the 2017 revision of the NAICS. To allow for the creation of multiyear tables, industry data in the multiyear files (prior to data year 2018) were recoded to the 2017 Census industry codes. We recommend using caution when comparing data coded using 2017 Census industry codes with data coded using Census industry codes prior to data year 2018. For more information on the Census industry code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.
When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.
2019 ACS data products include updates to several categories of the existing means of transportation question. For more information, see: Change to Means of Transportation.
Occupation titles and their 4-digit codes are based on the Standard Occupational Classification (SOC). The Census occupation codes for 2018 and later years are based on the 2018 revision of the SOC. To allow for the creation of the multiyear tables, occupation data in the multiyear files (prior to data year 2018) were recoded to the 2018 Census occupation codes. We recommend using caution when comparing data coded using 2018 Census occupation codes with data coded using Census occupation codes prior to data year 2018. For more information on the Census occupation code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.
In 2019, methodological changes were made to the class of worker question. These changes involved modifications to the question wording, the category wording, and the visual format of the categories on the questionnaire. The format for the class of worker categories are now listed under the headings "Private Sector Employee," "Government Employee," and "Self-Employed or Other." Additionally, the category of Active Duty was added as one of the response categories under the "Government Employee" section for the mail questionnaire. For more detailed info...
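The margin-of-error note in the description above translates into simple arithmetic: the confidence bounds are the estimate minus and plus the published 90 percent margin of error, and the standard error can be recovered by dividing the margin of error by 1.645 (the 90 percent z-value). A small sketch with made-up numbers; the function names are illustrative, not part of any Census tool:

```python
Z_90 = 1.645  # z-value corresponding to a 90 percent confidence level


def confidence_bounds(estimate, moe):
    """Lower and upper 90 percent confidence bounds for an ACS estimate."""
    return estimate - moe, estimate + moe


def standard_error(moe):
    """Standard error implied by a 90 percent margin of error."""
    return moe / Z_90


# Hypothetical estimate of 1,200 workers with a margin of error of 150.
low, high = confidence_bounds(1200, 150)
print(low, high)                      # 1050 1350
print(round(standard_error(150), 1))
```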
CitiesGOER: Globally Observed Environmental Data for 52,602 Cities with a...
data-staging.niaid.nih.gov
zenodo.org
Updated Mar 19, 2025
Cite
Kindt, Roeland (2025). CitiesGOER: Globally Observed Environmental Data for 52,602 Cities with a Population ≥ 5000 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8175429
Explore at:
Dataset updated
Mar 19, 2025
Dataset provided by
CIFOR-ICRAF
Authors
Kindt, Roeland
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
CitiesGOER is a database that provides environmental data for 52,602 cities and 48 environmental variables, including 38 bioclimatic variables, 8 soil variables and 2 topographic variables. Data were extracted from the same 30 arc-seconds global grid layers that were prepared when making the TreeGOER (Tree Globally Observed Environmental Ranges) database that is available from https://doi.org/10.5281/zenodo.7922927. Details on the preparations of these layers are provided by Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914. CitiesGOER was designed to be used together with TreeGOER and possibly also with the GlobalUsefulNativeTrees database (Kindt et al. 2023) to allow users to filter suitable tree species based on environmental conditions of the planting site.

The identities and coordinates of cities were sourced from a data set with information for cities with a population size larger than 1000 that was created by Opendatasoft and made available from https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/table/?disjunctive.cou_name_en&sort=name. The data was downloaded on 22-JULY-2023 and afterwards filtered for cities with a population of 5000 or above. Cities where information on the country was missing were removed. The coordinates of cities were used to extract the environmental data via the terra package (Hijmans et al. 2022, version 1.6-47) in the R 4.2.1 environment.
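
The filtering step described above (keep cities with a population of 5000 or more, drop records with no country) can be sketched as follows. The original work was done in R with the terra package; this Python version only illustrates the filter logic, and the field names `population` and `country` are assumptions rather than the actual column names of the Opendatasoft export.

```python
def filter_cities(records, min_population=5000):
    """Keep cities at or above the population threshold that name a country."""
    kept = []
    for rec in records:
        country = (rec.get("country") or "").strip()
        if rec.get("population", 0) >= min_population and country:
            kept.append(rec)
    return kept


# Hypothetical records for illustration only.
cities = [
    {"name": "A", "population": 12000, "country": "Kenya"},
    {"name": "B", "population": 3000, "country": "Kenya"},
    {"name": "C", "population": 8000, "country": ""},
    {"name": "D", "population": 5000, "country": "Rwanda"},
]
print([c["name"] for c in filter_cities(cities)])  # ['A', 'D']
```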

Version 2023.08 provided median values from 23 Global Climate Models (GCMs) for Shared Socio-Economic Pathway (SSP) 1-2.6 and from 18 GCMs for SSP 3-7.0, both for the 2050s (2041-2060). Similar methods were used to calculate these median values as in the case studies for the TreeGOER manuscript (calculations were partially done via the BiodiversityR::ensemble.envirem.run function and with downscaled bioclimatic and monthly climate 2.5 arc-minutes future grid layers available from WorldClim 2.1).

Version 2023.09 used similar methods as for previous versions to provide median values from 13 GCMs for the 2090s (2081-2100) for SSP 5-8.5.
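
The future-climate values in these versions are medians across several Global Climate Models. As a rough illustration of that aggregation (the original calculations used R, partly via `BiodiversityR::ensemble.envirem.run`; this is not that code), the sketch below takes one variable's value per city from each model and returns the per-city median. The model names and the nested-dictionary layout are hypothetical.

```python
from statistics import median


def ensemble_median(values_by_gcm):
    """Map {gcm: {city_id: value}} to {city_id: median across GCMs}."""
    city_ids = set().union(*(v.keys() for v in values_by_gcm.values()))
    return {
        city: median(v[city] for v in values_by_gcm.values() if city in v)
        for city in city_ids
    }


# Hypothetical annual mean temperatures (°C) from three models.
projections = {
    "gcm_a": {"city_1": 20.0, "city_2": 11.5},
    "gcm_b": {"city_1": 22.0, "city_2": 12.5},
    "gcm_c": {"city_1": 27.0, "city_2": 12.0},
}
print(ensemble_median(projections)["city_1"])  # 22.0
```

Using the median rather than the mean limits the influence of a single outlying model, such as the 27.0 value for `city_1` here.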

The locations of the 52,602 cities are mapped in one of the series available from the TreeGOER Global Zones atlas that can be obtained from https://doi.org/10.5281/zenodo.8252756.

Version 2024.10 includes a new data set that documents the Holdridge Life Zone in which each city is located. Information is given for historical (1901-1920), contemporary (1979-2013) and future (2061-2080; separately for RCP 4.5 and RCP 8.5) climates inferred from global raster layers that are available for download from DRYAD and were created for the following article: Elsen et al. 2022. Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962. Version 2024.10 further includes Holdridge Life Zones for the climates that were available from the previous versions, calculating biotemperatures and life zones with similar methods as used by Holdridge (1947; 1967) and Elsen et al. (2022) (for future climates, median values were determined first for monthly maximum and minimum temperatures across GCMs). The distributions of the 48,129 species documented in TreeGOER across the Holdridge Life Zones are given in this Zenodo archive: https://zenodo.org/records/14020914.

Version 2024.11 includes a new data set that documents the Köppen-Geiger climate zone in which each city is located. Information is given for historical (1901-1930, 1931-1960, 1961-1990) and future (2041-2070 and 2071-2099) climates, with seven scenarios for each future period (SSP 1-1.9, SSP 1-2.6, SSP 2-4.5, SSP 3-7.0, SSP 4-3.4, SSP 4-6.0 and SSP 5-8.5). This data set was created from 30 arc-second raster layers available via: Beck, H.E., McVicar, T.R., Vergopolan, N. et al. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724 (2023). https://doi.org/10.1038/s41597-023-02549-6

Version 2025.03 includes extra columns for the baseline, 2050s and 2090s datasets that partially correspond to climate zones used in the GlobalUsefulNativeTrees database. One of these zone classifications is the Whittaker biome types, available as a polygon from the plotbiomes package (see also here). Whittaker biome types were extracted with similar R scripts as described by Kindt 2025 (these were also used to calculate environmental ranges of TreeGOER species, as archived here).

Version 2025.03 further includes information for the baseline climate on the steady state water table depth, obtained from a 30 arc-seconds raster layer calculated by the GLOBGM v1.0 model (Verkaik et al. 2024). Also included was the elevation, obtained from the same WorldClim 2.1 raster layer used to prepare TreeGOER.

As an alternative to CitiesGOER, the ClimateForecasts database (https://zenodo.org/records/10776414) documents the environmental conditions at the locations of 15,504 weather stations. ClimateForecasts was integrated in the GlobalUsefulNativeTrees database (see Kindt et al. 2023).

When using CitiesGOER in your work, cite this depository and the following:

Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315. https://doi.org/10.1002/joc.5086

Title, P. O., & Bemmels, J. B. (2018). ENVIREM: An expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography, 41(2), 291–307. https://doi.org/10.1111/ecog.02880

Poggio, L., de Sousa, L. M., Batjes, N. H., Heuvelink, G. B. M., Kempen, B., Ribeiro, E., & Rossiter, D. (2021). SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. SOIL, 7(1), 217–240. https://doi.org/10.5194/soil-7-217-2021

Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914.

Opendatasoft (2023) Geonames - All Cities with a population > 1000. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/information/?disjunctive.cou_name_en&sort=name (accessed 22-JULY-2023)

When using information from the Holdridge Life Zones, also cite:

Elsen, P. R., Saxon, E. C., Simmons, B. A., Ward, M., Williams, B. A., Grantham, H. S., Kark, S., Levin, N., Perez-Hammerle, K.-V., Reside, A. E., & Watson, J. E. M. (2022). Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962

When using information from Köppen-Geiger climate zones, also cite:

Beck, H.E., McVicar, T.R., Vergopolan, N., Berg, A., Lutsko, N.J., Dufour, A., Zeng, Z., Jiang, X., van Dijk, A.I. and Miralles, D.G. 2023. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724. https://doi.org/10.1038/s41597-023-02549-6

When using information on the Whittaker biome types, also cite:

Ricklefs, R. E., Relyea, R. (2018). Ecology: The Economy of Nature. United States: W.H. Freeman.

Whittaker, R. H. (1970). Communities and ecosystems.

Valentin Ștefan, & Sam Levin. (2018). plotbiomes: R package for plotting Whittaker biomes with ggplot2 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7145245

When using information on the steady state water table depth, also cite:

Verkaik, J., Sutanudjaja, E. H., Oude Essink, G. H., Lin, H. X., & Bierkens, M. F. (2024). GLOBGM v1.0: a parallel implementation of a 30 arcsec PCR-GLOBWB-MODFLOW global-scale groundwater model. Geoscientific Model Development, 17(1), 275-300. https://gmd.copernicus.org/articles/17/275/2024/

The development of CitiesGOER was supported by the Darwin Initiative to project DAREX001 of Developing a Global Biodiversity Standard certification for tree-planting and restoration, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Green Climate Fund through the IUCN-led Transforming the Eastern Province of Rwanda through Adaptation project. Development of version 2024.10 was further supported by the Green Climate Fund through the Readiness proposal on Climate Appropriate Portfolios of Tree Diversity for Burkina Faso project, by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project and by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa.
Vital Signs: Housing Permits - Bay Area
data.bayareametro.gov
csv, xlsx, xml
Updated Mar 11, 2022
Cite
ABAG Housing Permit Database (2022). Vital Signs: Housing Permits - Bay Area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Housing-Permits-Bay-Area/wbvu-rmp6
Explore at:
Available download formats: xml, csv, xlsx
Dataset updated
Mar 11, 2022
Dataset authored and provided by
ABAG Housing Permit Database
Area covered
San Francisco Bay Area
Description
VITAL SIGNS INDICATOR Housing Permits (LU3)

FULL MEASURE NAME Permitted housing units

LAST UPDATED October 2019

DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.

DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available

California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/

Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov

CONTACT INFORMATION vitalsigns.info@bayareametro.gov

METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.

Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.

Each multi-family unit is counted separately even though they may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county’s unincorporated total. Permit data is not available for years when the city or town was not incorporated.

Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
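
The income bands above can be expressed as a small classifier on the ratio of household income to area median income (AMI). This is a sketch under one stated assumption: the text does not say which band owns the exact 80% and 120% boundaries, so the code below places exactly 80% and exactly 120% in the moderate band. The function name is illustrative.

```python
def income_category(household_income, area_median_income):
    """Classify a household by its income relative to area median income."""
    ratio = household_income / area_median_income
    if ratio < 0.5:
        return "very low"
    if ratio < 0.8:
        return "low"
    if ratio <= 1.2:  # boundary handling is an assumption, see lead-in
        return "moderate"
    return "above moderate"


# Hypothetical AMI of 100,000.
for income in (40_000, 60_000, 100_000, 130_000):
    print(income, income_category(income, 100_000))
```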

Permit data is missing for the following cities and years: Clayton, 1990-2007 Lafayette, 1990-2007 Moraga, 1990-2007 Orinda, 1990-2007 San Ramon, 1990

Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.

Permit values reflect the number of units permitted in each respective year.
Crime in England and Wales: Police Force Area data tables
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Oct 23, 2025
Cite
Office for National Statistics (2025). Crime in England and Wales: Police Force Area data tables [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/policeforceareadatatables
Explore at:
Available download formats: xlsx
Dataset updated
Oct 23, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Police recorded crime figures by Police Force Area and Community Safety Partnership area (which, in the majority of instances, equate to local authorities).

Vietnam Jobs Dataset

kaggle.com

zip

Updated Apr 23, 2025

Cite

nguyen chi tinh (2025). Vietnam Jobs Dataset [Dataset]. https://www.kaggle.com/datasets/nguyenchitinh/vietnam-jobs-dataset/code

Explore at:

Available download formats: zip (3213064 bytes)

Dataset updated

Apr 23, 2025

Authors

nguyen chi tinh

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Area covered

Vietnam

Description

Vietnam Jobs Dataset

Overview

The jobs.csv dataset contains 85k job postings from various cities and industries in Vietnam, providing insights into the job market as of April 22, 2025. The data includes details about job titles, locations, salaries, experience requirements, and job fields, making it a valuable resource for analyzing salary trends, regional job distribution, and industry demands.

Dataset Description

File Name: jobs.csv
Format: Comma-Separated Values (CSV)
Size: 215 rows (excluding header)
Source: Internal data collection (specific source not provided)
Date: Reflects job market data as of April 22, 2025

Data Schema

The dataset includes the following columns:

Column Name	Description	Data Type	Example Value
`job_title`	Title of the job posting	String	"Sales Executive"
`job_type`	Type of employment (e.g., full-time, part-time)	String	"Full-time"
`position_level`	Level of the position (e.g., Employee, Manager, Intern)	String	"Nhân viên" (Employee)
`city`	City where the job is located	String	"Hồ Chí Minh"
`experience`	Required years of experience (e.g., "không yêu cầu", "2 - 5 năm")	String	"trên 1 năm"
`skills`	Required skills for the job (comma-separated)	String	"English, Sales, Communication"
`job_fields`	Industry or field of the job (comma-separated)	String	"Sales, Marketing, Retail"
`salary`	General salary description (may be vague or blank)	String	"Thỏa thuận" (Negotiable)
`salary_min`	Minimum salary offered (in VND or USD)	Float	8000000
`salary_max`	Maximum salary offered (in VND or USD)	Float	15000000
`unit`	Currency unit for salary (VND or USD)	String	"VND"

Notes on Data

Salary Values: Salaries are provided in VND (Vietnamese Dong) or USD. For consistency, convert USD to VND using an exchange rate (e.g., 1 USD = 25,000 VND).
Experience: The experience field uses Vietnamese phrases like "không yêu cầu" (no experience required), "trên 1 năm" (over 1 year), or ranges like "2 - 5 năm". Parsing into numerical years is recommended for analysis.
City Names: City names may vary in format (e.g., "Hồ Chí Minh", "HCM", "hà nội"). Standardize to "Ho Chi Minh City" and "Hanoi" for consistency.
Job Fields: The job_fields column contains comma-separated values, allowing a single job to belong to multiple industries (e.g., "Sales, Marketing").
Missing Data: Some fields, particularly salary_min and salary_max, may be missing or zero. Filter out invalid entries for accurate analysis.
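
The city-name and salary notes above can be applied with two small helpers. This is a minimal sketch: the alias table covers only the variants named in the notes, the 25,000 VND rate is the example rate quoted above rather than a live exchange rate, and the function names are illustrative.

```python
import unicodedata

USD_TO_VND = 25_000  # example rate from the notes, not a current rate

# Only the variants mentioned in the notes; extend after inspecting the data.
CITY_ALIASES = {
    "hồ chí minh": "Ho Chi Minh City",
    "hcm": "Ho Chi Minh City",
    "hà nội": "Hanoi",
}


def normalize_city(raw):
    """Map known spelling variants of a city name to one canonical form."""
    # NFC normalization makes composed and decomposed diacritics compare equal.
    cleaned = unicodedata.normalize("NFC", raw).strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


def to_vnd(amount, unit):
    """Convert a salary amount to VND; amounts already in VND pass through."""
    if amount is None:
        return None
    return amount * USD_TO_VND if unit.strip().upper() == "USD" else amount


print(normalize_city(" HCM "))   # Ho Chi Minh City
print(to_vnd(1000.0, "USD"))     # 25000000.0
```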

Usage

This dataset can be used for:

Market Analysis: Identify high-paying industries (e.g., banking, management) and salary trends by experience or position level.
Regional Insights: Analyze job distribution across cities, with a focus on urban centers like Ho Chi Minh City and Hanoi.
Career Planning: Understand entry-level opportunities, particularly in customer service and telesales, which offer competitive salaries for candidates with no experience.
Policy and Research: Study labor market dynamics, such as the demand for English-speaking talent or regional salary disparities.

Example Questions

Which cities have the highest number of job postings and average salaries?
What are the top-paying job fields, and how do they compare to entry-level roles?
How does experience impact salaries across different industries?
Are there niche roles (e.g., technical or medical) with unusually high salaries?

Data Cleaning Recommendations

To prepare the dataset for analysis, consider the following steps:

Standardize City Names: Convert variations like "HCM" or "hà nội" to "Ho Chi Minh City" and "Hanoi".
Convert Salaries: Multiply USD salaries by 25,000 to convert to VND for consistency.
Parse Experience: Convert text-based experience (e.g., "2 - 5 năm") to numerical values (e.g., 3.5 years). Treat "không yêu cầu" as 0 years.
Handle Missing Salaries: Filter out rows where both salary_min and salary_max are zero or missing, or impute using industry avera...
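
The experience-parsing and salary-filtering recommendations above might look like this in practice. It is a sketch with labeled assumptions: a range is converted to its midpoint as the recommendation suggests, a single number such as "trên 1 năm" is taken at face value (1 year, an assumed convention), and unrecognized text returns `None`. Function names are illustrative.

```python
import re


def parse_experience_years(text):
    """Convert a Vietnamese experience phrase to a number of years."""
    text = text.strip().lower()
    if text == "không yêu cầu":  # "no experience required"
        return 0.0
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if len(numbers) >= 2:
        return (numbers[0] + numbers[1]) / 2  # midpoint of a range
    if len(numbers) == 1:
        return numbers[0]  # e.g. "trên 1 năm" (over 1 year) -> 1.0
    return None  # phrase not recognized


def has_valid_salary(row):
    """False when both salary_min and salary_max are zero or missing."""
    return bool(row.get("salary_min")) or bool(row.get("salary_max"))


print(parse_experience_years("2 - 5 năm"))  # 3.5
print(has_valid_salary({"salary_min": 0, "salary_max": None}))  # False
```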

Censuses in Wuerttemberg, 1834 to 1925
da-ra.de
Updated Feb 21, 2023
Cite
Wolfgang Zimmermann; Gabriele Däumling; Julia Grosse; Melanie Prangen; Andrea Jautz; Regina Koch-Richter; Claudia Hierath; Bodo Heizmann; Florian Lenz (2023). Censuses in Wuerttemberg, 1834 to 1925 [Dataset]. http://doi.org/10.4232/1.14072
Explore at:
Unique identifier
https://doi.org/10.4232/1.14072
Dataset updated
Feb 21, 2023
Dataset provided by
GESIS
da|ra
Authors
Wolfgang Zimmermann; Gabriele Däumling; Julia Grosse; Melanie Prangen; Andrea Jautz; Regina Koch-Richter; Claudia Hierath; Bodo Heizmann; Florian Lenz
Time period covered
1834 - 1925
Area covered
Baden-Württemberg
Description
By joining the German Zollverein (Customs Union) in 1834, the Kingdom of Württemberg committed itself to conduct a census in a fixed three-year rhythm according to uniform criteria and with a recording scheme that was as precise as possible. The data obtained in the process formed the basis for the distribution of the common revenues of the German Customs Union. The Kingdom of Württemberg conducted the first census as part of the Zollverein on 15 December 1834. The basis of the censuses was the 'resident' population, which according to the contemporary definition included all people who were present in the place on the reference date. Residents who were currently absent due to a journey were also taken into account. Men and women who were in transit in the census municipality were not included. Until 1858, the 'local' population, i.e. the population living permanently in the village, was also counted. The data material of the Zollverein and Reich statistics was collected on the basis of the Oberamtslisten, which have survived in handwritten form (Landesarchiv Baden-Württemberg, Staatsarchiv Ludwigsburg, Bestand E 258 VIII). The data is available at the municipal, Oberamt and district level. The figures reflect the territorial status valid at the time of the census as well as the contemporary administrative division. Four Excel tables are available for each census, in which the data for the municipalities and head offices of a district are summarised. A crossed-out place name indicates that the municipality in question belonged to another Oberamt at the time of the census. Municipalities that were newly assigned to an Oberamt between 1834 and 1925 are usually added at the end of the Oberamt list. Information on the change of office affiliation can be found in the comment field. An asterisk after a place name (name of the city or village) indicates such supplementary information. The comment field opens as soon as the cursor is placed on the field of the place (city or village) concerned. The primary researchers supplemented the data material with historical maps. The maps of the four Württemberg districts are taken from the publication 'Das Königreich Württemberg' (The Kingdom of Württemberg), which was published by the State Statistical Office in four volumes between 1904 and 1907.
Explanation of symbols:
0 = Less than half of 1 in the last filled position, but more than nothing
- = Nothing present (exactly zero)
. = Numerical value unknown or to be kept secret
x = Table compartment locked because statement does not make sense
... = Statement to be made later
/ = No statement, as the numerical value is not certain enough
() = Statement value limited, as the numerical value may contain errors
Discrepancies in the totals can be explained by rounding the numbers. Place names that have been crossed out indicate that the municipality in question belonged to a different Oberamt at the time of the census. An asterisk (*) after a place name indicates information about the records in the comment field.
Publication: CD-ROM: »Königreich Württemberg« Volkszählungen 1834 bis 1925. Statistisches Landesamt Baden-Württemberg. Zu bestellen unter: https://www.statistik-bw.de/Service/Veroeff/Statistische_Daten/900208001.bs E-Mail: vertrieb@stala.bwl.de
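When reading the tables programmatically, the symbols explained in the description above have to be separated from real numbers. The sketch below is one possible way to do that, assuming each cell arrives as a string, that the dash is a plain ASCII hyphen, and that limited-reliability values are wrapped in parentheses such as "(123)". None of these cell conventions are confirmed by the listing, and `parse_cell` is a hypothetical helper.

```python
# Symbol -> meaning, taken from the explanation of symbols in the description.
FLAGS = {
    "-": "exactly zero",
    ".": "unknown or secret",
    "x": "not meaningful",
    "...": "to be reported later",
    "/": "not certain enough",
}


def parse_cell(cell):
    """Return (value, flag) for one table cell given as a string."""
    text = cell.strip()
    if text in FLAGS:
        # Only the dash carries a numeric meaning (exactly zero).
        return (0.0 if text == "-" else None, FLAGS[text])
    if text.startswith("(") and text.endswith(")"):
        return (float(text[1:-1]), "limited reliability")
    return (float(text), None)


for cell in ["1523", "-", ".", "(87)"]:
    print(cell, parse_cell(cell))
```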
Data from: (Table 2) Contents of rock-forming components in samples from...
search.dataone.org
doi.pangaea.de
Updated Jan 7, 2018
Cite
Lein, Alla Yu; Bogdanova, Olga Yu; Bogdanov, Yury A; Magazina, Larissa O (2018). (Table 2) Contents of rock-forming components in samples from carbonate mounds of the Lost City hydrothermal field [Dataset]. http://doi.org/10.1594/PANGAEA.765160
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.765160
Dataset updated
Jan 7, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Lein, Alla Yu; Bogdanova, Olga Yu; Bogdanov, Yury A; Magazina, Larissa O
Description
No description is available. Visit https://dataone.org/datasets/a0d9075d9dabafe15910b1e96b2e9a52 for complete metadata about this dataset.
Data from: (Table 5) Chemical composition of aragonite from the Lost City...
doi.pangaea.de
search.dataone.org
html, tsv
Updated 2007
Cite
Alla Yu Lein; Olga Yu Bogdanova; Yury A Bogdanov; Larissa O Magazina (2007). (Table 5) Chemical composition of aragonite from the Lost City hydrothermal field according to electron microprobe analysis [Dataset]. http://doi.org/10.1594/PANGAEA.765163
Explore at:
Available download formats: tsv, html
Unique identifier
https://doi.org/10.1594/PANGAEA.765163
Dataset updated
2007
Dataset provided by
PANGAEA
Authors
Alla Yu Lein; Olga Yu Bogdanova; Yury A Bogdanov; Larissa O Magazina
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Variables measured
Sodium oxide, Calcium oxide, Carbon dioxide, Strontium oxide, Number of observations
Description
This dataset is about: (Table 5) Chemical composition of aragonite from the Lost City hydrothermal field according to electron microprobe analysis. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.765175 for more information.


Cite

ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX

2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables)

2021: ACS 1-Year Estimates Subject Tables

Explore at:

Dataset provided by

United States Census Bureau http://census.gov/

Authors

ACS

License

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Time period covered

2021

Description

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.

The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.

The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
- The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X) The estimate or margin of error is not applicable or not available.
median- The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+ The median falls in the highest interval of an open-ended distribution (for example "250,000+").
** The margin of error could not be computed because there were an insufficient number of sample observations.
*** The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
***** A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
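
The three dependency ratios and the confidence bounds defined in the description above can be written out as a short Python sketch. This is an illustration only, not Census Bureau code: the `AgeCounts` container, the function names, and the sample counts are hypothetical, and the input is assumed to be already aggregated into the three age bands the description uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeCounts:
    """Hypothetical container for the three age bands used by the ratios."""
    under_18: float
    age_18_to_64: float
    age_65_plus: float


def dependency_ratios(counts: AgeCounts) -> dict:
    """Return the age, old-age and child dependency ratios (per 100 people aged 18 to 64)."""
    if counts.age_18_to_64 <= 0:
        raise ValueError("the 18-to-64 population must be positive")
    base = counts.age_18_to_64
    return {
        # (under 18 + 65 and over) / 18-to-64 * 100
        "age": (counts.under_18 + counts.age_65_plus) / base * 100,
        # 65 and over / 18-to-64 * 100
        "old_age": counts.age_65_plus / base * 100,
        # under 18 / 18-to-64 * 100
        "child": counts.under_18 / base * 100,
    }


def confidence_bounds(estimate: float, margin_of_error: float) -> tuple:
    """Lower and upper bounds: estimate minus and plus the published margin of error."""
    return (estimate - margin_of_error, estimate + margin_of_error)


if __name__ == "__main__":
    # Made-up counts, chosen only to make the arithmetic easy to follow.
    sample = AgeCounts(under_18=220, age_18_to_64=600, age_65_plus=160)
    print(dependency_ratios(sample))
    print(confidence_bounds(1000, 50))
```

By construction the child and old-age ratios sum to the age dependency ratio. The bounds function applies only to estimates with a numeric margin of error; cells carrying one of the symbols listed above (for example `**` or `(X)`) would need to be filtered out first.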

