31 datasets found
  1. 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates...

    • data.census.gov
    Cite
    ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2021
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.
    The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.
    The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:

    - The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    (X) The estimate or margin of error is not applicable or not available.
    median- The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
    median+ The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    ** The margin of error could not be computed because there were an insufficient number of sample observations.
    *** The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    ***** A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
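
    The three dependency ratios and the margin-of-error bounds described in this entry reduce to simple arithmetic. The following is a minimal sketch of that arithmetic; the function names and the population counts are hypothetical and are not taken from table S0101.

```python
def dependency_ratios(under_18, age_18_to_64, age_65_plus):
    """Return (age, old-age, child) dependency ratios per 100 people aged 18 to 64."""
    if age_18_to_64 <= 0:
        raise ValueError("18-to-64 population must be positive")
    age = (under_18 + age_65_plus) / age_18_to_64 * 100
    old_age = age_65_plus / age_18_to_64 * 100
    child = under_18 / age_18_to_64 * 100
    return age, old_age, child


def confidence_bounds(estimate, margin_of_error):
    """Lower and upper confidence bounds: estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Hypothetical counts, chosen only to illustrate the formulas.
age, old_age, child = dependency_ratios(under_18=200, age_18_to_64=600, age_65_plus=100)
lower, upper = confidence_bounds(estimate=1000, margin_of_error=50)
```

    Because the age dependency ratio combines both dependent groups, it equals the sum of the old-age and child ratios, which gives a quick consistency check on published values.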

  2. TABLE III. Deaths in 122 U.S. cities

    • catalog.data.gov
    • healthdata.gov
    • +6more
    Updated Jul 11, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). TABLE III. Deaths in 122 U.S. cities [Dataset]. https://catalog.data.gov/dataset/table-iii-deaths-in-122-u-s-cities
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered
    United States
    Description

    TABLE III. Deaths in 122 U.S. cities – 2016. 122 Cities Mortality Reporting System — Each week, the vital statistics offices of 122 cities across the United States report the total number of death certificates processed and the number of those for which pneumonia or influenza was listed as the underlying or contributing cause of death by age group (Under 28 days, 28 days –1 year, 1-14 years, 15-24 years, 25-44 years, 45-64 years, 65-74 years, 75-84 years, and ≥ 85 years). FOOTNOTE: U: Unavailable. —: No reported cases. * Mortality data in this table are voluntarily reported from 122 cities in the United States, most of which have populations of 100,000 or more. A death is reported by the place of its occurrence and by the week that the death certificate was filed. Fetal deaths are not included. † Pneumonia and influenza. § Total includes unknown ages.

  3. Vital Signs: Housing Permits - by metro area

    • data.bayareametro.gov
    csv, xlsx, xml
    Updated Oct 31, 2019
    + more versions
    Cite
    (2019). Vital Signs: Housing Permits - by metro area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Housing-Permits-by-metro-area/9muq-ubre
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Oct 31, 2019
    Description

    VITAL SIGNS INDICATOR Housing Permits (LU3)

    FULL MEASURE NAME Permitted housing units

    LAST UPDATED October 2019

    DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.

    DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available

    California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/

    Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov

    CONTACT INFORMATION vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.

    Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.

    Each multi-family unit is counted separately even though they may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county’s unincorporated total. Permit data is not available for years when the city or town was not incorporated.

    Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
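
    The income tiers above can be expressed as thresholds on household income as a share of area median income (AMI). This is a rough sketch only: the function name is hypothetical, the moderate tier is read as 80% to 120% of AMI, and the description does not say which tier a household exactly at 50%, 80%, or 120% belongs to, so the boundary handling here is an assumption.

```python
def income_category(household_income, area_median_income):
    """Classify a household by its income as a share of area median income (AMI)."""
    if area_median_income <= 0:
        raise ValueError("area median income must be positive")
    share = household_income / area_median_income
    # Assumption: lower bounds are inclusive, and exactly 120% counts as moderate.
    if share < 0.5:
        return "very low"
    if share < 0.8:
        return "low"
    if share <= 1.2:
        return "moderate"
    return "above moderate"
```

    For example, with a hypothetical AMI of 100,000, a household earning 60,000 falls in the low income tier.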

    Permit data is missing for the following cities and years: Clayton, 1990-2007; Lafayette, 1990-2007; Moraga, 1990-2007; Orinda, 1990-2007; San Ramon, 1990.

    Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.

    Permit values reflect the number of units permitted in each respective year.

  4. City of Los Angeles Crime data

    • kaggle.com
    zip
    Updated Apr 29, 2024
    + more versions
    Cite
    Ramin Huseyn (2024). City of Los Angeles Crime data [Dataset]. https://www.kaggle.com/datasets/raminhuseyn/crime-data-from-2020-to-present
    Explore at:
    Available download formats: zip (48433749 bytes)
    Dataset updated
    Apr 29, 2024
    Authors
    Ramin Huseyn
    License

    https://www.usa.gov/government-works/

    Area covered
    Los Angeles
    Description

    This dataset reflects incidents of crime in the City of Los Angeles dating back to 2020. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. The dataset contains 2,083,227 rows and 29 columns.

    Column name | Description
    DR_NO | Division of Records Number: Official file number made up of a 2 digit year, area ID, and 5 digits
    Date Rptd | MM/DD/YYYY
    DATE OCC | MM/DD/YYYY
    TIME OCC | In 24 hour military time.
    AREA | The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21.
    AREA NAME | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for.
    Crm Cd | Indicates the crime committed. (Same as Crime Code 1)
    Crm Cd Desc | Defines the Crime Code provided.
    Mocodes | Modus Operandi: Activities associated with the suspect in commission of the crime
    Vict Age | Victim age
    Vict Sex | F - Female, M - Male, X - Unknown
    Vict Descent | Descent Code: A - Other Asian, B - Black, C - Chinese, D - Cambodian, F - Filipino, G - Guamanian, H - Hispanic/Latin/Mexican, I - American Indian/Alaskan Native, J - Japanese, K - Korean, L - Laotian, O - Other, P - Pacific Islander, S - Samoan, U - Hawaiian, V - Vietnamese, W - White, X - Unknown, Z - Asian Indian
    Premis Cd | The type of structure, vehicle, or location where the crime took place.
    Premis Desc | Defines the Premise Code provided.
    Weapon Used Cd | The type of weapon used in the crime.
    Weapon Desc | Defines the Weapon Used Code provided.
    Status | Status of the case. (IC is the default)
    Status Desc | Defines the Status Code provided.
    Crm Cd 1 | Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 are respectively less serious offenses. Lower crime class numbers are more serious.
    Crm Cd 2 | May contain a code for an additional crime, less serious than Crime Code 1.
    Crm Cd 3 | May contain a code for an additional crime, less serious than Crime Code 1.
    Crm Cd 4 | May contain a code for an additional crime, less serious than Crime Code 1.
    LOCATION | Street address of crime incident rounded to the nearest hundred block to maintain anonymity.
    Cross Street | Cross Street of rounded Address.
    LAT | Latitude
    LON | Longitude
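
    A few of these columns usually need light preparation before analysis. The sketch below shows one possible approach on a tiny hypothetical sample, assuming that TIME OCC is stored as an integer without leading zeros (for example 45 for 00:45) and that missing locations appear as latitude 0 and longitude 0, as the description states. The derived column names are illustrative.

```python
import pandas as pd

# Tiny hypothetical sample; real rows would come from the downloaded CSV.
df = pd.DataFrame(
    {
        "DATE OCC": ["01/08/2020", "03/15/2021"],
        "TIME OCC": [2230, 45],
        "LAT": [34.0141, 0.0],
        "LON": [-118.2978, 0.0],
    }
)

# Parse MM/DD/YYYY dates into datetimes.
df["date_occ"] = pd.to_datetime(df["DATE OCC"], format="%m/%d/%Y")

# Zero-pad 24-hour times to HHMM, then split into hour and minute.
hhmm = df["TIME OCC"].astype(str).str.zfill(4)
df["hour"] = hhmm.str[:2].astype(int)
df["minute"] = hhmm.str[2:].astype(int)

# Treat (0, 0) coordinates as missing locations rather than real points.
missing_location = (df["LAT"] == 0) & (df["LON"] == 0)
df.loc[missing_location, ["LAT", "LON"]] = float("nan")
```

    Marking (0°, 0°) as missing keeps those rows from distorting maps or distance calculations.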
  5. Violence Reduction - Victim Demographics - Aggregated

    • data.cityofchicago.org
    • s.cnmilf.com
    • +1more
    csv, xlsx, xml
    Updated Dec 2, 2025
    + more versions
    Cite
    City of Chicago (2025). Violence Reduction - Victim Demographics - Aggregated [Dataset]. https://data.cityofchicago.org/Public-Safety/Violence-Reduction-Victim-Demographics-Aggregated/gj7a-742p
    Explore at:
    Available download formats: xml, xlsx, csv
    Dataset updated
    Dec 2, 2025
    Dataset authored and provided by
    City of Chicago
    Description

    This dataset contains aggregate data on violent index victimizations at the quarter level of each year (i.e., January – March, April – June, July – September, October – December), from 2001 to the present (1991 to present for Homicides), with a focus on those related to gun violence. Index crimes are 10 crime types selected by the FBI (codes 1-4) for special focus due to their seriousness and frequency. This dataset includes only those index crimes that involve bodily harm or the threat of bodily harm and are reported to the Chicago Police Department (CPD). Each row is aggregated up to victimization type, age group, sex, race, and whether the victimization was domestic-related. Aggregating at the quarter level provides large enough blocks of incidents to protect anonymity while allowing the end user to observe inter-year and intra-year variation. Any row where there were fewer than three incidents during a given quarter has been deleted to help prevent re-identification of victims. For example, if there were three domestic criminal sexual assaults during January to March 2020, all victims associated with those incidents have been removed from this dataset. Human trafficking victimizations have been aggregated separately due to the extremely small number of victimizations.

    This dataset includes a "GUNSHOT_INJURY_I" column to indicate whether the victimization involved a shooting, showing either Yes ("Y"), No ("N"), or Unknown ("UNKNOWN"). For homicides, injury descriptions are available dating back to 1991, so the "shooting" column will read either "Y" or "N" to indicate whether the homicide was a fatal shooting or not. For non-fatal shootings, data is only available as of 2010. As a result, for any non-fatal shootings that occurred from 2010 to the present, the shooting column will read as “Y.” Non-fatal shooting victims will not be included in this dataset prior to 2010; they will be included in the authorized dataset, but with "UNKNOWN" in the shooting column.

    The dataset is refreshed daily, but excludes the most recent complete day to allow CPD time to gather the best available information. Each time the dataset is refreshed, records can change as CPD learns more about each victimization, especially those victimizations that are most recent. The data on the Mayor's Office Violence Reduction Dashboard is updated daily with an approximately 48-hour lag. As cases are passed from the initial reporting officer to the investigating detectives, some recorded data about incidents and victimizations may change once additional information arises. Regularly updated datasets on the City's public portal may change to reflect new or corrected information.

    How does this dataset classify victims?

    The methodology by which this dataset classifies victims of violent crime differs by victimization type:

    Homicide and non-fatal shooting victims: A victimization is considered a homicide victimization or non-fatal shooting victimization depending on its presence in CPD's homicide victims data table or its shooting victims data table. A victimization is considered a homicide only if it is present in CPD's homicide data table, while a victimization is considered a non-fatal shooting only if it is present in CPD's shooting data tables and absent from CPD's homicide data table.

    To determine the IUCR code of homicide and non-fatal shooting victimizations, we defer to the incident IUCR code available in CPD's Crimes, 2001-present dataset (available on the City's open data portal). If the IUCR code in CPD's Crimes dataset is inconsistent with the homicide/non-fatal shooting categorization, we defer to CPD's Victims dataset.

    For a criminal homicide, the only sensible IUCR codes are 0110 (first-degree murder) or 0130 (second-degree murder). For a non-fatal shooting, a sensible IUCR code must signify a criminal sexual assault, a robbery, or, most commonly, an aggravated battery. In rare instances, the IUCR codes in CPD's Crimes and Victims datasets do not align with the homicide/non-fatal shooting categorization:

    1. In instances where a homicide victimization does not correspond to an IUCR code 0110 or 0130, we set the IUCR code to "01XX" to indicate that the victimization was a homicide but we do not know whether it was a first-degree murder (IUCR code = 0110) or a second-degree murder (IUCR code = 0130).
    2. When a non-fatal shooting victimization does not correspond to an IUCR code that signifies a criminal sexual assault, robbery, or aggravated battery, we enter “UNK” in the IUCR column, “YES” in the GUNSHOT_I column, and “NON-FATAL” in the PRIMARY column to indicate that the victim was non-fatally shot, but the precise IUCR code is unknown.
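
    The two fallback rules above can be sketched as a small function. This is an illustration, not the City's actual code: the function name and the "HOMICIDE" label are hypothetical, and because the description does not list the IUCR codes for criminal sexual assault, robbery, or aggravated battery, the set of sensible shooting codes is a placeholder that a caller would have to supply.

```python
HOMICIDE_CODES = {"0110", "0130"}

# Placeholder values: the real IUCR codes for criminal sexual assault, robbery,
# and aggravated battery are not given in the description.
HYPOTHETICAL_SHOOTING_CODES = {"CSA", "ROBBERY", "AGG_BATTERY"}


def reconcile_iucr(victimization_type, iucr, shooting_codes=HYPOTHETICAL_SHOOTING_CODES):
    """Return the IUCR value to record for a homicide or non-fatal shooting."""
    if victimization_type == "HOMICIDE":
        # Unknown degree of murder is recorded as "01XX".
        return iucr if iucr in HOMICIDE_CODES else "01XX"
    if victimization_type == "NON-FATAL":
        # A code outside the sensible set is recorded as "UNK".
        return iucr if iucr in shooting_codes else "UNK"
    # Other violent crime types follow a separate rule and pass through unchanged.
    return iucr
```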

    Other violent crime victims: For other violent crime types, we refer to the IUCR classification that exists in CPD's victim table, with only one exception:

    1. When there is an incident that is associated with no victim with a matching IUCR code, we assume that this is an error. Every crime should have at least 1 victim with a matching IUCR code. In these cases, we change the IUCR code to reflect the incident IUCR code because CPD's incident table is considered to be more reliable than the victim table.

    Note: All businesses identified as victims in CPD data have been removed from this dataset.

    Note: The definition of “homicide” (shooting or otherwise) does not include justifiable homicide or involuntary manslaughter. This dataset also excludes any cases that CPD considers to be “unfounded” or “noncriminal.”

    Note: In some instances, the police department's raw incident-level data and victim-level data that were inputs into this dataset do not align on the type of crime that occurred. In those instances, this dataset attempts to correct mismatches between incident and victim specific crime types. When it is not possible to determine which victims are associated with the most recent crime determination, the dataset will show empty cells in the respective demographic fields (age, sex, race, etc.).

    Note: The initial reporting officer usually asks victims to report demographic data. If victims are unable to recall, the reporting officer will use their best judgment. “Unknown” can be reported if it is truly unknown.

  6. Mortality and potential years of life lost, by selected causes of death and...

    • www150.statcan.gc.ca
    • open.canada.ca
    • +1more
    Updated May 24, 2016
    Cite
    Government of Canada, Statistics Canada (2016). Mortality and potential years of life lost, by selected causes of death and sex, three-year average, Canada, provinces, territories, health regions and peer groups occasional (number) [Dataset]. http://doi.org/10.25318/1310074201-eng
    Explore at:
    Dataset updated
    May 24, 2016
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    This table contains 135864 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-05-24. This table contains data described by the following dimensions (Not all combinations are available): Geography (148 items: Canada; Newfoundland and Labrador; Eastern Regional Integrated Health Authority, Newfoundland and Labrador; Central Regional Integrated Health Authority, Newfoundland and Labrador; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).

  7. City Property Tax Data Appendix A

    • data.orcities.org
    • splitgraph.com
    csv, xlsx, xml
    Updated Apr 19, 2016
    + more versions
    Cite
    (2016). City Property Tax Data Appendix A [Dataset]. https://data.orcities.org/w/gqi8-s84n/default?cur=HzxCeVxIMMr&from=CsljHS0hqWg
    Explore at:
    Available download formats: csv, xlsx, xml
    Dataset updated
    Apr 19, 2016
    Description

    This table contains data from the 2016 City Property Tax Report--Appendix A. Data is from the Department of Revenue Property Tax Statistics Supplemental Report. An empty cell means missing information. 13 cities do not have a permanent property tax rate. *Denotes rates for urban renewal in "Other" category. ** Portland is the only city with GAP bond--$2.6671/thousand which is not in the table, but included in the "Total City Rate" column. ***In some instances cities were contacted to verify data.

  8. Strategic Measure_Aggregated Sidewalk Construction Data

    • catalog.data.gov
    Updated Apr 28, 2022
    + more versions
    Cite
    data.austintexas.gov (2022). Strategic Measure_Aggregated Sidewalk Construction Data [Dataset]. https://catalog.data.gov/no/dataset/strategic-measure-aggregated-sidewalk-construction-data
    Explore at:
    Dataset updated
    Apr 28, 2022
    Dataset provided by
    data.austintexas.gov
    Description

    This dataset shows new sidewalk added to the City of Austin's network by calendar year. Data in this table is limited to the full purpose jurisdiction of the City of Austin as of publication date. Sidewalk construction comes from many sources, including but not limited to the City of Austin, counties, state agencies, and private developers. Existing data does not support separating out city and non-city construction. This dataset supports the SD23 performance measure M.C.6a: Percent of missing sidewalks completed. Detailed sidewalk segment data is available in the dataset Strategic Measure_Sidewalk Segment Data. View more details and insights related to this data set on the story page: https://data.austintexas.gov/stories/s/Percentage-of-Missing-Sidewalk-Network-Completed/ffkw-wkiv/

  9. Major Cities Weather Data 1995-present

    • kaggle.com
    zip
    Updated Nov 10, 2025
    + more versions
    Cite
    Wafaa EL HUSSEINI (2025). Major Cities Weather Data 1995-present [Dataset]. https://www.kaggle.com/datasets/wafaaelhusseini/major-cities-weather-data
    Explore at:
    Available download formats: zip (5099166 bytes)
    Dataset updated
    Nov 10, 2025
    Authors
    Wafaa EL HUSSEINI
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    🌍 Major Cities Daily Weather (1995 – present) [CURRENTLY UNDER CONSTRUCTION]

    This dataset provides daily historical weather data for major cities around the world, including all national capitals and large population centers, from 1995 to 2024.
    Data is sourced from the Open-Meteo Historical Weather API and Wikidata, processed and harmonized for easy analysis and visualization.

    📦 Contents

    File | Description
    cities_clean.parquet | Metadata of all selected cities (country, coordinates, population, capital flag).
    history.parquet | Full daily dataset (one row per city × day).
    history_latest.csv | Snapshot of the most recent day available.

    🌆 City Selection Methodology

    “Major cities” are defined as:

    • All national capitals, plus
    • Cities with population ≥ 300 000, and
    • The top 10 most populated cities for each country (based on Wikidata).

    Coordinates (lat, lon) and country ISO codes come from Wikidata’s structured data.
    Population values are used only for ranking and filtering.
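
    The selection rules above can be sketched with pandas. The description does not say exactly how the three criteria combine, so this example assumes a city is kept if it meets any one of them. The column names and the sample rows are hypothetical, and the per-country cutoff is reduced from 10 to 2 to keep the toy data small.

```python
import pandas as pd

# Hypothetical city metadata; column names are illustrative only.
cities = pd.DataFrame(
    {
        "country": ["AA", "AA", "AA", "BB", "BB"],
        "city": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
        "population": [500_000, 120_000, 40_000, 90_000, 20_000],
        "is_capital": [False, True, False, True, False],
    }
)

TOP_N = 2  # the description uses 10; 2 keeps this toy example small
MIN_POPULATION = 300_000

# Rank cities within each country by population, largest first.
rank = cities.groupby("country")["population"].rank(method="first", ascending=False)

# Assumption: a city qualifies if it satisfies any of the three criteria.
selected = cities[
    cities["is_capital"]
    | (cities["population"] >= MIN_POPULATION)
    | (rank <= TOP_N)
]
```

    In this sample, Gamma is dropped because it is not a capital, is below the population threshold, and ranks third in its country.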

    ☀ Weather Variables

    Daily values from Open-Meteo’s ERA5-based reanalysis:

    Variable | Unit | Description
    temp_max_c, temp_min_c | °C | Maximum / minimum 2 m air temperature
    temp_mean_c_approx | °C | Approximate daily mean ((max+min)/2)
    app_temp_max_c, app_temp_min_c | °C | Apparent (feels-like) temperature
    precip_mm, rain_mm, snow_mm | mm | Total precipitation, rain, snowfall
    windspeed_10m_max_kmh, windgusts_10m_max_kmh | km/h | Maximum daily windspeed / gusts
    wind_dir_dom_deg | ° | Dominant wind direction
    sunshine_duration_s, daylight_duration_s | s | Total sunshine / daylight duration
    shortwave_radiation_MJ_m2 | MJ/m² | Daily sum of incoming shortwave radiation

    All timestamps are daily aggregates in UTC.

    🧠 Notes

    • The dataset merges 29 years of global reanalysis data (1995 – 2024).
    • Missing or obviously invalid values are left as null.
    • Each record is uniquely identified by (date, country, city).
    • Weather data are physically modelled, not observed station data.
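
    Two of the points above are easy to verify after loading the data: the approximate daily mean and the uniqueness of the (date, country, city) key. The following is a minimal sketch on hypothetical rows shaped like history.parquet; the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like history.parquet.
history = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02", "2024-01-01"],
        "country": ["FR", "FR", "JP"],
        "city": ["Paris", "Paris", "Tokyo"],
        "temp_max_c": [8.0, None, 11.0],
        "temp_min_c": [2.0, 1.0, 3.0],
    }
)

# Approximate daily mean as (max + min) / 2; a null input yields a null mean.
history["temp_mean_c_approx"] = (history["temp_max_c"] + history["temp_min_c"]) / 2

# Each record should be unique on (date, country, city).
has_duplicates = history.duplicated(subset=["date", "country", "city"]).any()
```

    Leaving nulls in place, rather than filling them, matches the note that missing or invalid values are kept as null.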

    ⚖ License & Attribution

    Data is available under CC BY 4.0.

  10. 🌆 City Lifestyle Segmentation Dataset

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    UmutUygurr (2025). 🌆 City Lifestyle Segmentation Dataset [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/city-lifestyle-segmentation-dataset
    Explore at:
    Available download formats: zip (11274 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    UmutUygurr
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    🌆 About This Dataset

    This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.

    🎯 Perfect For:

    • 📊 K-Means, DBSCAN, Agglomerative Clustering
    • 🔬 PCA & t-SNE Dimensionality Reduction
    • đŸ—ș Geospatial Visualization (Plotly, Folium)
    • 📈 Correlation Analysis & Feature Engineering
    • 🎓 Educational Projects (Beginner to Intermediate)

    📦 What's Inside?

    Feature | Description | Range
    10 Features | Economic, environmental & social indicators | Realistically scaled
    300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions
    Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready
    No Missing Values | Clean, preprocessed data | Ready for analysis
    4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated

    🔥 Key Features

    ✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
    ✅ Regional Diversity: Each region has distinct economic and environmental characteristics
    ✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
    ✅ Beginner-Friendly: No data cleaning required, includes example code
    ✅ Documented: Comprehensive README with methodology and use cases

    🚀 Quick Start Example

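    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Load the data and keep only the numeric features for clustering
    df = pd.read_csv('city_lifestyle_dataset.csv')
    X = df.drop(['city_name', 'country'], axis=1)
    X_scaled = StandardScaler().fit_transform(X)

    # Cluster the standardized features into 5 groups
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    df['cluster'] = kmeans.fit_predict(X_scaled)

    # Analyze: per-cluster means of the numeric columns only,
    # since city_name and country are text and cannot be averaged
    print(df.groupby('cluster').mean(numeric_only=True))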
    

    🎓 Learning Outcomes

    After working with this dataset, you will be able to:

    1. Apply K-Means, DBSCAN, and Hierarchical Clustering
    2. Use PCA for dimensionality reduction and visualization
    3. Interpret correlation matrices and feature relationships
    4. Create geographic visualizations with cluster assignments
    5. Profile and name discovered clusters based on characteristics

    📚 Ideal For These Projects

    • 🏆 Kaggle Competitions: Practice clustering techniques
    • 📝 Academic Projects: Urban planning, sociology, environmental science
    • 💼 Portfolio Work: Showcase ML skills to employers
    • 🎓 Learning: Hands-on practice with unsupervised learning
    • 🔬 Research: Urban lifestyle segmentation studies

    🌍 Expected Clusters

    Cluster | Characteristics | Example Cities
    Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore
    Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities
    Developing Centers | Mid income, high density, poor air | Emerging markets
    Low-Income Suburban | Low infrastructure, income | Rural areas
    Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs

    đŸ› ïž Technical Details

    • Format: CSV (UTF-8)
    • Size: ~300 rows × 10 columns
    • Missing Values: 0%
    • Data Types: 2 categorical, 8 numerical
    • Target Variable: None (unsupervised)
    • Correlation Strength: Pre-validated (r: 0.4 to 0.8)

    📖 What Makes This Dataset Special?

    Unlike random synthetic data, this dataset was carefully engineered with:
    - ✹ Realistic correlation structures based on urban research
    - 🌍 Regional characteristics matching real-world patterns
    - 🎯 Optimal cluster separability (validated via silhouette scores)
    - 📚 Comprehensive documentation and starter code
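
    The description does not show how the silhouette validation was run. As a rough illustration of what such a check measures, the sketch below computes the mean silhouette score in plain Python. The toy one-dimensional points and the function name `mean_silhouette` are assumptions for illustration and are not part of the dataset's starter code.

```python
from statistics import mean

def mean_silhouette(points, labels):
    """Mean silhouette score for 1-D points with one cluster label per point."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster
        own = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean([abs(p - q) for q, l in zip(points, labels) if l == other])
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)

# Made-up data: two tight, well-separated groups
toy_points = [0.0, 1.0, 10.0, 11.0]
toy_labels = [0, 0, 1, 1]
score = mean_silhouette(toy_points, toy_labels)
print(round(score, 3))
```

    Scores close to 1 indicate compact, well-separated clusters; scores near 0 indicate overlapping ones.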

    🏅 Use This Dataset If You Want To:

    ✓ Learn clustering without data cleaning hassles
    ✓ Practice PCA and dimensionality reduction
    ✓ Create beautiful geographic visualizations
    ✓ Understand feature correlation in real-world contexts
    ✓ Build a portfolio project with clear business insights

    📊 Acknowledgments

    This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.

    Happy Clustering! 🎉

  11. Global Air Quality Data(15 Days Hourly, 50 Cities)

    • kaggle.com
    zip
    Updated Nov 19, 2025
    Cite
    Smeet Raichura (2025). Global Air Quality Data(15 Days Hourly, 50 Cities) [Dataset]. https://www.kaggle.com/datasets/smeet888/global-air-quality-data15-days-hourly-50-cities
    Explore at:
    Available download formats: zip (598546 bytes)
    Dataset updated
    Nov 19, 2025
    Authors
    Smeet Raichura
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📘 Overview

    This dataset provides hourly air-quality measurements for 50 major global cities over a continuous 15-day period, including pollutant concentrations, meteorological conditions, geographical metadata, and an engineered AQI index.

    All values are synthetically generated using historically consistent pollutant patterns and statistical ranges, allowing researchers and ML practitioners to work with realistic air-quality trends without licensing restrictions or data-collection barriers.

    This dataset is ideal for time-series modeling, forecasting, environmental analytics, and machine-learning experimentation.

    🧭 Cities Included

    Covers all major regions:

    North America — New York, Los Angeles, Toronto

    Europe — London, Paris, Berlin, Zurich

    Asia — Delhi, Tokyo, Seoul, Beijing, Singapore

    Middle East — Dubai, Riyadh, Doha

    Africa — Lagos, Cairo, Nairobi

    Oceania — Sydney, Melbourne, Auckland

    South America — São Paulo, Buenos Aires

    đŸ§± Dataset Structure

    Each hourly record includes:

    Air Pollutants

    PM2.5 (”g/m³)

    PM10 (”g/m³)

    NO₂ (ppb)

    SO₂ (ppb)

    O₃ (ppb)

    CO (ppm)

    Weather Features

    Temperature (°C)

    Humidity (%)

    Wind Speed (m/s)

    Location Metadata

    City

    Country

    Latitude

    Longitude

    Other

    Timestamp (ISO-8601)

    AQI (Computed index)

    đŸ§č Data Quality & Formatting

    No missing values — 100% complete

    Numeric values rounded to 3 decimals

    Clean column names (snake_case)

    Consistent hourly frequency

    Fully ML-ready

    📊 Example Use Cases

    ✔ AQI forecasting (LSTM, GRU, Transformers)
    ✔ Multivariate time-series modeling
    ✔ Clustering cities by pollution patterns
    ✔ Environmental trend visualization
    ✔ Weather–pollution correlation studies
    ✔ Anomaly detection (peak pollution events)

    Column | Description | Unit | Type
    timestamp | Hourly timestamp (UTC) | - | datetime
    city | City name | - | string
    country | Country name | - | string
    latitude | City latitude | ° | float
    longitude | City longitude | ° | float
    pm25 | Fine particulate matter | ”g/m³ | float
    pm10 | Coarse particulate matter | ”g/m³ | float
    no2 | Nitrogen dioxide | ppb | float
    so2 | Sulfur dioxide | ppb | float
    o3 | Ozone | ppb | float
    co | Carbon monoxide | ppm | float
    temperature | Ambient temperature | °C | float
    humidity | Relative humidity | % | float
    wind_speed | Wind speed | m/s | float
    aqi | Derived Air Quality Index | - | int
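
    The description says the aqi column is computed from standard US-EPA breakpoints but does not reproduce the table. The sketch below shows the piecewise-linear interpolation that breakpoint-based indices use, for PM2.5 only. The breakpoint values are an assumption (the 24-hour PM2.5 table used by the US EPA before its 2024 revision), and the name `pm25_to_aqi` is hypothetical. The dataset's actual computation may differ, for example by taking the maximum sub-index over several pollutants.

```python
# Assumed breakpoints: (conc_low, conc_high, index_low, index_high), PM2.5 in ug/m3.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

def pm25_to_aqi(conc):
    """Map a PM2.5 concentration to an index by linear interpolation within its band."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if conc <= c_hi:
            # Simplification: values in the small gap between two bands
            # (e.g. 12.05) snap to the start of the higher band.
            c = max(conc, c_lo)
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return 500  # beyond the table: capped at the top of the scale

print(pm25_to_aqi(12.0), pm25_to_aqi(35.5))
```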

    đŸ§Ș Data Generation Method (Provenance)

    This dataset is synthetically generated using realistic pollutant behavior patterns based on historical studies and open-source environmental datasets.

    Modeling steps included:

    City-specific pollutant baseline ranges

    Randomized variation using Gaussian noise

    Temporal patterns using sinusoidal diurnal cycles (morning & evening peaks)

    Weather-pollution correlation rules (e.g., low wind → higher PM)

    AQI computed using standard US-EPA breakpoints

    All numeric values standardized to 3-decimal precision

    This ensures that although synthetic, the dataset follows realistic environmental dynamics.
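
    The generator itself is not published in the description. As a minimal sketch of the listed steps, the code below builds one city's hourly PM2.5 series from a baseline, a sinusoidal diurnal cycle with morning and evening peaks, a low-wind rule, and Gaussian noise. Every constant (baseline, amplitude, peak hours, noise level, wind distribution and threshold) and the name `simulate_pm25` are illustrative assumptions, not the author's values.

```python
import math
import random

def simulate_pm25(hours=15 * 24, baseline=35.0, amplitude=10.0,
                  noise_sd=3.0, seed=0):
    """Return a list of synthetic hourly PM2.5 values (ug/m3) for one city."""
    rng = random.Random(seed)
    series = []
    for t in range(hours):
        hour = t % 24
        # 12-hour cosine: peaks at 08:00 and 20:00, troughs at 02:00 and 14:00
        diurnal = amplitude * math.cos(2 * math.pi * (hour - 8) / 12)
        # Assumed wind model (m/s); calm hours raise particulate levels
        wind = max(0.0, rng.gauss(3.0, 1.5))
        wind_factor = 1.25 if wind < 1.5 else 1.0
        value = (baseline + diurnal) * wind_factor + rng.gauss(0.0, noise_sd)
        # Clamp at zero and keep 3 decimals, as the description states
        series.append(round(max(value, 0.0), 3))
    return series

pm25 = simulate_pm25()
print(len(pm25), min(pm25), max(pm25))
```

    With 15 days of hourly values per city, 50 cities give 50 × 360 = 18,000 records, in line with the row count listed under File Information.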

    📁 File Information

    global_air_quality_50_cities.csv

    Rows: 18,000+

    Columns: 16

    Format: UTF-8 CSV

  12. Mortality and potential years of life lost, by selected causes of death and...

    • www150.statcan.gc.ca
    • data.urbandatacentre.ca
    ‱ +2 more
    Updated Mar 16, 2016
    Cite
    Government of Canada, Statistics Canada (2016). Mortality and potential years of life lost, by selected causes of death and sex, three-year average, census metropolitan areas occasional (number) [Dataset]. http://doi.org/10.25318/1310074101-eng
    Explore at:
    Dataset updated
    Mar 16, 2016
    Dataset provided by
    Statistics Canada (https://statcan.gc.ca/en)
    Area covered
    Canada
    Description

    This table contains 33048 series, with data for years 2000/2002 - 2010/2012 (not all combinations necessarily have data for all years), and was last released on 2016-03-16. This table contains data described by the following dimensions (not all combinations are available): Geography (36 items: Total, census metropolitan areas; St. John's, Newfoundland and Labrador; Halifax, Nova Scotia; Moncton, New Brunswick; ...), Sex (3 items: Both sexes; Males; Females), Indicators (2 items: Mortality; Potential years of life lost), Selected causes of death (ICD-10) (17 items: Total, all causes of death; All malignant neoplasms (cancers); Colorectal cancer; Lung cancer; ...), Characteristics (9 items: Number; Low 95% confidence interval, number; High 95% confidence interval, number; Rate; ...).
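
    The series count follows from the dimension sizes listed above: multiplying the number of items in each dimension gives the full set of combinations. A quick check, where the dictionary keys are simply labels copied from the description:

```python
import math

# Number of items in each dimension, as listed in the table description
dimension_sizes = {
    "Geography": 36,
    "Sex": 3,
    "Indicators": 2,
    "Selected causes of death (ICD-10)": 17,
    "Characteristics": 9,
}

total_series = math.prod(dimension_sizes.values())
print(total_series)  # 33048
```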

  13. 2020 American Community Survey: S0804 | MEANS OF TRANSPORTATION TO WORK BY...

    • data.census.gov
    Cite
    ACS, 2020 American Community Survey: S0804 | MEANS OF TRANSPORTATION TO WORK BY SELECTED CHARACTERISTICS FOR WORKPLACE GEOGRAPHY (ACS 5-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST5Y2020.S0804?q=Cimarron+city,+Kansas+Employment&y=2020
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2020
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Foreign born excludes people born outside the United States to a parent who is a U.S. citizen.

    Tables for Workplace Geography are only available for States; Counties; Places; County Subdivisions in selected states (CT, ME, MA, MI, MN, NH, NJ, NY, PA, RI, VT, WI); Combined Statistical Areas; Metropolitan and Micropolitan Statistical Areas, and their associated Metropolitan Divisions and Principal Cities; Combined New England City and Town Areas; New England City and Town Areas, and their associated Divisions and Principal Cities. Tables B08601, B08602, B08603, and B08604 are also available for Place parts and County Subdivision parts for the 5-year ACS datasets.

    Workers include members of the Armed Forces and civilians who were at work last week.

    Industry titles and their 4-digit codes are based on the North American Industry Classification System (NAICS). The Census industry codes for 2018 and later years are based on the 2017 revision of the NAICS. To allow for the creation of multiyear tables, industry data in the multiyear files (prior to data year 2018) were recoded to the 2017 Census industry codes. We recommend using caution when comparing data coded using 2017 Census industry codes with data coded using Census industry codes prior to data year 2018. For more information on the Census industry code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    2019 ACS data products include updates to several categories of the existing means of transportation question. For more information, see: Change to Means of Transportation.

    Occupation titles and their 4-digit codes are based on the Standard Occupational Classification (SOC). The Census occupation codes for 2018 and later years are based on the 2018 revision of the SOC. To allow for the creation of the multiyear tables, occupation data in the multiyear files (prior to data year 2018) were recoded to the 2018 Census occupation codes. We recommend using caution when comparing data coded using 2018 Census occupation codes with data coded using Census occupation codes prior to data year 2018. For more information on the Census occupation code changes, please visit our website at https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html.

    In 2019, methodological changes were made to the class of worker question. These changes involved modifications to the question wording, the category wording, and the visual format of the categories on the questionnaire. The format for the class of worker categories are now listed under the headings "Private Sector Employee," "Government Employee," and "Self-Employed or Other." Additionally, the category of Active Duty was added as one of the response categories under the "Government Employee" section for the mail questionnaire. For more detailed info...
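
    As a small illustration of the margin-of-error language in this description, the sketch below turns an estimate and its 90 percent margin of error into confidence bounds and a standard error. The divisor 1.645 is the usual normal critical value for a 90 percent interval. The input numbers and function names are made up for illustration and do not come from table S0804.

```python
Z_90 = 1.645  # normal critical value for a 90 percent interval
Z_95 = 1.960  # normal critical value for a 95 percent interval

def confidence_bounds(estimate, moe):
    """Lower and upper bounds: estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe

def standard_error(moe_90):
    """Standard error implied by a published 90 percent margin of error."""
    return moe_90 / Z_90

def rescale_moe(moe_90, z_target=Z_95):
    """Margin of error at another confidence level (95 percent by default)."""
    return standard_error(moe_90) * z_target

# Hypothetical estimate of 1,200 workers with a margin of error of 150
lower, upper = confidence_bounds(1200, 150)
print(lower, upper)
```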

  14. CitiesGOER: Globally Observed Environmental Data for 52,602 Cities with a...

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Mar 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kindt, Roeland (2025). CitiesGOER: Globally Observed Environmental Data for 52,602 Cities with a Population ≄ 5000 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8175429
    Explore at:
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    CIFOR-ICRAF
    Authors
    Kindt, Roeland
    License

    Attribution 4.0, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    CitiesGOER is a database that provides environmental data for 52,602 cities and 48 environmental variables, including 38 bioclimatic variables, 8 soil variables and 2 topographic variables. Data were extracted from the same 30 arc-seconds global grid layers that were prepared when making the TreeGOER (Tree Globally Observed Environmental Ranges) database that is available from https://doi.org/10.5281/zenodo.7922927. Details on the preparations of these layers are provided by Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914. CitiesGOER was designed to be used together with TreeGOER and possibly also with the GlobalUsefulNativeTrees database (Kindt et al. 2023) to allow users to filter suitable tree species based on environmental conditions of the planting site.

    The identities and coordinates of cities were sourced from a data set with information for cities with a population size larger than 1000 that was created by Opendatasoft and made available from https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/table/?disjunctive.cou_name_en&sort=name. The data was downloaded on 22-JULY-2023 and afterwards filtered for cities with a population of 5000 or above. Cities where information on the country was missing were removed. The coordinates of cities were used to extract the environmental data via the terra package (Hijmans et al. 2022, version 1.6-47) in the R 4.2.1 environment.
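
    The author carried out this step in R. The Python sketch below only mirrors the two filtering rules stated above (population of 5000 or above, country not missing). The field names `name`, `country`, and `population`, the function name, and the sample rows are hypothetical.

```python
def filter_cities(rows, min_population=5000):
    """Keep rows that name a country and meet the population threshold."""
    return [
        row for row in rows
        if row.get("country") and (row.get("population") or 0) >= min_population
    ]

# Made-up sample rows, not taken from the Opendatasoft file
sample = [
    {"name": "Town A", "country": "Kenya", "population": 12000},
    {"name": "Town B", "country": "Kenya", "population": 3200},  # below threshold
    {"name": "Town C", "country": None, "population": 80000},    # country missing
    {"name": "Town D", "country": "Peru", "population": 5000},   # exactly 5000 is kept
]

kept = filter_cities(sample)
print([row["name"] for row in kept])
```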

    Version 2023.08 provided median values from 23 Global Climate Models (GCMs) for Shared Socio-Economic Pathway (SSP) 1-2.6 and from 18 GCMs for SSP 3-7.0, both for the 2050s (2041-2060). Similar methods were used to calculate these median values as in the case studies for the TreeGOER manuscript (calculations were partially done via the BiodiversityR::ensemble.envirem.run function and with downscaled bioclimatic and monthly climate 2.5 arc-minutes future grid layers available from WorldClim 2.1).

    Version 2023.09 used similar methods as for previous versions to provide median values from 13 GCMs for the 2090s (2081-2100) for SSP 5-8.5.
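
    The median-across-models step can be pictured as follows. This is a sketch under assumptions: the values are invented, the model names are placeholders, and the original calculations were done in R (partly via BiodiversityR), not with this code.

```python
from statistics import median

def ensemble_median(values_by_model):
    """Median of one variable at one location across climate models."""
    return median(values_by_model.values())

# Invented projections of one bioclimatic variable for a single city
projections = {
    "model_a": 24.1,
    "model_b": 25.3,
    "model_c": 23.8,
    "model_d": 26.0,
    "model_e": 24.7,
}

print(ensemble_median(projections))
```

    A median is less sensitive than a mean to a single model that projects an unusually high or low value.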

    The locations of the 52,602 cities are mapped in one of the series available from the TreeGOER Global Zones atlas that can be obtained from https://doi.org/10.5281/zenodo.8252756.

    Version 2024.10 includes a new data set that documents the Holdridge Life Zone in which each city is located. Information is given for historical (1901-1920), contemporary (1979-2013) and future (2061-2080; separately for RCP 4.5 and RCP 8.5) climates inferred from global raster layers that are available for download from DRYAD and were created for the following article: Elsen et al. 2022. Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962. Version 2024.10 further includes Holdridge Life Zones for the climates that were available from the previous versions, calculating biotemperatures and life zones with similar methods as used by Holdridge (1947; 1967) and Elsen et al. (2022) (for future climates, median values were determined first for monthly maximum and minimum temperatures across GCMs). The distributions of the 48,129 species documented in TreeGOER across the Holdridge Life Zones are given in this Zenodo archive: https://zenodo.org/records/14020914.

    Version 2024.11 includes a new data set that documents the Köppen-Geiger climate zone in which each city is located. Information is given for historical (1901-1930, 1931-1960, 1961-1990) and future (2041-2070 and 2071-2099) climates, with seven scenarios for each future period (SSP 1-1.9, SSP 1-2.6, SSP 2-4.5, SSP 3-7.0, SSP 4-3.4, SSP 4-6.0 and SSP 5-8.5). This data set was created from 30 arc-second raster layers available via: Beck, H.E., McVicar, T.R., Vergopolan, N. et al. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724 (2023). https://doi.org/10.1038/s41597-023-02549-6

    Version 2025.03 includes extra columns for the baseline, 2050s and 2090s datasets that partially correspond to climate zones used in the GlobalUsefulNativeTrees database. Among these are the Whittaker biome types, available as a polygon from the plotbiomes package (see also here). Whittaker biome types were extracted with similar R scripts as described by Kindt 2025 (these were also used to calculate environmental ranges of TreeGOER species, as archived here).

    Version 2025.03 further includes information for the baseline climate on the steady state water table depth, obtained from a 30 arc-seconds raster layer calculated by the GLOBGM v1.0 model (Verkaik et al. 2024). Also included was the elevation, obtained from the same WorldClim 2.1 raster layer used to prepare TreeGOER.

    As an alternative to CitiesGOER, the ClimateForecasts database (https://zenodo.org/records/10776414) documents the environmental conditions at the locations of 15,504 weather stations. ClimateForecasts was integrated in the GlobalUsefulNativeTrees database (see Kindt et al. 2023).

    When using CitiesGOER in your work, cite this depository and the following:

    Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315. https://doi.org/10.1002/joc.5086

    Title, P. O., & Bemmels, J. B. (2018). ENVIREM: An expanded set of bioclimatic and topographic variables increases flexibility and improves performance of ecological niche modeling. Ecography, 41(2), 291–307. https://doi.org/10.1111/ecog.02880

    Poggio, L., de Sousa, L. M., Batjes, N. H., Heuvelink, G. B. M., Kempen, B., Ribeiro, E., & Rossiter, D. (2021). SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. SOIL, 7(1), 217–240. https://doi.org/10.5194/soil-7-217-2021

    Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology 29: 6303–6318. https://onlinelibrary.wiley.com/doi/10.1111/gcb.16914.

    Opendatasoft (2023) Geonames - All Cities with a population > 1000. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/information/?disjunctive.cou_name_en&sort=name (accessed 22-JULY-2023)

    When using information from the Holdridge Life Zones, also cite:

    Elsen, P. R., Saxon, E. C., Simmons, B. A., Ward, M., Williams, B. A., Grantham, H. S., Kark, S., Levin, N., Perez-Hammerle, K.-V., Reside, A. E., & Watson, J. E. M. (2022). Accelerated shifts in terrestrial life zones under rapid climate change. Global Change Biology, 28, 918–935. https://doi.org/10.1111/gcb.15962

    When using information from Köppen-Geiger climate zones, also cite:

    Beck, H.E., McVicar, T.R., Vergopolan, N., Berg, A., Lutsko, N.J., Dufour, A., Zeng, Z., Jiang, X., van Dijk, A.I. and Miralles, D.G. 2023. High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections. Sci Data 10, 724. https://doi.org/10.1038/s41597-023-02549-6

    When using information on the Whittaker biome types, also cite:

    Ricklefs, R. E., Relyea, R. (2018). Ecology: The Economy of Nature. United States: W.H. Freeman.

    Whittaker, R. H. (1970). Communities and ecosystems.

    Valentin Ștefan, & Sam Levin. (2018). plotbiomes: R package for plotting Whittaker biomes with ggplot2 (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.7145245

    When using information on the steady state water table depth, also cite:

    Verkaik, J., Sutanudjaja, E. H., Oude Essink, G. H., Lin, H. X., & Bierkens, M. F. (2024). GLOBGM v1.0: a parallel implementation of a 30 arcsec PCR-GLOBWB-MODFLOW global-scale groundwater model. Geoscientific Model Development, 17(1), 275-300. https://gmd.copernicus.org/articles/17/275/2024/

    The development of CitiesGOER was supported by the Darwin Initiative to project DAREX001 of Developing a Global Biodiversity Standard certification for tree-planting and restoration, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Green Climate Fund through the IUCN-led Transforming the Eastern Province of Rwanda through Adaptation project. Development of version 2024.10 was further supported by the Green Climate Fund through the Readiness proposal on Climate Appropriate Portfolios of Tree Diversity for Burkina Faso project, by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project and by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa.

  15. Vital Signs: Housing Permits - Bay Area

    • data.bayareametro.gov
    csv, xlsx, xml
    Updated Mar 11, 2022
    + more versions
    Cite
    ABAG Housing Permit Database (2022). Vital Signs: Housing Permits - Bay Area [Dataset]. https://data.bayareametro.gov/dataset/Vital-Signs-Housing-Permits-Bay-Area/wbvu-rmp6
    Explore at:
    Available download formats: xml, csv, xlsx
    Dataset updated
    Mar 11, 2022
    Dataset authored and provided by
    ABAG Housing Permit Database
    Area covered
    San Francisco Bay Area
    Description

    VITAL SIGNS INDICATOR Housing Permits (LU3)

    FULL MEASURE NAME Permitted housing units

    LAST UPDATED October 2019

    DESCRIPTION Housing growth is measured in terms of the number of units that local jurisdictions permit throughout a given year. A permitted unit is a unit that a city or county has authorized for construction.

    DATA SOURCE Construction Industry Research Board Table 3: Residential Units and Valuation (1967-2010) No link available

    California Housing Foundation/Construction Industry Research Board California Construction Trends (2011-2013) http://www.mychf.org/cirb/

    Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database (2014-2017) http://opendata.mtc.ca.gov

    CONTACT INFORMATION vitalsigns.info@bayareametro.gov

    METHODOLOGY NOTES (across all datasets for this indicator) Bay Area housing permits data prior to 2014 comes from the California Housing Foundation/Construction Industry Research Board. Data from 2014 to 2017 comes from the Association of Bay Area Governments (ABAG) – Metropolitan Transportation Commission (MTC) Housing Permits Database.

    Single-family housing units include detached, semi-detached, row house and town house units. Row houses and town houses are included as single-family units when each unit is separated from the adjacent unit by an unbroken ground-to-roof party or fire wall. Condominiums are included as single-family units when they are of zero-lot-line or zero-property-line construction; when units are separated by an air space; or, when units are separated by an unbroken ground-to-roof party or fire wall. Multi-family housing includes duplexes, three-to-four-unit structures and apartment-type structures with five units or more. Multi-family also includes condominium units in structures of more than one living unit that do not meet the single-family housing definition. In the permits data from 2014 to 2017, single-family units include all units not strictly classified as multi-family. This may include secondary units.

    Each multi-family unit is counted separately even though they may be in the same building. Total units is the sum of single-family and multi-family units. County data is available from 1967 whereas city data is available from 1990. City data is only available for incorporated cities and towns. All permits in unincorporated cities and towns are included under their respective county’s unincorporated total. Permit data is not available for years when the city or town was not incorporated.

    Affordable housing is the total number of permitted units affordable to low and very low income households. Very low income households are those making below 50% of the area median income. Low income households are those making between 50% and 80% of the area median income. Moderate income households are those making between 80% and 120% of the area median income. Above moderate income households are those making above 120% of the area median income.
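
    The income bands in this paragraph can be expressed as a small classifier. This is a sketch: the function name is hypothetical, and the handling of households that sit exactly on a boundary (50%, 80%, 120%) is an assumption, since the text does not say which band they fall into.

```python
def income_category(household_income, area_median_income):
    """Classify a household by its income as a share of area median income (AMI)."""
    if area_median_income <= 0:
        raise ValueError("area median income must be positive")
    share = household_income / area_median_income
    if share < 0.5:
        return "very low"
    if share <= 0.8:   # assumption: exactly 80% counts as low
        return "low"
    if share <= 1.2:   # assumption: exactly 120% counts as moderate
        return "moderate"
    return "above moderate"

# Hypothetical incomes against a hypothetical AMI of 100,000
for income in (40_000, 65_000, 100_000, 130_000):
    print(income, income_category(income, 100_000))
```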

    Permit data is missing for the following cities and years: Clayton, 1990-2007; Lafayette, 1990-2007; Moraga, 1990-2007; Orinda, 1990-2007; San Ramon, 1990.

    Building permit data for metropolitan areas for each year is the sum of non-seasonally adjusted monthly estimates from the Building Permit Survey. The Bay Area values are the sum of the San Francisco-Oakland-Hayward MSA and the San Jose-Sunnyvale-Santa Clara MSA. The counties included in these areas are: San Francisco, Marin, Contra Costa, Alameda, San Mateo, Santa Clara, and San Benito.

    Permit values reflect the number of units permitted in each respective year.

  16. Crime in England and Wales: Police Force Area data tables

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Oct 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Office for National Statistics (2025). Crime in England and Wales: Police Force Area data tables [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/policeforceareadatatables
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Police recorded crime figures by Police Force Area and Community Safety Partnership areas (which equate in the majority of instances, to local authorities).

  17. Vietnam Jobs Dataset

    • kaggle.com
    zip
    Updated Apr 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nguyen chi tinh (2025). Vietnam Jobs Dataset [Dataset]. https://www.kaggle.com/datasets/nguyenchitinh/vietnam-jobs-dataset/code
    Explore at:
    Available download formats: zip (3213064 bytes)
    Dataset updated
    Apr 23, 2025
    Authors
    nguyen chi tinh
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Vietnam
    Description

    Vietnam Jobs Dataset

    Overview

    The jobs.csv dataset contains 85k job postings from various cities and industries in Vietnam, providing insights into the job market as of April 22, 2025. The data includes details about job titles, locations, salaries, experience requirements, and job fields, making it a valuable resource for analyzing salary trends, regional job distribution, and industry demands.

    Dataset Description

    • File Name: jobs.csv
    • Format: Comma-Separated Values (CSV)
    • Size: 215 rows (excluding header)
    • Source: Internal data collection (specific source not provided)
    • Date: Reflects job market data as of April 22, 2025

    Data Schema

    The dataset includes the following columns:

    Column Name | Description | Data Type | Example Value
    job_title | Title of the job posting | String | "Sales Executive"
    job_type | Type of employment (e.g., full-time, part-time) | String | "Full-time"
    position_level | Level of the position (e.g., Employee, Manager, Intern) | String | "NhĂąn viĂȘn" (Employee)
    city | City where the job is located | String | "Hồ Chí Minh"
    experience | Required years of experience (e.g., "khĂŽng yĂȘu cáș§u", "2 - 5 năm") | String | "trĂȘn 1 năm"
    skills | Required skills for the job (comma-separated) | String | "English, Sales, Communication"
    job_fields | Industry or field of the job (comma-separated) | String | "Sales, Marketing, Retail"
    salary | General salary description (may be vague or blank) | String | "Thỏa thuáș­n" (Negotiable)
    salary_min | Minimum salary offered (in VND or USD) | Float | 8000000
    salary_max | Maximum salary offered (in VND or USD) | Float | 15000000
    unit | Currency unit for salary (VND or USD) | String | "VND"

    Notes on Data

    • Salary Values: Salaries are provided in VND (Vietnamese Dong) or USD. For consistency, convert USD to VND using an exchange rate (e.g., 1 USD = 25,000 VND).
    ‱ Experience: The experience field uses Vietnamese phrases like "khĂŽng yĂȘu cáș§u" (no experience required), "trĂȘn 1 năm" (over 1 year), or ranges like "2 - 5 năm". Parsing into numerical years is recommended for analysis.
    ‱ City Names: City names may vary in format (e.g., "Hồ Chí Minh", "HCM", "hà nội"). Standardize to "Ho Chi Minh City" and "Hanoi" for consistency.
    • Job Fields: The job_fields column contains comma-separated values, allowing a single job to belong to multiple industries (e.g., "Sales, Marketing").
    • Missing Data: Some fields, particularly salary_min and salary_max, may be missing or zero. Filter out invalid entries for accurate analysis.
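
    The notes above can be turned into a few helper functions. The following is a minimal sketch, not the author's cleaning code: the function names are hypothetical, the alias table covers only the spellings quoted in the notes, and reading "trĂȘn 1 năm" (over 1 year) as 1.0 is an assumption. The 25,000 VND per USD rate is the example rate given in the notes.

```python
import re
import unicodedata

USD_TO_VND = 25_000  # example exchange rate quoted in the notes

def _norm(text):
    """Normalize Unicode form, surrounding whitespace, and letter case."""
    return unicodedata.normalize("NFC", text).strip().casefold()

# Only the variants mentioned in the notes; extend as needed
CITY_ALIASES = {
    _norm("Hồ Chí Minh"): "Ho Chi Minh City",
    _norm("HCM"): "Ho Chi Minh City",
    _norm("hà nội"): "Hanoi",
}

def standardize_city(city):
    """Map known spelling variants to one name; leave unknown cities unchanged."""
    return CITY_ALIASES.get(_norm(city), city.strip())

def parse_experience(text):
    """Convert an experience phrase to years; None if no number is found."""
    cleaned = _norm(text)
    if cleaned == _norm("khĂŽng yĂȘu cáș§u"):  # "no experience required"
        return 0.0
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cleaned)][:2]
    if not numbers:
        return None
    # A range such as "2 - 5" becomes its midpoint; a single number is used as-is
    return sum(numbers) / len(numbers)

def salary_in_vnd(amount, unit):
    """Convert a salary to VND; zero or missing values are treated as invalid."""
    if amount is None or amount <= 0:
        return None
    return amount * USD_TO_VND if unit.strip().upper() == "USD" else amount

print(standardize_city("HCM"), parse_experience("2 - 5 năm"), salary_in_vnd(1000, "USD"))
```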

    Usage

    This dataset can be used for:

    • Market Analysis: Identify high-paying industries (e.g., banking, management) and salary trends by experience or position level.
    • Regional Insights: Analyze job distribution across cities, with a focus on urban centers like Ho Chi Minh City and Hanoi.
    • Career Planning: Understand entry-level opportunities, particularly in customer service and telesales, which offer competitive salaries for candidates with no experience.
    • Policy and Research: Study labor market dynamics, such as the demand for English-speaking talent or regional salary disparities.

    Example Questions

    • Which cities have the highest number of job postings and average salaries?
    • What are the top-paying job fields, and how do they compare to entry-level roles?
    • How does experience impact salaries across different industries?
    • Are there niche roles (e.g., technical or medical) with unusually high salaries?

    Data Cleaning Recommendations

    To prepare the dataset for analysis, consider the following steps:

    1. Standardize City Names: Convert variations like "HCM" or "hà nội" to "Ho Chi Minh City" and "Hanoi".
    2. Convert Salaries: Multiply USD salaries by 25,000 to convert to VND for consistency.
    3. Parse Experience: Convert text-based experience (e.g., "2 - 5 năm") to numerical values (e.g., 3.5 years). Treat "khĂŽng yĂȘu cáș§u" as 0 years.
    4. Handle Missing Salaries: Filter out rows where both salary_min and salary_max are zero or missing, or impute using industry avera...
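
    The cleaning steps above could be sketched roughly as follows, using only the Python standard library. Several things here are assumptions rather than facts about the dataset: the alias table, the function names, the existence of a currency indicator, and the choice to read "over N years" as N. The 25,000 VND per USD rate and the range midpoint (2 - 5 years becomes 3.5) come from the description.

```python
import re
import unicodedata
from typing import Optional

USD_TO_VND = 25_000  # fixed rate stated in the dataset description

# Hypothetical alias table; extend it with whatever spellings the data contains.
CITY_ALIASES = {
    "ho chi minh": "Ho Chi Minh City",
    "hcm": "Ho Chi Minh City",
    "ha noi": "Hanoi",
    "hanoi": "Hanoi",
}


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese tone and vowel marks so spelling variants compare equal."""
    text = text.replace("đ", "d").replace("Đ", "D")  # đ has no Unicode decomposition
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardize_city(name: str) -> str:
    """Map known variants to a canonical name; unknown names pass through trimmed."""
    key = strip_diacritics(name).strip().lower()
    return CITY_ALIASES.get(key, name.strip())


def parse_experience_years(text: str) -> Optional[float]:
    """Convert an experience phrase to years; None if no number is found."""
    normalized = strip_diacritics(text).lower()
    if "khong yeu cau" in normalized:  # "no experience required"
        return 0.0
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", normalized)]
    if not numbers:
        return None
    if len(numbers) >= 2:
        return (numbers[0] + numbers[1]) / 2  # midpoint of a range
    return numbers[0]  # assumption: "over N years" is read as N


def salary_to_vnd(amount: float, currency: str) -> float:
    """Convert USD amounts to VND; the currency argument is a hypothetical field."""
    return amount * USD_TO_VND if currency.upper() == "USD" else amount


def has_salary(salary_min: Optional[float], salary_max: Optional[float]) -> bool:
    """False only when both bounds are zero or missing (None)."""
    return bool(salary_min) or bool(salary_max)
```

    Rows for which has_salary returns False can then be dropped or imputed, depending on the analysis.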
  18. Censuses in Wuerttemberg, 1834 to 1925

    • da-ra.de
    Updated Feb 21, 2023
    Cite
    Wolfgang Zimmermann; Gabriele Däumling; Julia Grosse; Melanie Prangen; Andrea Jautz; Regina Koch-Richter; Claudia Hierath; Bodo Heizmann; Florian Lenz (2023). Censuses in Wuerttemberg, 1834 to 1925 [Dataset]. http://doi.org/10.4232/1.14072
    Explore at:
    Dataset updated
    Feb 21, 2023
    Dataset provided by
    GESIS
    da|ra
    Authors
    Wolfgang Zimmermann; Gabriele Däumling; Julia Grosse; Melanie Prangen; Andrea Jautz; Regina Koch-Richter; Claudia Hierath; Bodo Heizmann; Florian Lenz
    Time period covered
    1834 - 1925
    Area covered
    Baden-Württemberg
    Description

    By joining the German Zollverein (Customs Union) in 1834, the Kingdom of Württemberg committed itself to conducting a census in a fixed three-year rhythm according to uniform criteria and with a recording scheme that was as precise as possible. The data obtained in the process formed the basis for the distribution of the common revenues of the German Customs Union. The Kingdom of Württemberg conducted the first census as part of the Zollverein on 15 December 1834. The basis of the censuses was the "resident" population, which according to the contemporary definition included all people who were present in the place on the reference date. Residents who were currently absent due to a journey were also taken into account. Men and women who were in transit in the census municipality were not included. Until 1858, the "local" population, i.e. the population living permanently in the village, was also counted.

    The data material of the Zollverein and Reich statistics was collected on the basis of the Oberamtslisten, which have survived in handwritten form (Landesarchiv Baden-Württemberg, Staatsarchiv Ludwigsburg, Bestand E 258 VIII). The data is available at the municipal, Oberamt and district level. The figures reflect the territorial status valid at the time of the census as well as the contemporary administrative division. Four Excel tables are available for each census, in which the data for the municipalities and head offices of a district are summarised. A crossed-out place name indicates that the municipality in question belonged to another Oberamt at the time of the census. Municipalities that were newly assigned to an Oberamt between 1834 and 1925 are usually added at the end of the Oberamt list. Information on the change of office affiliation can be found in the comment field. An asterisk after a place name (name of the city or village) indicates such supplementary information. The comment field opens as soon as the cursor is placed on the field of the place (city or village) concerned.

    The primary researchers supplemented the data material with historical maps. The maps of the four Württemberg districts are taken from the publication "Das Königreich Württemberg" (The Kingdom of Württemberg), which was published by the State Statistical Office in four volumes between 1904 and 1907.

    Explanation of symbols

    • 0 = Less than half of 1 in the last filled position, but more than nothing
    • - = Nothing present (exactly zero)
    • . = Numerical value unknown or to be kept secret
    • x = Table compartment locked because statement does not make sense
    • ... = Statement to be made later
    • / = No statement, as the numerical value is not certain enough
    • () = Statement value limited, as the numerical value may contain errors

    Discrepancies in the totals can be explained by rounding the numbers. Place names that have been crossed out indicate that the municipality in question belonged to a different Oberamt at the time of the census. An asterisk (*) after a place name indicates information about the records in the comment field.

    Publication: CD-ROM: »Königreich Württemberg« Volkszählungen 1834 bis 1925. Statistisches Landesamt Baden-Württemberg. Available to order at: https://www.statistik-bw.de/Service/Veroeff/Statistische_Daten/900208001.bs E-Mail: vertrieb@stala.bwl.de

  19. Data from: (Table 2) Contents of rock-forming components in samples from...

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 7, 2018
    Cite
    Lein, Alla Yu; Bogdanova, Olga Yu; Bogdanov, Yury A; Magazina, Larissa O (2018). (Table 2) Contents of rock-forming components in samples from carbonate mounds of the Lost City hydrothermal field [Dataset]. http://doi.org/10.1594/PANGAEA.765160
    Explore at:
    Dataset updated
    Jan 7, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Lein, Alla Yu; Bogdanova, Olga Yu; Bogdanov, Yury A; Magazina, Larissa O
    Area covered
    Description

    No description is available. Visit https://dataone.org/datasets/a0d9075d9dabafe15910b1e96b2e9a52 for complete metadata about this dataset.

  20. Data from: (Table 5) Chemical composition of aragonite from the Lost City...

    • doi.pangaea.de
    • search.dataone.org
    html, tsv
    Updated 2007
    Cite
    Alla Yu Lein; Olga Yu Bogdanova; Yury A Bogdanov; Larissa O Magazina (2007). (Table 5) Chemical composition of aragonite from the Lost City hydrothermal field according to electron microprobe analysis [Dataset]. http://doi.org/10.1594/PANGAEA.765163
    Explore at:
    Available download formats: tsv, html
    Dataset updated
    2007
    Dataset provided by
    PANGAEA
    Authors
    Alla Yu Lein; Olga Yu Bogdanova; Yury A Bogdanov; Larissa O Magazina
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    Variables measured
    Sodium oxide, Calcium oxide, Carbon dioxide, Strontium oxide, Number of observations
    Description

    This dataset is about: (Table 5) Chemical composition of aragonite from the Lost City hydrothermal field according to electron microprobe analysis. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.765175 for more information.

Cite
ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX

2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables)

2021: ACS 1-Year Estimates Subject Tables

Explore at:
Dataset provided by
United States Census Bureau http://census.gov/
Authors
ACS
License

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Time period covered
2021
Description

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.

The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.

The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:

• - The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
• N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
• (X) The estimate or margin of error is not applicable or not available.
• median- The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
• median+ The median falls in the highest interval of an open-ended distribution (for example "250,000+").
• ** The margin of error could not be computed because there were an insufficient number of sample observations.
• *** The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
• ***** A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
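
The dependency ratio definitions and the margin-of-error interval described above can be illustrated with a short sketch. The function names and the sample figures are hypothetical and are not taken from table S0101; the formulas follow the wording of the description.

```python
def dependency_ratios(under_18: float, age_18_to_64: float, age_65_plus: float) -> dict:
    """Compute the three dependency ratios, each expressed per 100 people aged 18 to 64."""
    if age_18_to_64 <= 0:
        raise ValueError("the 18-to-64 population must be positive")
    return {
        "child": 100 * under_18 / age_18_to_64,
        "old_age": 100 * age_65_plus / age_18_to_64,
        "age": 100 * (under_18 + age_65_plus) / age_18_to_64,
    }


def confidence_bounds(estimate: float, margin_of_error: float) -> tuple:
    """Lower and upper bounds: estimate minus and plus the 90 percent margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Made-up population counts, for illustration only.
ratios = dependency_ratios(under_18=20, age_18_to_64=50, age_65_plus=10)
lower, upper = confidence_bounds(estimate=1000, margin_of_error=150)
```

With these made-up inputs the age dependency ratio equals the sum of the child and old-age ratios, which holds in general because all three share the same denominator.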
