This child item describes Python code used to query census data from the TigerWeb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models. This page includes the following file: census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing, or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaska Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8,000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported into ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those used in the CDBG and HOME allocation formulas, plus the topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into ACCESS, EXCEL, and other spreadsheet, GIS, and database software. The data are at the block group summary level.
The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. The last part of the file name describes the contents. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data are only for use by State CDBG grantees for reporting the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
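The naming convention and the LOGRECNO link described above can be sketched in Python. This is a minimal illustration, not the CPD tooling: the function names, the example file name `CABGPOP1.dbf`, the `.dbf` extension handling, and the sample records are all assumptions made for the example. Only the state prefix, the `BG` marker, the five table suffixes, and the `LOGRECNO` field come from the description.

```python
import re
from typing import NamedTuple

# Table suffixes named in the description (GEO, POP1, POP2, HU, MA05).
KNOWN_TABLES = {"GEO", "POP1", "POP2", "HU", "MA05"}


class CpdFileName(NamedTuple):
    state: str  # two-letter state abbreviation
    table: str  # contents suffix, e.g. "POP1"


def parse_cpd_filename(name: str) -> CpdFileName:
    """Split a file name such as 'CABGPOP1.dbf' into state and table parts.

    Hypothetical helper: assumes an optional extension after a single dot.
    """
    stem = name.rsplit(".", 1)[0].upper()
    match = re.fullmatch(r"([A-Z]{2})BG([A-Z0-9]+)", stem)
    if match is None or match.group(2) not in KNOWN_TABLES:
        raise ValueError(f"not a recognized block group file name: {name!r}")
    return CpdFileName(state=match.group(1), table=match.group(2))


def join_on_logrecno(left: list[dict], right: list[dict]) -> list[dict]:
    """Inner-join two lists of records on their LOGRECNO field."""
    index = {row["LOGRECNO"]: row for row in right}
    return [
        {**row, **index[row["LOGRECNO"]]}
        for row in left
        if row["LOGRECNO"] in index
    ]


if __name__ == "__main__":
    print(parse_cpd_filename("CABGPOP1.dbf"))
    # Made-up records standing in for rows read from the GEO and HU tables.
    geo = [{"LOGRECNO": 1, "TOTAL_POP": 1200}, {"LOGRECNO": 2, "TOTAL_POP": 800}]
    hu = [{"LOGRECNO": 1, "OCCUPIED": 450}]
    print(join_on_logrecno(geo, hu))
```

Reading the dbf files themselves would need a separate library, which is left out here because the description does not name one.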
This file provides population counts, population centroids and geographic codes for tracts and part tracts within places as well as places and county sub-divisions which were part of Standard Metropolitan Statistical Areas that were not tracted in 1960. Some records appear to refer to Enumeration Districts. The geographic codes include census state codes, census county codes, place codes, and census tracted area codes as well as remainders of parts not within the place. There are different types of records for tracted and un-tracted areas. Each record type has a slightly different data layout. There are about 71,920 records in the file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Census ZIP Code Tabulation Areas
This feature layer, utilizing National Geospatial Data Asset (NGDA) data from the U.S. Census Bureau, displays ZIP Code Tabulation Areas in the United States. Per the USCB, “ZIP Code Tabulation Areas (ZCTAs) are approximate area representations of U.S. Postal Service (USPS) ZIP Code service areas that the Census Bureau creates to present statistical data for each decennial census. Data users should not use ZCTAs to identify the official USPS ZIP Code for mail delivery. The USPS makes periodic changes to ZIP Codes to support more efficient mail delivery.”
Tabulation Area: 90069
Data currency: This cached Esri federal service is checked weekly for updates from its enterprise federal source (ZIP Code Tabulation Areas) and will support mapping, analysis, data exports and OGC API – Feature access.
NGDAID: 58 (Series Information for 2020 Census 5-Digit ZIP Code Tabulation Area (ZCTA5) National TIGER/Line Shapefiles, Current)
OGC API Features Link: (Census ZIP Code Tabulation Areas - OGC Features) copy this link to embed it in OGC Compliant viewers
For more information, please visit: ZIP Code Tabulation Areas (ZCTAs)
For feedback please contact: Esri_US_Federal_Data@esri.com
Thumbnail source: Esri Basemaps
NGDA Data Set
This data set is part of the NGDA Governmental Units, and Administrative and Statistical Boundaries Theme Community. Per the Federal Geographic Data Committee (FGDC), this theme is defined as the "boundaries that delineate geographic areas for uses such as governance and the general provision of services (e.g., states, American Indian reservations, counties, cities, towns, etc.), administration and/or for a specific purpose (e.g., congressional districts, school districts, fire districts, Alaska Native Regional Corporations, etc.), and/or provision of statistical data (census tracts, census blocks, metropolitan and micropolitan statistical areas, etc.). Boundaries for these various types of geographic areas are either defined through a documented legal description or through criteria and guidelines. Other boundaries may include international limits, those of federal land ownership, the extent of administrative regions for various federal agencies, as well as the jurisdictional offshore limits of U.S. sovereignty. Boundaries associated solely with natural resources and/or cultural entities are excluded from this theme and are included in the appropriate subject themes."
For other NGDA Content: Esri Federal Datasets
This data comes from the 2010 Census Profile of General Population and Housing Characteristics. Zip codes are limited to those that fall at least partially within LA city boundaries. The dataset will be updated after the next census in 2020. To view all possible columns and access the data directly, visit http://factfinder.census.gov/faces/affhelp/jsf/pages/metadata.xhtml?lang=en&type=table&id=table.en.DEC_10_SF1_SF1DP1#main_content.
[Metadata]
- 2015 Zip Code Tabulation Areas (ZCTA) with population figures from American Community Survey 5-year estimates. Source: U.S. Census Bureau, 2016.
The American Community Survey (ACS) is an ongoing survey that provides data every year ... the 5-year estimates from the ACS are "period" estimates that represent data collected over a period of time, from 2011 to 2015. For more information about the ACS, please visit https://www.census.gov/programs-surveys/acs/.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Table showing all variables, classifications and codes included within the Census 2021 microdata samples. This covers the secure, safeguarded and public samples.
California - Census ZIP Code Tabulation Areas (ZCTA)
This data is a subset of the National ZCTA data from the US Census Bureau. This layer was created by using the Select by Layer tool in ArcGIS Pro. First, the polygon for California was selected from the United States County Borders layer; then the features from the ZCTA layer within the CA polygon were selected to create a new California-only ZCTA layer. The source layer is the Census ZIP Code Tabulation Areas feature layer (NGDAID: 58, Series Information for 2020 Census 5-Digit ZIP Code Tabulation Area (ZCTA5) National TIGER/Line Shapefiles, Current), part of the NGDA Governmental Units, and Administrative and Statistical Boundaries Theme Community.
For feedback please contact: Esri_US_Federal_Data@esri.com
For other NGDA Content: Esri Federal Datasets
This data package offers demographic indicators from the American Community Survey (ACS) 5-year estimates, for use alongside health-related data or on their own. According to the U.S. Census Bureau, the 5-year estimates are the most reliable ACS estimates because they draw on the largest samples and the data collected cover all areas of the country, regardless of population size.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

Ancestry listed in this table refers to the total number of people who responded with a particular ancestry; for example, the estimate given for German represents the number of people who listed German as either their first or second ancestry. This table lists only the largest ancestry groups; see the Detailed Tables for more categories. Race and Hispanic origin groups are not included in this table because data for those groups come from the Race and Hispanic origin questions rather than the ancestry question (see Demographic Table).

Data for year of entry of the native population reflect the year of entry into the U.S. by people who were born in Puerto Rico or U.S. Island Areas or born outside the U.S. to a U.S. citizen parent and who subsequently moved to the U.S.

The category "with a broadband Internet subscription" refers to those who said "Yes" to at least one of the following types of Internet subscriptions: Broadband such as cable, fiber optic, or DSL; a cellular data plan; satellite; a fixed wireless subscription; or other non-dial up subscription types.

An Internet "subscription" refers to a type of service that someone pays for to access the Internet such as a cellular data plan, broadband such as cable, fiber optic or DSL, or other type of service. This will normally refer to a service that someone is billed for directly for Internet alone or sometimes as part of a bundle.

"With a computer" includes those who said "Yes" to at least one of the following types of computers: Desktop or laptop; smartphone; tablet or other portable wireless computer; or some other type of computer.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
- The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X) The estimate or margin of error is not applicable or not available.
median- ...
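The margin-of-error interpretation in the notes above can be worked through in a few lines of Python. This is a minimal sketch with made-up numbers: the function names are illustrative, the 1.645 factor is the z-score that corresponds to a 90 percent confidence level, and the root-sum-of-squares rule for a sum of estimates is the usual approximation, which assumes the estimates are roughly independent.

```python
import math

Z_90 = 1.645  # z-score for a 90 percent confidence level


def confidence_bounds(estimate: float, moe: float) -> tuple[float, float]:
    """Lower and upper 90 percent bounds: estimate minus/plus the margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe: float) -> float:
    """Convert a 90 percent margin of error back to a standard error."""
    return moe / Z_90


def moe_of_sum(moes: list[float]) -> float:
    """Approximate margin of error for a sum of estimates (root sum of squares)."""
    return math.sqrt(sum(m * m for m in moes))


if __name__ == "__main__":
    # Hypothetical estimate of 1,000 people with a margin of error of 150.
    low, high = confidence_bounds(1000, 150)
    print(f"90 percent interval: {low} to {high}")
    print(f"standard error: {standard_error(150):.1f}")
    print(f"MOE of a two-part sum: {moe_of_sum([150, 200]):.1f}")
```

These bounds reflect sampling variability only; as the notes say, nonsampling error is not captured by the margin of error.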
https://www.icpsr.umich.edu/web/ICPSR/studies/38528/terms
These datasets contain measures of socioeconomic and demographic characteristics by U.S. census tract for the years 1990-2022 and ZIP code tabulation area (ZCTA) for the years 2008-2022. Example measures include population density; population distribution by race, ethnicity, age, and income; income inequality by race and ethnicity; and proportion of population living below the poverty level, receiving public assistance, and female-headed or single parent families with kids. The datasets also contain a set of theoretically derived measures capturing neighborhood socioeconomic disadvantage and affluence, as well as a neighborhood index of Hispanic, foreign born, and limited English.
https://www.nconemap.gov/pages/terms
The 2020 TIGER/Line Shapefiles contain current geographic extent and boundaries of both legal and statistical entities (which have no governmental standing) for the United States, the District of Columbia, Puerto Rico, and the Island areas. This vintage includes boundaries of governmental units that match the data from the surveys that use 2020 geography (e.g., 2020 Population Estimates and the 2020 American Community Survey). In addition to geographic boundaries, the 2020 TIGER/Line Shapefiles also include geographic feature shapefiles and relationship files. Feature shapefiles represent the point, line and polygon features in the MTDB (e.g., roads and rivers). Relationship files contain additional attribute information users can join to the shapefiles. Both the feature shapefiles and relationship files reflect updates made in the database through September 2020. To see how the geographic entities relate to one another, please see our geographic hierarchy diagrams here.

Census Urbanized Areas: https://www2.census.gov/geo/tiger/TIGER2020/UAC
Census Urban/Rural Census Block Shapefiles: https://www.census.gov/cgi-bin/geo/shapefiles/index.php
2020 TIGER/Line and Redistricting shapefiles: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html
Technical documentation: https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2020/TGRSHP2020_TechDoc.pdf
TIGERweb REST Services: https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_restmapservice.html
TIGERweb WMS Services: https://tigerweb.geo.census.gov/tigerwebmain/TIGERweb_wms.html

The legal entities included in these shapefiles are:
- American Indian Off-Reservation Trust Lands
- American Indian Reservations – Federal
- American Indian Reservations – State
- American Indian Tribal Subdivisions (within legal American Indian areas)
- Alaska Native Regional Corporations
- Congressional Districts – 116th Congress
- Consolidated Cities
- Counties and Equivalent Entities (except census areas in Alaska)
- Estates (US Virgin Islands only)
- Hawaiian Home Lands
- Incorporated Places
- Minor Civil Divisions
- School Districts – Elementary
- School Districts – Secondary
- School Districts – Unified
- States and Equivalent Entities
- State Legislative Districts – Upper
- State Legislative Districts – Lower
- Subminor Civil Divisions (Subbarrios in Puerto Rico)

The statistical entities included in these shapefiles are:
- Alaska Native Village Statistical Areas
- American Indian/Alaska Native Statistical Areas
- American Indian Tribal Subdivisions (within Oklahoma Tribal Statistical Areas)
- Block Groups3-5
- Census Areas
- Census Blocks
- Census County Divisions (Census Subareas in Alaska)
- Unorganized Territories (statistical county subdivisions)
- Census Designated Places (CDPs)
- Census Tracts
- Combined New England City and Town Areas
- Combined Statistical Areas
- Metropolitan and Micropolitan Statistical Areas and related statistical areas
- Metropolitan Divisions
- New England City and Town Areas
- New England City and Town Area Divisions
- Oklahoma Tribal Statistical Areas
- Public Use Microdata Areas (PUMAs)
- State Designated Tribal Statistical Areas
- Tribal Designated Statistical Areas
- Urban Areas
- ZIP Code Tabulation Areas (ZCTAs)

Shapefiles - Features:
- Address Range-Feature
- All Lines (called Edges)
- All Roads
- Area Hydrography
- Area Landmark
- Coastline
- Linear Hydrography
- Military Installation
- Point Landmark
- Primary Roads
- Primary and Secondary Roads
- Topological Faces (polygons with all geocodes)

Relationship Files:
- Address Range-Feature Name
- Address Ranges
- Feature Names
- Topological Faces – Area Landmark
- Topological Faces – Area Hydrography
- Topological Faces – Military Installations
The units of geography used for the 2010 Census maps displayed here are ZIP Code Tabulation Areas (ZCTAs). ZCTAs are statistical geographic areas produced by the Census Bureau by aggregating census blocks to create generalized areas closely resembling the U.S. Postal Service's postal ZIP Codes. The data collected on the short form survey are general demographic characteristics such as age, race, ethnicity, household relationship, housing vacancy and tenure (owner/renter).
This is a MD iMAP hosted service. Find more information at https://imap.maryland.gov.
Feature Service Link: https://mdgeodata.md.gov/imap/rest/services/Demographics/MD_CensusData/FeatureServer/1
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of ZIP Codes, and often, though not always, are the same as the ZIP Code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
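GEOIDs for nested census geographies are built by concatenating fixed-width codes, which is what makes them convenient join keys. The sketch below, with illustrative function names and an illustrative identifier, splits a 15-digit census block GEOID into state (2 digits), county (3), tract (6), and block (4, whose first digit is the block group). GEOIDs are handled as strings so that leading zeros survive. Other geography types, such as ZCTAs, use different layouts that are not covered here.

```python
def split_block_geoid(geoid: str) -> dict[str, str]:
    """Split a 15-digit census block GEOID into its component codes."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block_group": geoid[11],  # first digit of the block code
        "block": geoid[11:15],
    }


def parent_geoids(geoid: str) -> dict[str, str]:
    """GEOIDs of the enclosing county, tract, and block group (prefixes)."""
    split_block_geoid(geoid)  # validates the input
    return {
        "county": geoid[:5],
        "tract": geoid[:11],
        "block_group": geoid[:12],
    }


if __name__ == "__main__":
    example = "060371234561001"  # illustrative identifier, not a looked-up block
    print(split_block_geoid(example))
    print(parent_geoids(example))
```

Because each parent GEOID is a prefix of its children, block-level records can be rolled up to tracts or counties by grouping on the truncated string.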
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
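Both questions reduce to a sort over per-ZIP totals. The following minimal sketch uses plain Python dictionaries with made-up ZIP codes and counts rather than the hosted tables, whose table and column names are not given here. "Greatest change" is interpreted as largest absolute change, which is an assumption, and only ZIP codes present in both censuses are compared.

```python
def top_n(population: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """The n most populous ZIP codes, largest first (ties broken by ZIP code)."""
    return sorted(population.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def largest_changes(
    pop_2000: dict[str, int], pop_2010: dict[str, int], n: int = 10
) -> list[tuple[str, int]]:
    """The n ZIP codes with the largest absolute change from 2000 to 2010."""
    common = pop_2000.keys() & pop_2010.keys()
    changes = {z: pop_2010[z] - pop_2000[z] for z in common}
    return sorted(changes.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:n]


if __name__ == "__main__":
    # Made-up totals; real totals would come from summing the census rows
    # for each ZIP code across gender and age groups.
    pop_2000 = {"00001": 100, "00002": 250, "00003": 80}
    pop_2010 = {"00001": 160, "00002": 240, "00003": 300, "00004": 50}
    print(top_n(pop_2010, n=2))
    print(largest_changes(pop_2000, pop_2010, n=2))
```

Ranking by percentage change instead would only require swapping the value computed in `changes`, with a guard for ZIP codes whose 2000 population is zero.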
The US Census Bureau conducts the American Community Survey (ACS) and publishes 1-year and 5-year estimates covering various demographics, with public access through APIs. I called the APIs from a Python environment using the requests library, then cleaned and organized the data into a usable format.
ACS Subject data [2011-2019] was accessed using Python through the API link below:
https://api.census.gov/data/2011/acs/acs1?get=group(B08301)&for=county:*
The data was obtained in JSON format by calling the above API, then imported as a Python Pandas DataFrame. The 84 variables returned comprise 21 estimates for various metrics, their 21 margins of error, and the corresponding annotation values for each estimate and margin of error. The data then went through several cleaning steps in Python: excess variables were removed and the columns were renamed. Web scraping was used to extract the variable names, which replaced the variable codes in the raw column names.
The above steps were carried out for each ACS 1-year dataset from 2011 to 2019, and the results were merged into a single Pandas DataFrame. The columns were rearranged, and the "NAME" column was split into two columns, 'StateName' and 'CountyName'. Counties for which no data was available were removed from the DataFrame. Once the DataFrame was ready, it was split into two new DataFrames, one for state data and one for county data, and exported to '.csv' format.
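The workflow described above can be sketched with the standard library alone. This is not the author's code: the function names are hypothetical, the sample payload is made up, and dropping the annotation columns (suffixes `EA` and `MA`) is only one guess at what "excess variables" means. The sketch relies on two facts about the Census API: the response is a JSON array of rows whose first row is the header, and county names arrive as "County, State" in the `NAME` field.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/{year}/acs/acs1"


def build_acs_url(year: int, group: str) -> str:
    """URL for every county in one ACS 1-year table group."""
    query = urlencode({"get": f"group({group})", "for": "county:*"}, safe="():*")
    return f"{BASE_URL.format(year=year)}?{query}"


def rows_to_records(payload: list[list]) -> list[dict]:
    """Turn the header-plus-rows JSON payload into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def clean_record(record: dict, group: str) -> dict:
    """Drop annotation columns and split NAME into CountyName and StateName."""
    kept = {
        key: value
        for key, value in record.items()
        if not (key.startswith(group + "_") and key.endswith(("EA", "MA")))
    }
    county, _, state = kept.pop("NAME").partition(", ")
    return {**kept, "CountyName": county, "StateName": state}


if __name__ == "__main__":
    for year in range(2011, 2020):
        print(build_acs_url(year, "B08301"))
    # In a live run the payload would come from something like
    # requests.get(url).json(); this made-up sample avoids a network call.
    sample = [
        ["NAME", "B08301_001E", "B08301_001EA", "B08301_001M", "B08301_001MA", "state", "county"],
        ["Example County, Example State", "1000", None, "50", None, "01", "001"],
    ]
    print([clean_record(r, "B08301") for r in rows_to_records(sample)])
```

The cleaned records can be passed to `pandas.DataFrame(records)` to continue with the merging and export steps.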
More information about the source of Data can be found at the URL below:
US Census Bureau. (n.d.). About: Census Bureau API. Retrieved from Census.gov
https://www.census.gov/data/developers/about.html
I hope this data helps you create something beautiful and awesome. I will be posting a lot more datasets shortly, if I get more time from assignments, submissions, and semester projects 🧙🏼. Good luck.
U.S. Government Works https://www.usa.gov/government-works
This dataset contains model-based census tract level estimates for the PLACES 2021 release in GIS-friendly format. PLACES is the expansion of the original 500 Cities project and covers the entire United States—50 states and the District of Columbia (DC)—at county, place, census tract, and ZIP Code Tabulation Area (ZCTA) levels. It represents a first-of-its kind effort to release information uniformly on this large scale for local areas at 4 geographic levels. Estimates were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. PLACES was funded by the Robert Wood Johnson Foundation (RWJF) in conjunction with the CDC Foundation. Data sources used to generate these model-based estimates include Behavioral Risk Factor Surveillance System (BRFSS) 2019 or 2018 data, Census Bureau 2010 population estimates, and American Community Survey (ACS) 2015–2019 or 2014–2018 estimates. The 2021 release uses 2019 BRFSS data for 22 measures and 2018 BRFSS data for 7 measures (all teeth lost, dental visits, mammograms, cervical cancer screening, colorectal cancer screening, core preventive services among older adults, and sleeping less than 7 hours a night). Seven measures are based on the 2018 BRFSS data because the relevant questions are only asked every other year in the BRFSS. These data can be joined with the census tract 2015 boundary file in a GIS system to produce maps for 29 measures at the census tract level. An ArcGIS Online feature service is also available for users to make maps online or to add data to desktop GIS software. https://cdcarcgis.maps.arcgis.com/home/item.html?id=024cf3f6f59e49fe8c70e0e5410fe3cf
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File (NMF) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Abowd, J. et al [2022] https://doi.org/10.1162/99608f92.529e3cb9, and implemented in the DAS 2020 Redistricting Production Code). The NMF was generated using the Census Bureau's implementation of the Discrete Gaussian Mechanism, calibrated to satisfy zero-Concentrated Differential Privacy with bounded neighbors.
The NMF values, called noisy measurements, are the output of applying the Discrete Gaussian Mechanism to counts from the 2020 Census Edited File (CEF). They are generally inconsistent with one another (for example, in a county composed of two tracts, the noisy measurement for the county's total population may not equal the sum of the noisy measurements of the two tracts' total populations) and frequently negative (especially when the population being measured was small), but they are integer-valued. The NMF was later post-processed as part of the DAS code to take the form of microdata and to satisfy various constraints. The NMF documented here contains both the noisy measurements themselves and the data needed to represent the DAS constraints; thus, the NMF could be used to reproduce the steps taken by the DAS code to produce microdata from the noisy measurements by applying the production code base.
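A toy simulation shows why such measurements are integer-valued, can be negative, and need not add up. This is not the Census Bureau's sampler: it draws from a discrete Gaussian by normalizing weights exp(-k²/(2σ²)) over a truncated range of integers, a simplification chosen for readability. The counts, the value of sigma, and the function names are all made up.

```python
import math
import random


def sample_discrete_gaussian(sigma: float, rng: random.Random, tail: int = 10) -> int:
    """Draw one integer with probability proportional to exp(-k^2 / (2 sigma^2)).

    Approximation: the support is truncated to +/- tail * sigma.
    """
    bound = math.ceil(tail * sigma)
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in support]
    return rng.choices(support, weights=weights, k=1)[0]


def noisy_measurement(true_count: int, sigma: float, rng: random.Random) -> int:
    """True count plus independent discrete Gaussian noise."""
    return true_count + sample_discrete_gaussian(sigma, rng)


if __name__ == "__main__":
    rng = random.Random(2020)
    tract_a, tract_b = 3, 5       # made-up true counts
    county = tract_a + tract_b    # the true counts are consistent
    sigma = 3.0                   # made-up noise scale
    noisy_a = noisy_measurement(tract_a, sigma, rng)
    noisy_b = noisy_measurement(tract_b, sigma, rng)
    noisy_county = noisy_measurement(county, sigma, rng)
    print("noisy tracts:", noisy_a, noisy_b, "sum:", noisy_a + noisy_b)
    print("noisy county:", noisy_county)
```

Because each query receives its own independent noise draw, the noisy county value generally differs from the sum of the noisy tract values, and small true counts can be pushed below zero.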
The 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File includes zero-Concentrated Differentially Private (zCDP) (Bun, M. and Steinke, T [2016]) noisy measurements, implemented via the discrete Gaussian mechanism. These are estimated counts of individuals and housing units included in the 2020 Census Edited File (CEF), which includes confidential data initially collected in the 2020 Census of Population and Housing. The noisy measurements included in this file were subsequently post-processed by the TopDown Algorithm (TDA) to produce the 2020 Census Redistricting Data (P.L. 94-171) Summary File.
The NMF provides estimates of counts of persons in the CEF, after the addition of discrete Gaussian noise, by various characteristics and combinations of characteristics, including their reported race and ethnicity, whether they were of voting age, whether they resided in a housing unit or one of 7 group quarters types, and their census block of residence. The scale parameter of the noise is determined by the privacy-loss budget allocation for that particular query under zCDP. Noisy measurements of the counts of occupied and vacant housing units by census block are also included. Lastly, data on constraints are provided: information into which no noise was infused by the Disclosure Avoidance System (DAS) and which the TDA used to post-process the noisy measurements into the 2020 Census Redistricting Data (P.L. 94-171) Summary File.
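The link between a query's privacy-loss budget and its noise scale can be sketched as follows. The sketch relies on the standard result that adding discrete Gaussian noise with variance sigma^2 to a query with L2 sensitivity Delta satisfies rho-zCDP with rho = Delta^2 / (2 * sigma^2). The budget values and the default sensitivity are hypothetical; the actual per-query allocations are not given in this description.

```python
import math
from fractions import Fraction


def sigma_squared(rho, sensitivity=1):
    """Noise variance implied by a zCDP budget rho: sigma^2 = Delta^2 / (2 * rho).

    `sensitivity` (Delta) is an assumption of this sketch and must be chosen
    to match the query and the neighboring-dataset definition in use.
    """
    rho = Fraction(rho)
    if rho <= 0:
        raise ValueError("rho must be positive")
    return Fraction(sensitivity) ** 2 / (2 * rho)


# Hypothetical budget allocations: a smaller rho means more noise.
for rho in (Fraction(1, 2), Fraction(1, 8), Fraction(1, 200)):
    variance = sigma_squared(rho)
    print(f"rho={rho}: variance={variance}, sigma={math.sqrt(variance):.2f}")
```

Exact fractions are used so that the variance is not affected by floating-point rounding of the budget.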
https://www.icpsr.umich.edu/web/ICPSR/studies/38598/terms
This collection contains measures of land cover (e.g., low-, medium-, or high-density development, forest, wetland, open water) derived from the National Land Cover Database (NLCD) and aggregated by United States census tract and ZIP code tabulation area (ZCTA). For each land type, land cover is measured both in total square meters and as a proportion of all land of that type within the tract or the ZCTA.
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is presented only as summary tables and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole. The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age, and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of ZIP codes and often, though not always, match the ZIP code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
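GEOIDs for the nested state, county, tract, block group, and block hierarchy are built by concatenating fixed-width codes (2, 3, 6, and then 1 or 4 digits), which is what makes them convenient join keys. The following Python sketch splits such a GEOID into its parts; the function name and the example value are illustrative, and other GEOID types (for example, 5-digit ZCTA codes) are not handled.

```python
def parse_geoid(geoid):
    """Split a state/county/tract/block-group/block GEOID into its components.

    GEOIDs are kept as strings so that leading zeros are preserved.
    """
    if not (geoid.isascii() and geoid.isdigit()):
        raise ValueError("GEOID must contain only digits")
    if len(geoid) not in (2, 5, 11, 12, 15):
        raise ValueError(f"unsupported GEOID length: {len(geoid)}")

    parts = {"state": geoid[:2]}
    if len(geoid) >= 5:
        parts["county"] = geoid[2:5]
    if len(geoid) >= 11:
        parts["tract"] = geoid[5:11]
    if len(geoid) >= 12:
        # The block group is the first digit of the 4-digit block code.
        parts["block_group"] = geoid[11]
    if len(geoid) == 15:
        parts["block"] = geoid[11:15]
    return parts


# Illustrative 12-digit block group GEOID.
print(parse_geoid("060750201001"))
```

Storing GEOIDs as integers would drop leading zeros (state code `06` would become `6`), so string columns are the safer choice when joining tables.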