https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
https://cloud.google.com/bigquery/images/census-population-map.png" alt="https://cloud.google.com/bigquery/images/census-population-map.png">
https://cloud.google.com/bigquery/images/census-population-map.png
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 50 states in the United States by Non-Hispanic Asian population, as estimated by the United States Census Bureau. It also highlights population changes in each states over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Announcement Beginning October 20, 2022, CDC will report and publish aggregate case and death data from jurisdictional and state partners on a weekly basis rather than daily. As a result, community transmission levels data reported on data.cdc.gov will be updated weekly on Thursdays, typically by 8 PM ET, instead of daily. This public use dataset has 7 data elements reflecting historical data for community transmission levels for all available counties. This dataset contains historical data for the county level of community transmission and includes updated data submitted by states and jurisdictions. Each day, the dataset is appended to contain the most recent day's data. This dataset includes data from January 1, 2021. Transmission level is set to low, moderate, substantial, or high using the calculation rules below. Currently, CDC provides the public with two versions of COVID-19 county-level community transmission level data: this dataset with the levels for each county from January 1, 2021 (Historical Changes dataset) and a dataset with the levels as originally posted (Originally Posted dataset), updated daily with the most recent day’s data. Methods for calculating county level of community transmission indicator The County Level of Community Transmission indicator uses two metrics: (1) total new COVID-19 cases per 100,000 persons in the last 7 days and (2) percentage of positive SARS-CoV-2 diagnostic nucleic acid amplification tests (NAAT) in the last 7 days. For each of these metrics, CDC classifies transmission values as low, moderate, substantial, or high (below and here). If the values for each of these two metrics differ (e.g., one indicates moderate and the other low), then the higher of the two should be used for decision-making. CDC core metrics of and thresholds for community transmission levels of SARS-CoV-2 Total New Case Rate Metric: "New cases per 100,000 persons in the past 7 days" is calculated by adding the number of new cases in the county (or other administrative level) in the last 7 days divided by the population in the county (or other administrative level) and multiplying by 100,000. "New cases per 100,000 persons in the past 7 days" is considered to have transmission level of Low (0-9.99); Moderate (10.00-49.99); Substantial (50.00-99.99); and High (greater than or equal to 100.00). Test Percent Positivity Metric: "Percentage of positive NAAT in the past 7 days" is calculated by dividing the number of positive tests in the county (or other administrative level) during the last 7 days by the total number of tests resulted over the last 7 days. "Percentage of positive NAAT in the past 7 days" is considered to have transmission level of Low (less than 5.00); Moderate (5.00-7.99); Substantial (8.00-9.99); and High (greater than or equal to 10.00). If the two metrics suggest different transmission levels, the higher level is selected. If one metric is missing, the other metric is used for the indicator. Transmission categories include: Low Transmission Threshold: Counties with fewer than 10 total cases per 100,000 population in the past 7 days, and a NAAT percent test positivity in the past 7 days below 5%; Moderate Transmission Threshold: Counties with 10-49 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 5.0-7.99%; Substantial Transmission Threshold: Counties with 50-99 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 8.0-9.99%; High Transmission Threshold: Counties with 100
In the United States, city governments provide many services: they run public school districts, administer certain welfare and health programs, build roads and manage airports, provide police and fire protection, inspect buildings, and often run water and utility systems. Cities also get revenues through certain local taxes, various fees and permit costs, sale of property, and through the fees they charge for the utilities they run.
It would be interesting to compare all these expenses and revenues across cities and over time, but also quite difficult. Cities share many of these service responsibilities with other government agencies: in one particular city, some roads may be maintained by the state government, some law enforcement provided by the county sheriff, some schools run by independent school districts with their own tax revenue, and some utilities run by special independent utility districts. These governmental structures vary greatly by state and by individual city. It would be hard to make a fair comparison without taking into account all these differences.
This dataset takes into account all those differences. The Lincoln Institute of Land Policy produces what they call “Fiscally Standardized Cities” (FiSCs), aggregating all services provided to city residents regardless of how they may be divided up by different government agencies and jurisdictions. Using this, we can study city expenses and revenues, and how the proportions of different costs vary over time.
The dataset tracks over 200 American cities between 1977 and 2020. Each row represents one city for one year. Revenue and expenditures are broken down into more than 120 categories.
Values are available for FiSCs and also for the entities that make it up: the city, the county, independent school districts, and any special districts, such as utility districts. There are hence five versions of each variable, with suffixes indicating the entity. For example, taxes gives the FiSC’s tax revenue, while taxes_city, taxes_cnty, taxes_schl, and taxes_spec break it down for the city, county, school districts, and special districts.
The values are organized hierarchically. For example, taxes is the sum of tax_property (property taxes), tax_sales_general (sales taxes), tax_income (income tax), and tax_other (other taxes). And tax_income is itself the sum of tax_income_indiv (individual income tax) and tax_income_corp (corporate income tax) subcategories.
The revenue and expenses variables are described in this detailed table. Further documentation is available on the FiSC Database website, linked in References below.
All monetary data is already adjusted for inflation, and is given in terms of 2020 US dollars per capita. The Consumer Price Index is provided for each year if you prefer to use numbers not adjusted for inflation, scaled so that 2020 is 1; simply divide each value by the CPI to get the value in that year’s nominal dollars. The total population is also provided if you want total values instead of per-capita values.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 50 states in the United States by Hispanic Some Other Race (SOR) population, as estimated by the United States Census Bureau. It also highlights population changes in each states over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
By Health [source]
The Behavioral Risk Factor Surveillance System (BRFSS) offers an expansive collection of data on the health-related quality of life (HRQOL) from 1993 to 2010. Over this time period, the Health-Related Quality of Life dataset consists of a comprehensive survey reflecting the health and well-being of non-institutionalized US adults aged 18 years or older. The data collected can help track and identify unmet population health needs, recognize trends, identify disparities in healthcare, determine determinants of public health, inform decision making and policy development, as well as evaluate programs within public healthcare services.
The HRQOL surveillance system has developed a compact set of HRQOL measures such as a summary measure indicating unhealthy days which have been validated for population health surveillance purposes and have been widely implemented in practice since 1993. Within this study's dataset you will be able to access information such as year recorded, location abbreviations & descriptions, category & topic overviews, questions asked in surveys and much more detailed information including types & units regarding data values retrieved from respondents along with their sample sizes & geographical locations involved!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset tracks the Health-Related Quality of Life (HRQOL) from 1993 to 2010 using data from the Behavioral Risk Factor Surveillance System (BRFSS). This dataset includes information on the year, location abbreviation, location description, type and unit of data value, sample size, category and topic of survey questions.
Using this dataset on BRFSS: HRQOL data between 1993-2010 will allow for a variety of analyses related to population health needs. The compact set of HRQOL measures can be used to identify trends in population health needs as well as determine disparities among various locations. Additionally, responses to survey questions can be used to inform decision making and program and policy development in public health initiatives.
- Analyzing trends in HRQOL over the years by location to identify disparities in health outcomes between different populations and develop targeted policy interventions.
- Developing new models for predicting HRQOL indicators at a regional level, and using this information to inform medical practice and public health implementation efforts.
- Using the data to understand differences between states in terms of their HRQOL scores and establish best practices for healthcare provision based on that understanding, including areas such as access to care, preventative care services availability, etc
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: rows.csv | Column name | Description | |:-------------------------------|:----------------------------------------------------------| | Year | Year of survey. (Integer) | | LocationAbbr | Abbreviation of location. (String) | | LocationDesc | Description of location. (String) | | Category | Category of survey. (String) | | Topic | Topic of survey. (String) | | Question | Question asked in survey. (String) | | DataSource | Source of data. (String) | | Data_Value_Unit | Unit of data value. (String) | | Data_Value_Type | Type of data value. (String) | | Data_Value_Footnote_Symbol | Footnote symbol for data value. (String) | | Data_Value_Std_Err | Standard error of the data value. (Float) | | Sample_Size | Sample size used in sample. (Integer) | | Break_Out | Break out categories used. (String) | | Break_Out_Category | Type break out assessed. (String) | | **GeoLocation*...
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov. The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset is resultant from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015. Each row in the LMI dataset is an individual record for a household that responded to the survey and each column is a variable of interest for analyzing the low- to moderate-income population. The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight. The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that supports the LMI Analysis Tool. The majority of those interested in LMI statistics and generating custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
In 2023, Washington, D.C. had the highest population density in the United States, with 11,130.69 people per square mile. As a whole, there were about 94.83 residents per square mile in the U.S., and Alaska was the state with the lowest population density, with 1.29 residents per square mile. The problem of population density Simply put, population density is the population of a country divided by the area of the country. While this can be an interesting measure of how many people live in a country and how large the country is, it does not account for the degree of urbanization, or the share of people who live in urban centers. For example, Russia is the largest country in the world and has a comparatively low population, so its population density is very low. However, much of the country is uninhabited, so cities in Russia are much more densely populated than the rest of the country. Urbanization in the United States While the United States is not very densely populated compared to other countries, its population density has increased significantly over the past few decades. The degree of urbanization has also increased, and well over half of the population lives in urban centers.
Note: This COVID-19 data set is no longer being updated as of December 1, 2023. Access current COVID-19 data on the CDPH respiratory virus dashboard (https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx) or in open data format (https://data.chhs.ca.gov/dataset/respiratory-virus-dashboard-metrics). As of August 17, 2023, data is being updated each Friday. For death data after December 31, 2022, California uses Provisional Deaths from the Center for Disease Control and Prevention’s National Center for Health Statistics (NCHS) National Vital Statistics System (NVSS). Prior to January 1, 2023, death data was sourced from the COVID-19 registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for the year of 2023. As of May 11, 2023, data on cases, deaths, and testing is being updated each Thursday. Metrics by report date have been removed, but previous versions of files with report date metrics are archived below. All metrics include people in state and federal prisons, US Immigration and Customs Enforcement facilities, US Marshal detention facilities, and Department of State Hospitals facilities. Members of California's tribal communities are also included. The "Total Tests" and "Positive Tests" columns show totals based on the collection date. There is a lag between when a specimen is collected and when it is reported in this dataset. As a result, the most recent dates on the table will temporarily show NONE in the "Total Tests" and "Positive Tests" columns. This should not be interpreted as no tests being conducted on these dates. Instead, these values will be updated with the number of tests conducted as data is received.
Insurance rates for vehicles is a major market that is subject to a lot of variance. This simple and small dataset contains the insurance rate for all US states.
- Explore insurance rates per state, find optimal prices
- More datasets
Sources
License
License was not specified at the source
Splash banner
Photo by Sarah Brown on Unsplash.
Splash Icon
Icons made by Kiranshastry from www.flaticon.com.
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.
The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.
Using these data, the COVID-19 community level was classified as low, medium, or high.
COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.
For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
Archived Data Notes:
This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.
March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previous released.
March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.
March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.
March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.
March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).
March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.
April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.
April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.
May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.
June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.
July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.
July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.
July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.
July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.
July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.
August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.
August 4, 2022: COVID-19 Community Level (CCL) data for the week of July 28, 2022 for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) case data was missing due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.
August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.
August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.
August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.
September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.
September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
https://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
This dataset, named "state_trends.csv," contains information about different U.S. states. Let's break down the attributes and understand what each column represents:
In summary, this dataset provides a variety of information about U.S. states, including demographic data, geographical region, psychological region, personality traits, and scores related to interests or proficiencies in various fields such as data science, art, and sports.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
This list ranks the 50 states in the United States by Multi-Racial Black or African American population, as estimated by the United States Census Bureau. It also highlights population changes in each states over the past five years.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates, including:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
Annual Resident Population Estimates, Estimated Components of Resident Population Change, and Rates of the Components of Resident Population Change for States and Counties // Source: U.S. Census Bureau, Population Division // Note: Total population change includes a residual. This residual represents the change in population that cannot be attributed to any specific demographic component. See Population Estimates Terms and Definitions at http://www.census.gov/popest/about/terms.html. // Net international migration in the United States includes the international migration of both native and foreign-born populations. Specifically, it includes: (a) the net international migration of the foreign born, (b) the net migration between the United States and Puerto Rico, (c) the net migration of natives to and from the United States, and (d) the net movement of the Armed Forces population between the United States and overseas. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. See Geographic Terms and Definitions at http://www.census.gov/popest/about/geo/terms.html for a list of the states that are included in each region and division. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureaus Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2014) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
This is the US Coronavirus data repository from The New York Times . This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here . For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site
Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.
SELECT
covid19.county,
covid19.state_name,
total_pop AS county_population,
confirmed_cases,
ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000,
deaths,
ROUND(deaths/total_pop *100000,2) AS deaths_per_100000
FROM
bigquery-public-data.covid19_nyt.us_counties
covid19
JOIN
bigquery-public-data.census_bureau_acs.county_2017_5yr
acs ON covid19.county_fips_code = acs.geo_id
WHERE
date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day)
AND covid19.county_fips_code != "00000"
ORDER BY
confirmed_cases_per_100000 desc
How do I calculate the number of new COVID-19 cases per day?
This query determines the total number of new cases in each state for each day available in the dataset
SELECT
b.state_name,
b.date,
MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases
FROM
(SELECT
state_name AS state,
state_fips_code ,
confirmed_cases,
DATE_ADD(date, INTERVAL 1 day) AS date_shift
FROM
bigquery-public-data.covid19_nyt.us_states
WHERE
confirmed_cases + deaths > 0) a
JOIN
bigquery-public-data.covid19_nyt.us_states
b ON
a.state_fips_code = b.state_fips_code
AND a.date_shift = b.date
GROUP BY
b.state_name, date
ORDER BY
date desc
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VERSION 1.5. The world's most accurate population datasets. Seven maps/datasets for the distribution of various populations in Egypt: (1) Overall population density (2) Women (3) Men (4) Children (ages 0-5) (5) Youth (ages 15-24) (6) Elderly (ages 60+) (7) Women of reproductive age (ages 15-49). Methodology These high-resolution maps are created using machine learning techniques to identify buildings from commercially available satellite images. This is then overlayed with general population estimates based on publicly available census data and other population statistics at Columbia University. The resulting maps are the most detailed and actionable tools available for aid and research organizations. For more information about the methodology used to create our high resolution population density maps and the demographic distributions, click here For information about how to use HDX to access these datasets, please visit: https://dataforgood.fb.com/docs/high-resolution-population-density-maps-demographic-estimates-documentation/ Adjustments to match the census population with the UN estimates are applied at the national level. The UN estimate for a given country (or state/territory) is divided by the total census estimate of population for the given country. The resulting adjustment factor is multiplied by each administrative unit census value for the target year. This preserves the relative population totals across administrative units while matching the UN total. More information can be found here
This Location Data & Foot traffic dataset available for all countries include enriched raw mobility data and visitation at POIs to answer questions such as:
-How often do people visit a location? (daily, monthly, absolute, and averages).
-What type of places do they visit ? (parks, schools, hospitals, etc)
-Which social characteristics do people have in a certain POI? - Breakdown by type: residents, workers, visitors.
-What's their mobility like enduring night hours & day hours?
-What's the frequency of the visits partition by day of the week and hour of the day?
Extra insights -Visitors´ relative income Level. -Visitors´ preferences as derived by their visits to shopping, parks, sports facilities, churches, among others.
Overview & Key Concepts Each record corresponds to a ping from a mobile device, at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. All the process is compliant with applicable privacy laws.
We clean and process these massive datasets with a number of complex, computer-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful to differentiate between vehicles, pedestrians, and stationery observations.
Night base of the device: we calculate the approximated location of where the device spends the night, which is usually their home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually their work location.
Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.
Category of visited POI: for each observation that can be attributable to a POI, we also include a standardized location category (park, hospital, among others). Coverage: Worldwide.
Delivery schemas We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements in the dataset. This helps understand who visited a specific POI, characterize and understand the consumer's behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). All the visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
THIS DATASET WAS LAST UPDATED AT 2:10 AM EASTERN ON JUNE 29
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
India is the most populous country in the world with one-sixth of the world's population. According to official estimates in 2022, India's population stood at over 1.42 billion.
This dataset contains the population distribution by state, gender, sex & region.
The file is in .csv format thus it is accessible everywhere.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of the data is available in summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and it must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
https://cloud.google.com/bigquery/images/census-population-map.png" alt="https://cloud.google.com/bigquery/images/census-population-map.png">
https://cloud.google.com/bigquery/images/census-population-map.png