A. SUMMARY This dataset contains COVID-19 positive confirmed cases aggregated by several different geographic areas and by day. COVID-19 cases are mapped to the residence of the individual and shown on the date the positive test was collected. In addition, 2016-2020 American Community Survey (ACS) population estimates are included to calculate the cumulative rate per 10,000 residents.
Dataset covers cases going back to 3/2/2020, when testing began. Data may not be immediately available for recently reported cases, and data will change as new information becomes available. Data are updated daily.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census ZIP Code Tabulation Areas
B. HOW THE DATASET IS CREATED Addresses from the COVID-19 case data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area for a given date.
The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a cumulative rate, which is equal to ([cumulative count up to that date] / [acs_population]) * 10000, representing the number of total cases per 10,000 residents (as of the specified date).
COVID-19 case data undergo quality assurance and other data verification processes and are continually updated to maximize completeness and accuracy of information. This means data may change for previous days as information is updated.
C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 05:00 Pacific Time.
D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset can be used to track the spread of COVID-19 throughout the city, in a variety of geographic areas. Note that the new cases column in the data represents the number of new cases confirmed in a certain area on the specified day, while the cumulative cases column is the cumulative total of cases in a certain area as of the specified date.
Privacy rules in effect To protect privacy, certain rules are in effect: 1. Any area with a cumulative case count less than 10 is dropped for all days the cumulative count was less than 10. These will be null values. 2. Once an area has a cumulative case count of 10 or greater, that area will have a new row of case data every day following. 3. Cases are dropped altogether for areas where acs_population < 1000. 4. Deaths data are not included in this dataset for privacy reasons. The low COVID-19 death rate in San Francisco, along with other publicly available information on deaths, means that deaths data by geography and day are too granular and potentially risky. Read more in our privacy guidelines.
Rate suppression in effect where counts are lower than 20 Rates are not calculated unless the cumulative case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach, as this is best practice in epidemiology.
A note on Census ZIP Code Tabulation Areas (ZCTAs) ZIP Code Tabulation Areas are spec
This dataset contains information on antibody testing for COVID-19: the number of people who received a test, the number of people with positive results, the percentage of people tested who tested positive, and the rate of testing per 100,000 people, stratified by modified ZIP Code Tabulation Area (ZCTA) of residence. Modified ZCTA reflects the first non-missing address within NYC for each person reported with an antibody test result. This unit of geography is similar to ZIP codes but combines census blocks with smaller populations to allow more stable estimates of population size for rate calculation. It can be challenging to map data that are reported by ZIP Code. A ZIP Code doesn’t refer to an area, but rather a collection of points that make up a mail delivery route. Furthermore, there are some buildings that have their own ZIP Code, and some non-residential areas with ZIP Codes. To deal with the challenges of ZIP Codes, the Health Department uses ZCTAs which solidify ZIP codes into units of area. Often, data reported by ZIP code are actually mapped by ZCTA. The ZCTA geography was developed by the U.S. Census Bureau. These data can also be accessed here: https://github.com/nychealth/coronavirus-data/blob/master/totals/antibody-by-modzcta.csv Exposure to COVID-19 can be detected by measuring antibodies to the disease in a person’s blood, which can indicate that a person may have had an immune response to the virus. Antibodies are proteins produced by the body’s immune system that can be found in the blood. People can test positive for antibodies after they have been exposed, sometimes when they no longer test positive for the virus itself. It is important to note that the science around COVID-19 antibody tests is evolving rapidly and there is still much uncertainty about what individual antibody test results mean for a single person and what population-level antibody test results mean for understanding the epidemiology of COVID-19 at a population level.
These data only provide information on people tested. People receiving an antibody test do not reflect all people in New York City; therefore, these data may not reflect antibody prevalence among all New Yorkers. Increasing instances of screening programs further impact the generalizability of these data, as screening programs influence who and how many people are tested over time. Examples of screening programs in NYC include: employers screening their workers (e.g., hospitals), and long-term care facilities screening their residents.
In addition, there may be potential biases toward people receiving an antibody test who have a positive result, because people who were previously ill are preferentially seeking testing, in addition to the testing of persons with higher exposure (e.g., health care workers, first responders).
Rates were calculated using interpolated intercensal population estimates updated in 2019. These rates differ from previously reported rates based on the 2000 Census or previous versions of population estimates. The Health Department produced these population estimates based on estimates from the U.S. Census Bureau and NYC Department of City Planning.
Antibody tests are categorized based on the date of specimen collection and are aggregated by full weeks starting each Sunday and ending on Saturday. For example, a person whose blood was collected for antibody testing on Wednesday, May 6 would be categorized as tested during the week ending May 9. A person tested twice in one week would only be counted once in that week. This dataset includes testing data beginning April 5, 2020.
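The weekly grouping described above can be sketched as follows. This is a minimal illustration, not the Health Department's code: the function names and the `(person_id, collection_date)` input shape are assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def week_ending(collection_date: date) -> date:
    """Return the Saturday closing the Sunday-to-Saturday week of the date."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_until_saturday = (5 - collection_date.weekday()) % 7
    return collection_date + timedelta(days=days_until_saturday)


def weekly_people_tested(tests):
    """Count distinct people per week; `tests` holds (person_id, date) pairs.

    A person tested more than once in the same week is counted once.
    """
    seen = {(person_id, week_ending(d)) for person_id, d in tests}
    return dict(Counter(week for _, week in seen))


# Wednesday, May 6, 2020 falls in the week ending Saturday, May 9, 2020.
print(week_ending(date(2020, 5, 6)))  # 2020-05-09
```

Using a set of (person, week) pairs before counting is one simple way to enforce the once-per-week rule.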
Data are updated daily, and the dataset preserves historical records and source data changes, so each extract date reflects the current copy of the data as of that date. For example, an extract date of 11/04/2020 and extract date of 11/03/2020 will both contain all records as they were as of that extract date. Without filtering or grouping by extract date, an analysis will almost certainly be miscalculating or counting the same values multiple times. To analyze the most current data, only use the latest extract date. Antibody tests that are missing dates are not included in the dataset; as dates are identified, these events are added. Lags between occurrence and report of cases and tests can be assessed by comparing counts and rates across multiple data extract dates.
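Filtering to the latest extract date before any aggregation might look like the sketch below. The field names `extract_date`, `modzcta`, and `num_tested` are hypothetical and chosen only for illustration; check the dataset's actual column names.

```python
from datetime import date


def latest_extract(rows):
    """Keep only the rows belonging to the most recent extract date."""
    latest = max(row["extract_date"] for row in rows)
    return [row for row in rows if row["extract_date"] == latest]


rows = [
    {"extract_date": date(2020, 11, 3), "modzcta": "10001", "num_tested": 100},
    {"extract_date": date(2020, 11, 4), "modzcta": "10001", "num_tested": 104},
]

# Summing across both extracts would count the same people twice.
current = latest_extract(rows)
print(sum(row["num_tested"] for row in current))  # 104
```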
For further details, visit:
• https://www1.nyc.gov/site/doh/covid/covid-19-data.page
• https://github.com/nychealth/coronavirus-data
• https://data.cityofnewyork.us/Health/Modified-Zip-Code-Tabulation-Areas-MODZCTA-/pri4-ifjk
This dataset has been retired as of February 17, 2023. This dataset will be kept for historical purposes, but will no longer be updated. Similar data are available on the state’s open data portal: https://data.chhs.ca.gov/dataset/covid-19-time-series-metrics-by-county-and-state.
A. DATASET DESCRIPTION This dataset contains COVID-19 positive confirmed cases aggregated by several different geographic areas and by day. COVID-19 cases are mapped to the residence of the individual and shown on the date the positive test was collected. In addition, 2019 American Community Survey (ACS) 5-year population estimates are included to calculate the cumulative rate per 10,000 residents.
Dataset covers cases going back to March 18th, 2020, when the first person in Marin County tested positive for COVID-19. Data may not be immediately available for recently reported cases, and data will change as new information becomes available. Data are updated daily.
COVID-19 case data undergo quality assurance and other data verification processes and are continually updated to maximize completeness and accuracy of information. This means data may change for previous days as information is updated.
Geographic areas summarized are: 1. City, Town, or Community Area 2. Census Tracts 3. Census ZIP Code Tabulation Areas (ZCTAs)
B. HOW THE DATASET IS CREATED Addresses from the COVID-19 case data are geocoded by Marin County HHS. Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area for a given date.
The 2019 ACS estimates for population provided by the Census are used to create a cumulative rate, which is equal to ([cumulative count up to that date] / [acs_population]) * 10000, representing the number of total cases per 10,000 residents (as of the specified date).
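As a worked illustration of the rate formula, here is a small sketch. The function name and the sample figures are made up for the example; only the formula itself comes from the description above.

```python
def cumulative_rate_per_10k(cumulative_cases: int, acs_population: int) -> float:
    """Cumulative cases per 10,000 residents as of a given date."""
    if acs_population <= 0:
        raise ValueError("acs_population must be positive")
    # Algebraically the same as (cases / population) * 10000; multiplying
    # first keeps the arithmetic exact for whole-number inputs like these.
    return cumulative_cases * 10000 / acs_population


# Hypothetical area: 150 cumulative cases among 30,000 residents.
print(cumulative_rate_per_10k(150, 30000))  # 50.0
```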
C. UPDATE PROCESS Geographic analysis is scripted by Marin HHS staff and synced to this dataset each day.
D. HOW TO USE THIS DATASET This dataset can be used to track the spread of COVID-19 throughout Marin County in a variety of geographic areas. Note that the new cases column in the data represents the number of new cases confirmed in a certain area on the specified day, while the cumulative cases column is the cumulative total of cases in a certain area as of the specified date.
Privacy rules in effect To protect privacy, certain rules are in effect: 1. Any area with a cumulative case count less than 10 is dropped for all days the cumulative count was less than 10. These will be null values. For example, if a ZIP code did not have 10 cumulative cases until June 1, 2020, that location will not be included in the dataset until June 1. 2. Once an area has a cumulative case count of 10 or greater, that area will have a new row of case data every day following. 3. Cases are dropped altogether for areas where acs_population < 1000. Some adjacent geographic areas may be combined until the ACS population exceeds 1,000 to still provide information for these regions.
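The two thresholds in these rules can be sketched as a simple row filter. This is an illustration under assumptions: the dictionary keys mirror the column names mentioned in the text but may not match the published schema, the rows are simply dropped rather than nulled, and the combining of adjacent small areas is not modeled.

```python
def apply_privacy_rules(rows, min_cumulative=10, min_population=1000):
    """Drop rows for small-population areas and for days below the case threshold."""
    kept = []
    for row in rows:
        if row["acs_population"] < min_population:
            continue  # rule 3: area too small to report at all
        if row["cumulative_cases"] < min_cumulative:
            continue  # rule 1: not reported until 10 cumulative cases
        kept.append(row)
    return kept


rows = [
    {"area": "A", "acs_population": 5200, "cumulative_cases": 9},
    {"area": "A", "acs_population": 5200, "cumulative_cases": 10},
    {"area": "B", "acs_population": 800, "cumulative_cases": 40},
]
print([r["cumulative_cases"] for r in apply_privacy_rules(rows)])  # [10]
```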
Note: 14-day case rate or 30-day case rate where the counts are lower than 20 may be unstable. We advise caution in interpreting rates at these small numbers.
A note on Census ZIP Code Tabulation Areas (ZCTAs) ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes.
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.
The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
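The naming convention and the LOGRECNO link can be sketched as below. Several details are assumptions: the example file name `CABGPOP1.dbf` is hypothetical (the text does not show an actual name or say whether separators are used), and the records are plain dictionaries with made-up field names standing in for rows read from the dbf tables. Only `LOGRECNO` and the five content labels come from the description.

```python
from pathlib import Path

KNOWN_CONTENTS = {"GEO", "POP1", "POP2", "HU", "MA05"}


def parse_table_name(file_name):
    """Split a name like 'CABGPOP1.dbf' into (state, contents)."""
    stem = Path(file_name).stem.upper()
    state, level, contents = stem[:2], stem[2:4], stem[4:]
    if level != "BG" or contents not in KNOWN_CONTENTS:
        raise ValueError(f"unexpected table name: {file_name}")
    return state, contents


def join_on_logrecno(left, right):
    """Merge records from two tables that share the same LOGRECNO."""
    right_by_key = {rec["LOGRECNO"]: rec for rec in right}
    return [
        {**rec, **right_by_key[rec["LOGRECNO"]]}
        for rec in left
        if rec["LOGRECNO"] in right_by_key
    ]


geo = [{"LOGRECNO": 1, "total_population": 1200}]  # hypothetical fields
hu = [{"LOGRECNO": 1, "occupied_units": 450}]      # hypothetical fields
print(parse_table_name("CABGPOP1.dbf"))  # ('CA', 'POP1')
print(join_on_logrecno(geo, hu))
```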
This dataset provides modeled predictions of PM2.5 levels from the EPA's Downscaler model. Data are at the census tract level for 2006-2010. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information. Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action. By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose of or make use of the identity of any person or establishment discovered inadvertently and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
This dataset provides modeled predictions of PM2.5 levels from the EPA's Downscaler model. Data are at the census tract level for 2011-2014. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information. Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action. By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose of or make use of the identity of any person or establishment discovered inadvertently and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from: * Case interviews * Laboratories * Medical providers These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Gender * The City collects information on gender identity using these guidelines.
Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives. * This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn’t collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to the California Department of Public Health Virtual Assistant information gathering beginning December 2021. The Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation * the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission Type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
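The transmission-type decision rule above can be written as a small function. This is only a sketch: the function name, argument names, and the exact label strings are assumptions, not values taken from the dataset.

```python
from typing import Optional


def transmission_category(interviewed: bool,
                          contact_with_known_case: Optional[bool]) -> str:
    """Classify a confirmed case from its interview answers.

    `contact_with_known_case` is None when the question was not asked.
    """
    if not interviewed or contact_with_known_case is None:
        return "Unknown"
    if contact_with_known_case:
        return "Contact with a known case"
    return "Community transmission"


print(transmission_category(True, True))    # Contact with a known case
print(transmission_category(True, False))   # Community transmission
print(transmission_category(False, None))   # Unknown
```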
C. UPDATE PROCESS This dataset has been archived and will no longer update as of 9/11/2023.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco po
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin; for the United States, States, Counties; and for Puerto Rico and its Municipios: April 1, 2010 to July 1, 2019 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // Current data on births, deaths, and migration are used to calculate population change since the 2010 Census. An annual time series of estimates is produced, beginning with the census and extending to the vintage year. The vintage year (e.g., Vintage 2019) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the entire estimates series is revised. Additional information, including historical and intercensal estimates, evaluation estimates, demographic analysis, research papers, and methodology is available on website: https://www.census.gov/programs-surveys/popest.html.
This dataset provides modeled predictions of PM2.5 levels from the EPA's Downscaler model. Data are at the census tract level for 2001-2005. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information.
Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action.
By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose of or make use of the identity of any person or establishment discovered inadvertently and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
This dataset provides modeled predictions of PM2.5 levels from the EPA's Downscaler model. Data are at the census tract level for 2011-2015. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information. Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action. By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose of or make use of the identity of any person or establishment discovered inadvertently and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12 country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7 country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12 country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana Lesotho Malawi Namibia South Africa Zambia Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
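To see how sample size drives sampling error, the textbook formula for a proportion under simple random sampling can be sketched as follows. This is an assumption-laden illustration: the text does not say which formula or which proportion underlies its figures, and a clustered design adds a design effect that this sketch ignores. The function name and the default proportion of 0.5 (the worst case) are choices made for the example.

```python
import math

Z_95 = 1.96  # normal critical value for a 95 percent confidence level


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Half-width of the confidence interval for a proportion (simple random sample)."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (1200, 2400):
    print(n, round(100 * margin_of_error(n), 1))
# 1200 2.8
# 2400 2.0
```

With the worst-case proportion, the result for 2400 cases matches the figure quoted above, while the result for 1200 cases comes out somewhat larger; the quoted figure may rest on different assumptions.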
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. The urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages:
A first stage to stratify and randomly select primary sampling units;
A second stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final stage involving the random selection of individual respondents.
We shall deal with each of these stages in turn.
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs (4 percent of 150 PSUs).
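The arithmetic above can be sketched in a few lines. The stratum names and the 86 percent "all other strata" figure are placeholders added so the shares sum to 100, and the largest-remainder rounding is an assumption for shares that do not divide evenly; the text itself does not say how fractional PSUs are handled.

```python
MAX_INTERVIEWS_PER_PSU = 8  # standard stated in the text


def total_psus(sample_size):
    """Number of PSUs needed at 8 interviews per PSU."""
    return sample_size // MAX_INTERVIEWS_PER_PSU


def allocate_psus(percent_shares, n_psus):
    """Allocate PSUs to strata in proportion to integer percentage shares.

    Fractional PSUs are resolved with the largest-remainder method, which is
    an assumption made for this sketch. Shares are expected to sum to 100.
    """
    allocation = {}
    remainders = {}
    for stratum, pct in percent_shares.items():
        allocation[stratum], remainders[stratum] = divmod(pct * n_psus, 100)
    leftover = n_psus - sum(allocation.values())
    # Hand any unallocated PSUs to the strata with the largest remainders.
    for stratum in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[stratum] += 1
    return allocation


# Shares for Region X and Region Y come from the text; the rest is a placeholder.
shares = {"Region X urban": 10, "Region Y rural": 4, "All other strata": 86}

n_psus = total_psus(1200)
allocation = allocate_psus(shares, n_psus)
print(n_psus, allocation)
```

Integer arithmetic with `divmod` keeps the allocation exact and guarantees that the stratum totals add up to the required number of PSUs.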
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Key Table Information
Table Title: Island Areas: Comparative Statistics by Manufacturing Industry for Puerto Rico: 2022 and 2017
Table ID: ISLANDAREASIND2022.IA2200IND12
Survey/Program: Economic Census of Island Areas
Year: 2022
Dataset: ECNIA Economic Census of Island Areas
Source: U.S. Census Bureau, 2022 Economic Census of Island Areas, Core Statistics
Release Date: 2024-12-19
Release Schedule: The Economic Census occurs every five years, in years ending in 2 and 7. 2022 Economic Census of Island Areas tables are released on a flow basis from June through December 2024. For more information about economic census planned data product releases, see 2022 Economic Census Release Schedule.
Dataset Universe: The dataset universe consists of all establishments that are in operation for at least some part of 2022, are located in Puerto Rico, have paid employees, and are classified in one of eighteen in-scope sectors defined by the 2022 NAICS.
Sponsor: U.S. Department of Commerce
Methodology
Data Items and Other Identifying Records:
Number of establishments
Annual payroll ($1,000)
Number of employees
Number of production workers, average for year
Production workers hours
Production workers wages ($1,000)
Value added ($1,000)
Total cost of supplies and/or materials ($1,000)
Sales, value of shipments, or revenue ($1,000)
Range indicating imputed percentage of total annual payroll
Range indicating imputed percentage of total employees
Range indicating imputed percentage of total sales, value of shipments, or revenue
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the Economic Census of Island Areas are employer establishments. An establishment is generally a single physical location where business is conducted or where services or industrial operations are performed.
Geography Coverage: The data are shown for employer establishments and firms that vary by industry: At the Territory level for Puerto Rico. For information about economic census geographies, including changes for 2022, see Economic Census: Economic Geographies.
Industry Coverage: The data are shown for Puerto Rico at the 2- through 3-digit 2022 NAICS code levels for the manufacturing industry. For information about NAICS, see Economic Census Code Lists.
Sampling: The Economic Census of Island Areas is a complete enumeration of establishments located in the islands (i.e., all establishments on the sampling frame are included in the sample). Therefore, the accuracy of tabulations is not affected by sampling error.
Confidentiality: The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7504609, Disclosure Review Board (DRB) approval number: CBDRB-FY24-0044). The primary method of disclosure avoidance protection is noise infusion. Under this method, the quantitative data values such as sales or payroll for each establishment are perturbed prior to tabulation by applying a random noise multiplier (i.e., factor). Each establishment is assigned a single noise factor, which is applied to all its quantitative data value. Using this method, most published cell totals are perturbed by at most a few percentage points. To comply with disclosure avoidance guidelines, data rows with fewer than three contributing establishments are not presented. For more information on disclosure avoidance, see Methodology for the 2022 Economic Census- Island Areas.
Technical Documentation/Methodology: For detailed information about the methods used to collect data and produce statistics, see Methodology for the 2022 Economic Census- Island Areas. For more information about survey questionnaires, Primary Business Activity/NAICS codes, and NAPCS codes, see Economic Census Technical Documentation.
Weights: Because the Economic Census of Island Areas is a complete enumeration, there is no sample weighting.
Table Information
FTP Download: https://www2.census.gov/programs-surveys/economic-census/data/2022/sector00
API Information: Economic census data are housed in the Census Bureau Application Programming Interface (API).
Symbols:
D - Withheld to avoid disclosing data for individual companies; data are included in higher level totals
N - Not available or not comparable
S - Estimate does not meet publication standards because of high sampling variability, poor response quality, or other concerns about the estimate quality. Unpublished estimates derived from this table by subtraction are subject to these same limitations and should not be attributed to the U.S. Census Bureau. For a description of publication standards and the total quantity response rate, see link to program methodology page.
X - Not applicable
A - Relative standard error of 100% or more
r - Revised
s - Relative standard error exceeds 40%
For a complete list of symbols, see Economic Census Data Dictionary.
Data-Specific Notes: Data users who crea...
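The noise infusion idea described under Confidentiality can be illustrated with a small sketch. It is not the Census Bureau's implementation: the multiplier range, the record layout, and the function names are all hypothetical, since the actual noise distribution is not given in the description. The sketch only mirrors the three stated properties: one factor per establishment, applied to all of its quantitative values before tabulation, with rows of fewer than three contributing establishments withheld.

```python
import random
from collections import defaultdict

MIN_ESTABLISHMENTS = 3  # from the text: rows with fewer contributors are not presented


def noise_factor(rng, low=0.05, high=0.15):
    """Return a hypothetical multiplier near 1; the 5-15% range is an assumption."""
    delta = rng.uniform(low, high)
    return 1 + delta if rng.random() < 0.5 else 1 - delta


def tabulate(establishments, seed=None):
    """Perturb each establishment's values, then sum them by industry code."""
    rng = random.Random(seed)
    cells = defaultdict(lambda: {"count": 0, "payroll": 0.0, "sales": 0.0})
    for est in establishments:
        factor = noise_factor(rng)  # a single factor per establishment
        cell = cells[est["naics"]]
        cell["count"] += 1  # establishment counts are left unperturbed here
        cell["payroll"] += est["payroll"] * factor
        cell["sales"] += est["sales"] * factor
    # Withhold rows with fewer than three contributing establishments.
    return {k: v for k, v in cells.items() if v["count"] >= MIN_ESTABLISHMENTS}


# Made-up records: three establishments in one industry, two in another.
records = [
    {"naics": "311", "payroll": 100.0, "sales": 1000.0},
    {"naics": "311", "payroll": 200.0, "sales": 2000.0},
    {"naics": "311", "payroll": 300.0, "sales": 3000.0},
    {"naics": "312", "payroll": 150.0, "sales": 900.0},
    {"naics": "312", "payroll": 250.0, "sales": 1100.0},
]

table = tabulate(records, seed=7)
print(table)
```

Because individual factors fall on both sides of 1, their effects partly cancel when summed, which is consistent with the statement that most published totals move by only a few percentage points.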
Monthly Population Estimates by Universe, Age, Sex, Race, and Hispanic Origin for the United States: April 1, 2010 to December 1, 2016 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // Persons on active duty in the Armed Forces were not enumerated in the 2010 Census. Therefore, variables for the 2010 Census civilian, civilian noninstitutionalized, and resident population plus Armed Forces overseas populations cannot be derived and are not available on these files. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2015) refers to the final year of the time series. 
The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/popest/index.html
This dataset provides modeled predictions of PM2.5 levels from the EPA's Downscaler model. Data are at the census tract level for 2016-2020. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information. Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action. By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose or make use of the identity of any person or establishment discovered inadvertently, and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
This dataset provides modeled predictions of ozone levels from the EPA's Downscaler model. Data are at the census tract level for 2006-2010. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information.
Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action.
By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose or make use of the identity of any person or establishment discovered inadvertently, and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
This dataset provides modeled predictions of ozone levels from the EPA's Downscaler model. Data are at the census tract level for 2001-2005. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Census tract-level datasets contain estimates of the mean predicted concentration and associated standard error. Please refer to the metadata attachment for more information.
Learn more about outdoor air quality on the Tracking Network's website: https://ephtracking.cdc.gov/showAirLanding.action.
By using these data, you signify your agreement to comply with the following requirements: 1. Use the data for statistical reporting and analysis only. 2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals. 3. Do not disclose or make use of the identity of any person or establishment discovered inadvertently, and report the discovery to: trackingsupport@cdc.gov. 4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating. 5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC. 6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. Web. Accessed: insert date. www.cdc.gov/ephtracking.
Annual State Resident Population Estimates for 5 Race Groups (5 Race Alone or in Combination Groups) by Age, Sex, and Hispanic Origin: April 1, 2010 to July 1, 2013// File: 7/1/2013 State Characteristics Population Estimates // Source: U.S. Census Bureau, Population Division // Release Date: June 2014 // Note: 'In combination' means in combination with one or more other races. The sum of the five race groups adds to more than the total population because individuals may report more than one race. The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see http://www.census.gov/popest/data/historical/files/MRSF-01-US1.pdf. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2013) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. 
As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This table contains data on land use, arable farming, horticulture, grassland, grazing livestock and housed animals, at regional level, by general farm type.
The figures in this table are derived from the agricultural census. Data collection for the agricultural census is part of a combined data collection that also serves, among other purposes, agricultural policy use and enforcement of the manure law.
Regional breakdown is based on the main location of the holding. As a result, the region to which activities (crops, animals) are allocated may differ from the location where these activities actually occur.
The agricultural census is also used as the basis for the European Farm Structure Survey (FSS). Data from the agricultural census do not fully coincide with the FSS. In the FSS years (2000, 2003, 2005, 2007 and 2010) additional information was collected to meet the requirements of the FSS.
Reference date for livestock is 1 April and for crops 15 May.
In 2022, equidae are not part of the Agricultural Census. This affects the farm type and the total number of farms in the Agricultural Census. Farms with horses, ponies and donkeys that were previously classified as 'specialist grazing livestock' could be classified, according to their dominant activity, as another farm type in 2022.
From 2020 onwards, the SO2017, based on the years 2015 to 2019, will apply (see also the explanation for SO: Standard Output).
From 2018 onwards the numbers of calves for fattening, pigs for fattening, chickens and turkeys are adjusted in the case of temporary breaks in the production cycle (e.g. sanitary cleaning). The agricultural census is a structural survey, in which adjustment for temporary breaks in the production cycle is relevant, among other things, for the calculation of the economic size of the holding and its farm type. In the livestock surveys the number of animals on the reference day is relevant, so no adjustment for temporary breaks in the production cycle is made. This means that the number of animals in the tables of the agricultural census may differ from those in the livestock tables (see 'links to relevant tables and relevant articles').
From 2017 onwards, animal numbers are increasingly derived from I&R registers (Identification and Registration of animals), instead of by means of the combined data collection. The I&R registers are the responsibility of RVO (Netherlands Enterprise Agency). Since 2017, cattle numbers are derived from I&R cattle, and from 2018 sheep, goats and poultry are also derived from the relevant I&R registers. The registration of cattle, sheep and goats takes place directly at RVO. Poultry data is collected via the designated database Poultry Information System Poultry (KIP) from Avined. Avined is a branch organization for the egg and poultry meat sectors. Avined passes the data on to the central database of RVO. Due to the transition to the use of I&R registers, a change in classification will occur for sheep and goats from 2018 onwards.
Since 2016, information of the Dutch Business Register is used to define the agricultural census. Registration in the Business Register with an agricultural standard industrial classification code (SIC), related to NACE/ISIC, (in Dutch SBI: ‘Standaard BedrijfsIndeling’) is leading to determine whether there is an agricultural holding. This aligns the agricultural census as closely as possible to the statistical regulations of Eurostat and the (Dutch) implementation of the definition of 'active farmer' as described in the common agricultural policy.
The definition of the agricultural census based on information from the Dutch Business Register mainly affects the number of holdings, where a clear deviation from the trend occurs. The impact on areas (except for other land and rough grazing) and the number of animals (except for sheep, horses and ponies) is limited. This is mainly due to the holdings that are excluded as a result of the new delimitation of agricultural holdings (such as equestrian centres, city farms and organisations in nature management).
In 2011 there were changes in geographic assignment of holdings with a foreign main seat. This may influence regional figures, mainly in border regions.
Until 2010 the economic size of agricultural holdings was expressed in Dutch size units (in Dutch NGE: 'Nederlandse Grootte Eenheid'). From 2010 onwards this has become Standard Output (SO). This means that the threshold for holdings in the agricultural census has changed from 3 NGE to 3000 euro SO. For comparable time series the figures for 2000 up to and including 2009 have been recalculated, based on SO coefficients and SO typology. The latest update took place in 2016.
Data available from: 2000
Status of the figures: The figures are final.
Changes as of March 28, 2025: the final figures for 2024 have been added.
When will new figures be published? According to regular planning, provisional figures are published in November and the definitive figures follow in March of the following year.
http://opendata.victoria.ca/pages/open-data-licence
Assessment Values data consists of a Land Value (lot value) and Improvement Value (value of structures built on those lots). The Total Value is the sum of the Land and Improvement Value (total value of land plus structures built on it). The assessment values are for the current year. The data are broken down by property type (Residential, Business, Industrial, and Recreational - determined from BC Assessment's Actual Use Code) with Minimum/Maximum/Mean/Average values in each category by 2016 census dissemination area (DA). The "Last Updated" date shown on our Open Data Portal refers to the last time the data schema was modified in the portal, or any changes were made to this description. We update our data through scripts, which do not trigger a change to the "Last Updated" date. Note: Attributes represent each field in a dataset, and some fields will contain information such as ID numbers. As a result, some visualizations on the tabs on our Open Data page will not be relevant.
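To make the relationship between the values concrete, the sketch below shows how a summary of this kind could be derived from parcel-level records. The field names, dissemination area IDs, and dollar amounts are invented for illustration; the published dataset already contains the aggregated figures, and its actual schema may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parcel records; IDs, field names, and values are made up.
parcels = [
    {"da": "DA-1", "property_type": "Residential", "land_value": 500_000, "improvement_value": 300_000},
    {"da": "DA-1", "property_type": "Residential", "land_value": 600_000, "improvement_value": 400_000},
    {"da": "DA-1", "property_type": "Business", "land_value": 900_000, "improvement_value": 100_000},
    {"da": "DA-2", "property_type": "Residential", "land_value": 450_000, "improvement_value": 250_000},
]


def summarize(parcels):
    """Compute min/max/mean Total Value per (dissemination area, property type)."""
    groups = defaultdict(list)
    for parcel in parcels:
        # Total Value is the sum of Land Value and Improvement Value.
        total = parcel["land_value"] + parcel["improvement_value"]
        groups[(parcel["da"], parcel["property_type"])].append(total)
    return {
        key: {"min": min(values), "max": max(values), "mean": mean(values)}
        for key, values in groups.items()
    }


summary = summarize(parcels)
for key, stats in sorted(summary.items()):
    print(key, stats)
```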
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
A. SUMMARY Medical provider confirmed COVID-19 cases and confirmed COVID-19 related deaths in San Francisco, CA aggregated by several different geographic areas and normalized by 2016-2020 American Community Survey (ACS) 5-year estimates for population data to calculate rate per 10,000 residents.
On September 12, 2021, a new case definition of COVID-19 was introduced that includes criteria for enumerating new infections after previous probable or confirmed infections (also known as reinfections). A reinfection is defined as a confirmed positive PCR lab test more than 90 days after a positive PCR or antigen test. The first reinfection case was identified on December 7, 2021.
Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, a person who was infected at work in San Francisco but lives in the East Bay is not counted as an SF case, and a person who dies at Zuckerberg San Francisco General but is a resident of another county is not counted in this dataset.
Dataset is cumulative and covers cases going back to 3/2/2020 when testing began.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas
B. HOW THE DATASET IS CREATED Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area. The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a rate equal to ([count] / [acs_population]) * 10000, representing the number of cases per 10,000 residents.
C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time.
D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
Privacy rules in effect To protect privacy, certain rules are in effect: 1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values 2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values 3. Cases and deaths are dropped altogether for areas where acs_population < 1000
Rate suppression in effect where counts lower than 20 Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach as this is best practice in epidemiology.
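For readers who want to apply the same approach to their own counts, the rules above can be sketched as follows. This is a reading of the stated rules, not SFDPH's actual script; the function name and the returned dictionary layout are assumptions, and only case counts are shown (death counts would follow the same count rule).

```python
def publish_row(count, acs_population):
    """Apply the stated privacy and rate-suppression rules to one area.

    Returns None when the area is dropped altogether, otherwise a dict with
    the published count and rate per 10,000 residents (None = suppressed).
    """
    if acs_population < 1000:
        return None  # areas with acs_population < 1000 are dropped altogether
    if 0 < count < 10:
        return {"count": None, "rate": None}  # small counts become null values
    rate = None
    if count >= 20:
        # Equivalent to (count / acs_population) * 10000.
        rate = count * 10000 / acs_population
    return {"count": count, "rate": rate}


print(publish_row(5, 5000))    # count suppressed
print(publish_row(15, 5000))   # count shown, rate suppressed
print(publish_row(50, 5000))   # count and rate shown
print(publish_row(50, 900))    # area dropped
```

Note that a count between 10 and 19 is published without a rate, since rates are only calculated once the count reaches 20.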
A note on Census ZIP Code Tabulation Areas (ZCTAs) ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.
Row included for Citywide case counts, incidence rate, and deaths A single row is included that has the Citywide case counts and incidence rate. This can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.
E. CHANGE LOG