The data was collected using the High Frequency Survey (HFS). The survey allows populations of interest to be reached more effectively through remote modalities (phone interviews and self-administered online surveys) and through improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions cover the demographic profile of populations of interest, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. Countries have used the collected data in their protection monitoring and vulnerability analyses.
Household
Sample survey data [ssd]
In the absence of a well-developed sampling frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy in which respondents entered the sample through one of three channels: (i) those who opted in to complete an online self-administered version of the questionnaire, which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were interviewed remotely by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance.
Computer Assisted Personal Interview [capi]
The questionnaire contained the following sections: household demographics, vulnerability, basic needs, coping capacity, and well-being.
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov. The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset results from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015. Each row in the LMI dataset is an individual record for a household that responded to the survey, and each column is a variable of interest for analyzing the low- to moderate-income population. The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight. The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that supports the LMI Analysis Tool. Most users interested in LMI statistics and custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
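As a rough illustration of what a weighted survey estimate involves, the sketch below sums the household weight within each category and divides by the weighted total. The field names (`heating_fuel`, `household_weight`) and the sample records are hypothetical and do not reflect the actual column names or values in the LMI dataset.

```python
from collections import defaultdict


def weighted_share(records, group_key, weight_key="household_weight"):
    """Return each group's share of the weighted household total."""
    totals = defaultdict(float)
    for rec in records:
        # Each record stands for `weight` households in the population.
        totals[rec[group_key]] += rec[weight_key]
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}


# Hypothetical records; field names and values are illustrative only.
sample = [
    {"heating_fuel": "gas", "household_weight": 120.0},
    {"heating_fuel": "gas", "household_weight": 80.0},
    {"heating_fuel": "oil", "household_weight": 200.0},
]
shares = weighted_share(sample, "heating_fuel")
```

An unweighted count would report two gas households to one oil household; the weighted shares treat the two fuels as equally common because the oil record carries as much weight as the two gas records combined.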
This data collection comprises a data library, sample outputs, batch files and accompanying documentation from the ESRC-funded project “Population247NRT: Near real-time spatiotemporal population estimates for health, emergency response and national security”. The data comprise a structured set of input data for use with the authors’ SurfaceBuilder247 software and sample outputs which estimate the population distribution of England at specific times on specific dates, referenced to 2011 census population totals.
The sample output files (provided as GeoTIFFs) contain population estimates in 200m grid cells, based on the British National Grid, for 02:00 (2am) and 14:00 (2pm) on a typical weekday in University and school term-time and out of term-time. The estimates are broken down by seven age/economic activity sub-groups for term-time and six for out of term-time, and include estimates of population activity in residential, workplace, education, healthcare and road transportation domains.
The data library, which has been constructed entirely using open data sources, comprises population estimates, by age/economic activity sub-groups, for point locations (typically population-weighted centroids of census output areas and workplace zones, or postcode centroids of sites such as schools or hospitals); time profiles representing usual patterns of population activity at these sites during a 24-hour period; and background grid layers representing the land surface area and major road network. SurfaceBuilder247 uses the data library to generate time-specific gridded population estimates by redistributing the population of each sub-group across the available locations and background grid in accordance with the reference time profiles.
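The redistribution idea can be sketched in a few lines. This is a deliberately simplified illustration and not the SurfaceBuilder247 algorithm: it assumes each site has a hypothetical `capacity` and a `profile` giving the fraction of that capacity in use at a given hour, and it ignores the background grid and people in transit.

```python
def distribute_population(total, sites, hour):
    """Split `total` people across sites in proportion to each site's
    capacity multiplied by its time-profile value at `hour`."""
    attraction = {
        name: site["capacity"] * site["profile"][hour]
        for name, site in sites.items()
    }
    scale = sum(attraction.values())
    if scale == 0:
        # No site is active at this hour, so nobody is allocated.
        return {name: 0.0 for name in sites}
    return {name: total * a / scale for name, a in attraction.items()}


# Hypothetical sites and profiles for one sub-group of 100 people.
sites = {
    "home": {"capacity": 100, "profile": {2: 1.0, 14: 0.25}},
    "school": {"capacity": 100, "profile": {2: 0.0, 14: 0.75}},
}
night = distribute_population(100, sites, hour=2)
day = distribute_population(100, sites, hour=14)
```

With these made-up profiles the whole sub-group is at home at 02:00, and three quarters of it is at school at 14:00.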
The sample output grids provided in this resource may be used directly in GIS software or, alternatively, the input data library may be reprocessed using SurfaceBuilder247 to generate estimates for specific dates and times of interest to the user. Sample batch and session parameter files are included in the resource.
Decision-making and policy formulation in sectors such as health, emergency/crisis response and national security, ideally require accurate dynamic information on the number of people in specific places at specific times of the day, week, season or year. Traditional census data do not provide this level of detail but are often used for such policy and planning purposes. The ESRC-funded Population247 programme of research (Martin et al, 2015) developed a framework, methodology and software tool (SurfaceBuilder247) for integrating diverse contemporary data sources to produce enhanced time-specific population estimates for small geographical areas. Its usefulness has since been demonstrated for flooding and radiation emergency response/planning, through collaborations with HR Wallingford and Public Health England. These models have primarily involved the integration of open administrative data for activities such as place of residence, work, education and health. Now, new and emerging forms of data, such as sensor data, live and static data feeds provided via the internet, and various commercial datasets which were not previously available, provide exciting opportunities to enhance these population estimates. Such new and emerging datasets are useful because they provide near real-time information on population activity in sectors which are particularly dynamic and have previously been difficult to model, such as retail, leisure and transport. However, extracting useful intelligence from these sources, and integrating and calibrating them with existing data sources, poses significant challenges for researchers and practitioners seeking to employ them in the creation of time-specific population estimates. This project will combine new, emerging and existing datasets in order to produce enhanced time-specific population estimates for more informed decision-making and policy formulation in the health, emergency/crisis response and national security sectors. 
It is a collaborative project between University of Southampton, Public Health England (PHE), Health and Safety Executive (HSE) and Defence Science and Technology Laboratory (Dstl). The project will enhance existing methods and tools for harvesting, processing, integrating and calibrating new, emerging and existing data sources in order to produce time-specific population estimates. It will deliver two substantive policy demonstrator case studies with the project partners. The first case study will demonstrate the potential for using time-specific population estimates for near real-time response in emergencies; the second will explore their usefulness for modelling variation in 'normal' population distributions through space and time in order to inform longer-term planning and policy formulation. Importantly, the project will also encourage the sharing of knowledge and expertise between academia and the public...
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey population data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of total population, male and female, and both broad and narrowly-defined age groups. In addition to the standard selection of age-group breakdowns (by male or female), the dataset provides supplemental calculated fields which combine several attributes into one (for example, the total population of persons under 18, or the number of females over 65 years of age). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates.
Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
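For users who do retrieve margins of error from the full ACS tables, the sketch below shows two common calculations. It relies on the fact that ACS margins of error are published at the 90 percent confidence level, and on the Census Bureau's root-sum-of-squares approximation for the margin of error of a sum of estimates (relevant when several age groups are combined into one field). The function names and input values are illustrative.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def standard_error(moe):
    """Convert a published ACS margin of error to a standard error."""
    return moe / Z_90


def combined_moe(moes):
    """Approximate the margin of error of a sum of estimates as the
    square root of the sum of the squared component margins."""
    return math.sqrt(sum(m * m for m in moes))


# Illustrative margins for two age groups being added together.
moe_total = combined_moe([30.0, 40.0])
se_total = standard_error(moe_total)
```

The approximation treats the component estimates as independent, so it is a rough guide and not an exact variance calculation.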
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost.
The APS allows for analysis to be carried out on detailed subgroups and below regional level. In recent years (particularly with the sample size of the LFS 5 quarter dataset reducing) there has been some interest in producing a two year APS longitudinal dataset to look at any trends that may occur over a year. The APS Two-Year Longitudinal Datasets, covering 2012/13 onwards, have been deposited as a result of this work. Person- and Household-level APS datasets are also available.
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022
Abstract copyright UK Data Service and data collection copyright owner.
Gallup Worldwide Research continually surveys residents in more than 150 countries, representing more than 98% of the world's adult population, using randomly selected, nationally representative samples. Gallup typically surveys 1,000 individuals in each country, using a standard set of core questions that has been translated into the major languages of the respective country. In some regions, supplemental questions are asked in addition to core questions. Face-to-face interviews are approximately 1 hour, while telephone interviews are about 30 minutes. In many countries, the survey is conducted once per year, and fieldwork is generally completed in two to four weeks. The Country Dataset Details spreadsheet displays each country's sample size, month/year of the data collection, mode of interviewing, languages employed, design effect, margin of error, and details about sample coverage.
Gallup is entirely responsible for the management, design, and control of Gallup Worldwide Research. For the past 70 years, Gallup has been committed to the principle that accurately collecting and disseminating the opinions and aspirations of people around the globe is vital to understanding our world. Gallup's mission is to provide information in an objective, reliable, and scientifically grounded manner. Gallup is not associated with any political orientation, party, or advocacy group and does not accept partisan entities as clients. Any individual, institution, or governmental agency may access the Gallup Worldwide Research regardless of nationality. The identities of clients and all surveyed respondents will remain confidential.
Sample survey data [ssd]
SAMPLING AND DATA COLLECTION METHODOLOGY With some exceptions, all samples are probability based and nationally representative of the resident population aged 15 and older. The coverage area is the entire country including rural areas, and the sampling frame represents the entire civilian, non-institutionalized, aged 15 and older population of the entire country. Exceptions include areas where the safety of interviewing staff is threatened, scarcely populated islands in some countries, and areas that interviewers can reach only by foot, animal, or small boat.
Telephone surveys are used in countries where telephone coverage represents at least 80% of the population or is the customary survey methodology (see the Country Dataset Details for detailed information for each country). In Central and Eastern Europe, as well as in the developing world, including much of Latin America, the former Soviet Union countries, nearly all of Asia, the Middle East, and Africa, an area frame design is used for face-to-face interviewing.
The typical Gallup Worldwide Research survey includes at least 1,000 surveys of individuals. In some countries, oversamples are collected in major cities or areas of special interest. Additionally, in some large countries, such as China and Russia, sample sizes of at least 2,000 are collected. Although rare, in some instances the sample size is between 500 and 1,000. See the Country Dataset Details for detailed information for each country.
FACE-TO-FACE SURVEY DESIGN
FIRST STAGE In countries where face-to-face surveys are conducted, the first stage of sampling is the identification of 100 to 135 ultimate clusters (Sampling Units), consisting of clusters of households. Sampling units are stratified by population size and/or geography, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Samples are drawn independently of any samples drawn for surveys conducted in previous years.
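Selection with probability proportional to population size can be done in several ways. The sketch below uses systematic PPS selection, which is one common technique and is an assumption here, since the source does not say which procedure Gallup uses. The unit names and populations are hypothetical.

```python
import random


def pps_systematic(units, n, rng):
    """Select n units with probability proportional to size.

    units: list of (name, population) pairs.
    A unit larger than the sampling interval can be selected more than once.
    """
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + i * interval for i in range(n)]

    selected, cumulative, idx = [], 0.0, 0
    for name, size in units:
        cumulative += size
        # Take every target that falls inside this unit's cumulative range.
        while idx < n and targets[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected


# Hypothetical sampling units with their populations.
units = [("A", 500), ("B", 300), ("C", 200)]
picks = pps_systematic(units, n=2, rng=random.Random(0))
```

With a total of 1,000 and two selections, the interval is 500, so unit A (population 500) is always chosen once and the second pick falls on B or C with probabilities 0.6 and 0.4.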
There are two methods for sample stratification:
METHOD 1: The sample is stratified into 100 to 125 ultimate clusters drawn proportional to the national population, using the following strata:
1) Areas with population of at least 1 million
2) Areas 500,000-999,999
3) Areas 100,000-499,999
4) Areas 50,000-99,999
5) Areas 10,000-49,999
6) Areas with less than 10,000
The strata could include additional strata to reflect populations that exceed 1 million as well as areas with populations less than 10,000. Worldwide Research Methodology and Codebook Copyright © 2008-2012 Gallup, Inc. All rights reserved.
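The six population-size strata listed under METHOD 1 amount to a simple threshold lookup. The sketch below encodes the lower bounds from that list; the function name is illustrative, and any additional country-specific strata are not modelled.

```python
import bisect

# Lower bounds of strata 5 through 1, in ascending order.
LOWER_BOUNDS = [10_000, 50_000, 100_000, 500_000, 1_000_000]


def size_stratum(population):
    """Return the stratum number (1 = at least 1 million, 6 = under 10,000)."""
    # bisect_right counts how many lower bounds the population has reached.
    return 6 - bisect.bisect_right(LOWER_BOUNDS, population)
```

For example, an area of 750,000 people falls in stratum 2 and an area of 9,999 people falls in stratum 6.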
METHOD 2:
A multi-stage design is used. The country is first stratified by large geographic units, and then by smaller units within geography. A minimum of 33 Primary Sampling Units (PSUs), which are first stage sampling units, are selected. The sample design results in 100 to 125 ultimate clusters.
SECOND STAGE
Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day, and where possible, on different days. If an interviewer cannot obtain an interview at the initial sampled household, he or she uses a simple substitution method. Refer to Appendix C for a more in-depth description of random route procedures.
THIRD STAGE
Respondents are randomly selected within the selected households. Interviewers list all eligible household members and their ages or birthdays. The respondent is selected by means of the Kish grid (refer to Appendix C) in countries where face-to-face interviewing is used. The interviewer does not inform the person who answers the door of the selection criteria until after the respondent has been identified. In a few Middle East and Asian countries where cultural restrictions dictate gender matching, respondents are randomly selected using the Kish grid from among all eligible adults of the matching gender.
TELEPHONE SURVEY DESIGN
In countries where telephone interviewing is employed, random-digit-dial (RDD) or a nationally representative list of phone numbers is used. In select countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of day. Appointments for callbacks that fall within the survey data collection period are made.
PANEL SURVEY DESIGN
Prior to 2009, United States data were collected using The Gallup Panel. The Gallup Panel is a probability-based, nationally representative panel, for which all members are recruited via random-digit-dial methodology and is only used in the United States. Participants who elect to join the panel are committing to the completion of two to three surveys per month, with the typical survey lasting 10 to 15 minutes. The Gallup Worldwide Research panel survey is conducted over the telephone and takes approximately 30 minutes. No incentives are given to panel participants.
QUESTION DESIGN
Many of the Worldwide Research questions are items that Gallup has used for years. When developing additional questions, Gallup employed its worldwide network of research and political scientists to better understand key issues with regard to question development and construction and data gathering. Hundreds of items were developed, tested, piloted, and finalized. The best questions were retained for the core questionnaire and organized into indexes. Most items have a simple dichotomous ("yes or no") response set to minimize contamination of data because of cultural differences in response styles and to facilitate cross-cultural comparisons.
The Gallup Worldwide Research measures key indicators such as Law and Order, Food and Shelter, Job Creation, Migration, Financial Wellbeing, Personal Health, Civic Engagement, and Evaluative Wellbeing and demonstrates their correlations with world development indicators such as GDP and Brain Gain. These indicators assist leaders in understanding the broad context of national interests and establishing organization-specific correlations between leading indexes and lagging economic outcomes.
Gallup organizes its core group of indicators into the Gallup World Path. The Path is an organizational conceptualization of the seven indexes and is not to be construed as a causal model. The individual indexes have many properties of a strong theoretical framework. A more in-depth description of the questions and Gallup indexes is included in the indexes section of this document. In addition to World Path indexes, Gallup Worldwide Research questions also measure opinions about national institutions, corruption, youth development, community basics, diversity, optimism, communications, religiosity, and numerous other topics. For many regions of the world, additional questions that are specific to that region or country are included in surveys. Region-specific questions have been developed for predominantly Muslim nations, former Soviet Union countries, the Balkans, sub-Saharan Africa, Latin America, China and India, South Asia, and Israel and the Palestinian Territories.
The questionnaire is translated into the major conversational languages of each country. The translation process starts with an English, French, or Spanish version, depending on the region. One of two translation methods may be used.
METHOD 1: Two independent translations are completed. An independent third party, with some knowledge of survey research methods, adjudicates the differences. A professional translator translates the final version back into the source language.
METHOD 2: A translator
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can download data or view data tables on topics related to the labor force of the United States.
Background
The Current Population Survey (CPS) is a joint effort between the Bureau of Labor Statistics and the Census Bureau. It provides information and data on the labor force of the United States, such as employment, unemployment, earnings, hours of work, school enrollment, health, employee benefits and income. The CPS is conducted monthly and has a sample of approximately 50,000 households. It is representative of the non-institutionalized US population. The sample provides estimates for the nation as a whole and serves as part of model-based estimates for individual states and other geographic areas.
User Functionality
Users can download data sets or view data tables on their topic of interest. Data can be organized by a variety of demographic variables, including sex, age, race, marital status and educational attainment. Data are available at the national or state level.
The National Population Database (NPD) for Northern Ireland is a point-based Geographical Information System (GIS) dataset that combines locational information from Ordnance Survey Northern Ireland (OSNI) with population information about those locations, mainly sourced from Northern Irish government statistics. The points represent individual buildings allowing the NPD NI to provide detailed local analysis for anywhere in Northern Ireland.
The Health and Safety Laboratory (HSL) working with Staffordshire University originally created the NPD for Great Britain in 2004 to help its parent organisation, the Health and Safety Executive (HSE), assess the risks to society of major hazard sites e.g. oil refineries, chemical works and gas holders. Of particular interest to HSE were ‘sensitive’ populations e.g. schools and hospitals where the people at those locations may be more vulnerable to harm and potentially harder to evacuate in an emergency. The data for the NPD NI includes residential, schools and colleges, hospitals and workplace layers.
The NPD NI was created using various datasets from OSNI and government organisations and contains other intellectual property so is only available under a license and for a fee. Please contact the HSL GIS team if you would like to discuss gaining access to the sample or full dataset.
A broad and generalized selection of 2012-2016 US Census Bureau 2016 5-year American Community Survey poverty data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of populations living below the poverty level, as grouped by age, sex, education, workforce status, and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
Since the 1980s, the Office of Refugee Resettlement (ORR) has conducted the Annual Survey of Refugees (ASR), which collects information on refugees during their first five years after arrival in the U.S. The ASR is the only scientifically-collected source of national data on refugees’ progress toward self-sufficiency and integration. ORR uses the ASR results alongside other information sources to fulfill its Congressionally-mandated reporting following the Refugee Act of 1980. Historically, the microdata from these surveys have generally been unavailable to researchers.
In the spring of 2018, ORR completed its 51st Annual Survey of Refugees (ASR). The data from the ASR offer a window into respondents’ first five years in the United States and show the progress that refugee families made towards learning English, participating in the workforce, and establishing permanent residence.
National coverage
Households and individuals
The population of interest – the study population – for the 2017 ASR is defined as refugees entering the U.S. between FY 2012 and FY 2016, inclusive, who are aged 16 and over at the time of the 2017 ASR interview. Because the interviews were conducted in early 2018, the population includes a small number of refugees younger than 16 at the time of arrival to the U.S.
While this covers five distinct fiscal years of refugee entrants, there is special policy/analytic interest in collapsing years into three domains as follows:
• Cohort 1 – Refugees entering FY 2012 and FY 2013,
• Cohort 2 – Refugees entering FY 2014 and FY 2015, and
• Cohort 3 – Refugees entering FY 2016
Sample survey data [ssd]
The 2017 ASR employed a stratified probability sample design of refugees. The first stage of selection was the household, represented by its principal applicant (PA), and the second stage was the selection of persons within households. Principal features of the sample design are highlighted below.
The 2017 ASR design replicated the 2016 ASR design, which used a full cross-sectional national sample of refugees entering within the past five years. This section documents the research design, data collection and data processing protocols. It also presents outcomes (e.g., sample sizes) and paradata results such as response rates.
The population of interest - the study population - for the 2017 ASR is defined as refugees entering the U.S. between FY 2012 and FY 2016, inclusive, who were aged 16 or over at the time of the 2017 ASR interview. Because the interviews were conducted in early 2018, the population includes a small number of refugee respondents younger than 16 at the time of arrival to the U.S.
The 2017 ASR targeted 1,500 completed interviews from refugee households entering the U.S. between FY 2012-2016. The sample was designed to allow for separate estimates and analyses from each of the three designated cohorts. Moreover, the design needed to accommodate both household- and person-level analyses.
The sample was drawn as fresh cross sections by cohort; there was no longitudinal component. The survey objectives required that – in addition to primary stratification by cohort – the sample of households (i.e., PAs) be stratified at least by year of entry and geographic region of origin.
The 2017 ASR sampling frame was ORR’s Refugee Arrivals Data System (RADS) dataset.
The ASR design targeted equal numbers of household interviews by cohort. This means that there was an oversample of households for FY 2016, the most recent year of entry. This allocation prioritizes statistical precision for cohort-level estimates.
Within each of the three cohort strata, the following factors were used for stratification: year of arrival (for cohorts 1 and 2 only), geographic region, native language, age group, gender, and family size at arrival (1, 2, 3+ persons). Missing contact information status was also used as a stratification variable for cohort 3 due to an unusual degree of missing contact information among FY 2016 arrivals. Proportionate stratified samples were drawn independently within cohort.
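As a rough illustration of what a proportionate stratified draw within one cohort could look like, the sketch below allocates a target sample size across strata in proportion to their frame counts and then samples without replacement inside each stratum. It is not the ASR's actual selection program: the function names, the largest-remainder rounding rule, and the record layout are all assumptions made for the example.

```python
import random
from collections import defaultdict


def proportionate_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to their frame sizes.

    Rounding uses the largest-remainder rule (an assumption; the source
    does not say how fractional allocations were handled).
    """
    total = sum(stratum_sizes.values())
    if not 0 <= n_total <= total:
        raise ValueError("n_total must be between 0 and the frame size")
    exact = {s: n_total * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    shortfall = n_total - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def draw_stratified_sample(frame, stratum_of, n_total, seed=0):
    """Draw a proportionate stratified sample without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    alloc = proportionate_allocation({s: len(u) for s, u in strata.items()}, n_total)
    return {s: rng.sample(units, alloc[s]) for s, units in strata.items()}


if __name__ == "__main__":
    # Hypothetical frame: 100 households tagged with a made-up region code.
    frame = [{"id": i, "region": "ABC"[i % 3]} for i in range(100)]
    sample = draw_stratified_sample(frame, lambda u: u["region"], 12, seed=42)
    print({region: len(units) for region, units in sample.items()})
```

In practice the stratum key would combine several of the variables named above (for example region, language, and family size); the single `region` field here only keeps the example short.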
The 2017 ASR employed a sample management plan integrating the sample design and field protocols to include locating subjects, contacting them and conducting telephone interviews. A sample of 6,006 PAs was released at the start of data collection. A reserve sample of about 4,500 was held in case some portion was needed to meet the interview target of 1,500.
Computer Assisted Telephone Interview [cati]
An overall response rate of 25 percent was achieved. The response rate was driven by the ability to locate and speak to (1515+534)/6006 = 34 percent of the sample, meaning that about two thirds of the sample could neither be located nor (if located) successfully contacted.
The overall response rates decreased with time since arrival to the U.S., varying from 18 percent for FY 2012-13 refugees to 26 percent for FY 2014-15 refugees and a high of 34 percent for FY 2016 refugees.
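The overall rates quoted above can be re-derived from the counts given in the text. The sketch below assumes that 1,515 is the number of completed interviews (which is consistent with the stated 25 percent response rate) and that 534 counts other cases that were located and spoken to; the source does not label these two figures, so both readings are assumptions.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


released = 6006          # PAs released at the start of data collection (from the text)
completed = 1515         # assumed: completed interviews
other_contacted = 534    # assumed: located and spoken to, but no completed interview

located_and_contacted = percent(completed + other_contacted, released)
response_rate = percent(completed, released)

if __name__ == "__main__":
    print(f"located and contacted: {located_and_contacted:.1f}%")
    print(f"response rate:         {response_rate:.1f}%")
```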
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Refer to the current geographies boundaries table for a list of all current geographies and recent updates.
This dataset is the definitive version of the annually released statistical area 2 (SA2) boundaries as at 1 January 2025 as defined by Stats NZ. This version contains 2,395 SA2s (2,379 digitised and 16 with empty or null geometries (non-digitised)).
SA2 is an output geography that provides higher aggregations of population data than can be provided at the statistical area 1 (SA1) level. The SA2 geography aims to reflect communities that interact together socially and economically. In populated areas, SA2s generally contain similar sized populations.
The SA2 should:
• form a contiguous cluster of one or more SA1s, excluding exceptions below,
• allow the release of multivariate statistics with minimal data suppression,
• capture a similar type of area, such as a high-density urban area, farmland, wilderness area, and water area,
• be socially homogeneous and capture a community of interest. It may have, for example: a shared road network, shared community facilities, shared historical or social links, or socio-economic similarity,
• form a nested hierarchy with statistical output geographies and administrative boundaries. It must: be built from SA1s, either define or aggregate to define SA3s, urban areas, territorial authorities, and regional councils.
SA2s in city council areas generally have a population of 2,000–4,000 residents while SA2s in district council areas generally have a population of 1,000–3,000 residents. In major urban areas, an SA2 or a group of SA2s often approximates a single suburb. In rural areas, rural settlements are included in their respective SA2 with the surrounding rural area. SA2s in urban areas where there is significant business and industrial activity, for example ports, airports, industrial, commercial, and retail areas, often have fewer than 1,000 residents.
These SA2s are useful for analysing business demographics, labour markets, and commuting patterns. In rural areas, some SA2s have fewer than 1,000 residents because they are in conservation areas or contain sparse populations that cover a large area. To minimise suppression of population data, small islands with zero or low populations close to the mainland, and marinas are generally included in their adjacent land-based SA2.
Zero or nominal population SA2s
To ensure that the SA2 geography covers all of New Zealand and aligns with New Zealand’s topography and local government boundaries, some SA2s have zero or nominal populations. These include:
• SA2s where territorial authority boundaries straddle regional council boundaries. These SA2s each have fewer than 200 residents and are: Arahiwi, Tiroa, Rangataiki, Kaimanawa, Taharua, Te More, Ngamatea, Whangamomona, and Mara.
• SA2s created for single islands or groups of islands that are some distance from the mainland or to separate large unpopulated islands from urban areas.
• SA2s that represent inland water, inlets or oceanic areas including: inland lakes larger than 50 square kilometres, harbours larger than 40 square kilometres, major ports, other non-contiguous inlets and harbours defined by territorial authority, and contiguous oceanic areas defined by regional council.
• SA2s for non-digitised oceanic areas, offshore oil rigs, islands, and the Ross Dependency. Each SA2 is represented by a single meshblock.
The following 16 SA2s are held in non-digitised form (SA2 code; SA2 name):
400001; New Zealand Economic Zone
400002; Oceanic Kermadec Islands
400003; Kermadec Islands
400004; Oceanic Oil Rig Taranaki
400005; Oceanic Campbell Island
400006; Campbell Island
400007; Oceanic Oil Rig Southland
400008; Oceanic Auckland Islands
400009; Auckland Islands
400010; Oceanic Bounty Islands
400011; Bounty Islands
400012; Oceanic Snares Islands
400013; Snares Islands
400014; Oceanic Antipodes Islands
400015; Antipodes Islands
400016; Ross Dependency
SA2 numbering and naming
Each SA2 is a single geographic entity with a name and a numeric code. The name refers to a geographic feature or a recognised place name or suburb. In some instances where place names are the same or very similar, the SA2s are differentiated by their territorial authority name, for example, Gladstone (Carterton District) and Gladstone (Invercargill City). SA2 codes have six digits. North Island SA2 codes start with a 1 or 2, South Island SA2 codes start with a 3 and non-digitised SA2 codes start with a 4. They are numbered approximately north to south within their respective territorial authorities. To ensure the north–south code pattern is maintained, the SA2 codes were given 00 for the last two digits when the geography was created in 2018. When SA2 names or boundaries change only the last two digits of the code will change.
High-definition version
This high definition (HD) version is the most detailed geometry, suitable for use in GIS for geometric analysis operations and for the computation of areas, centroids and other metrics. The HD version is aligned to the LINZ cadastre.
Macrons
Names are provided with and without tohutō/macrons. The column name for those without macrons is suffixed ‘ascii’.
Digital data
Digital boundary data became freely available on 1 July 2007.
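The SA2 code conventions described above (six digits, with the leading digit indicating North Island, South Island, or non-digitised areas) lend themselves to a simple validity check. The sketch below is illustrative only: the function name `describe_sa2_code` and the returned field names are hypothetical, and it is not a Stats NZ tool.

```python
def describe_sa2_code(code: str) -> dict:
    """Classify a six-digit SA2 code by its leading digit.

    Follows the numbering rules stated in the text; the return structure
    is an assumption made for this example.
    """
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"SA2 codes have six digits, got {code!r}")
    first = code[0]
    if first in "12":
        area = "North Island"
    elif first == "3":
        area = "South Island"
    elif first == "4":
        area = "non-digitised"
    else:
        raise ValueError(f"unexpected leading digit in SA2 code {code!r}")
    # Only the last two digits change when a name or boundary changes.
    return {"area": area, "stem": code[:4], "suffix": code[4:]}


if __name__ == "__main__":
    print(describe_sa2_code("400016"))  # Ross Dependency, from the list above
    print(describe_sa2_code("100100"))  # hypothetical North Island code
```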
Further information
To download geographic classifications in table formats such as CSV please use Ariā. For more information please refer to the Statistical standard for geographic areas 2023.
Contact: geography@stats.govt.nz
The RMS conducted in the Democratic Republic of the Congo between February and April 2023 aimed to gather household-level data on forcibly displaced and stateless individuals. Its objective was to monitor impact and outcome indicators related to education, healthcare, livelihoods, protection concerns, shelter, and water and sanitation, contributing to UNHCR’s reporting against multi-year strategies to various stakeholders. The survey can be implemented in diverse operational contexts and utilizes a standard structured questionnaire, adaptable for standalone use or integration with other data collection efforts. Data collection includes indicators at both household and individual levels, ensuring statistical representativeness. The population of interest comprised refugees, asylum-seekers, internally displaced people (IDPs), refugee returnees, and host communities, each sampled separately. Refugees and asylum-seekers were sampled from UNHCR’s registration database (proGres), stratified by areas. Refugee returnees, IDPs, and host communities were sampled using lists provided by the operation in each area, with systematic sampling conducted if sampled households were unreachable. Limited data collection occurred in some areas due to restricted lists or security concerns. In total, 14,040 households were sampled, with 13,570 households interviewed. Of these, 5,456 out of 5,740 sampled refugee and asylum-seeker households were interviewed, along with 4,865 IDP households, 949 refugee returnee households out of 1,090 sampled, and 2,300 host community households out of 2,315 sampled. Data collection utilized Computer-Assisted Personal Interviews (CAPI).
Household
Sample survey data [ssd]
The sampling strategy for the RMS 2023 involved dividing refugee settlements into four zones, randomly selecting blocks within each zone, and systematically sampling households within these blocks. For phone-based interviews, lists provided by UNHCR were randomized, and respondents were selected proportionately by nationality. This approach ensured comprehensive coverage and statistical representativeness.
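The household-selection step described above (systematic sampling within randomly selected blocks) might look roughly like the following. This is a generic fixed-interval sketch with a random start, not the RMS field procedure: the function name, the fractional interval, and the seed handling are assumptions.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select n units at a fixed interval from an ordered list, with a random start.

    A fractional interval is used so that any list length works; this is
    one common variant and is assumed here, not taken from the source.
    """
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    rng = random.Random(seed)
    interval = len(units) / n
    start = rng.random() * interval
    last = len(units) - 1
    # min() guards against a floating-point index landing one past the end.
    return [units[min(int(start + i * interval), last)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical block listing of 57 households, from which 8 are selected.
    households = [f"HH-{i:03d}" for i in range(1, 58)]
    print(systematic_sample(households, 8, seed=7))
```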
Face-to-face interview
Income, food consumption, expenditures, assets, community relations, wellbeing, resilience, mental health, health, accommodation, protection, and education
A broad and generalized selection of 2011-2015 US Census Bureau 2015 5-year American Community Survey education data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.
The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
The UNHCR Results Monitoring Survey (RMS) is a household-level survey on persons of concern (PoC) to UNHCR directly or indirectly assisted by UNHCR, including refugees and asylum seekers, internally displaced persons, returnees, stateless and others of concern. The objective of the survey is to monitor impact and outcome level indicators on education, healthcare, livelihoods, protection concerns, shelter, and water and sanitation. The results contribute to an evidence base for reporting against UNHCR’s multi-year strategies to key stakeholders.
The RMS can be implemented in any operational context. A standard structured questionnaire has been developed for the RMS, which can be conducted as a stand-alone survey or flexibly integrated with other data collection exercises. The data includes indicators collected at both the household and individual (household-member) level, and results are statistically representative.
This RMS took place in South Africa between April and June 2022. The population of interest included all PoCs to UNHCR in South Africa, and the sample frame was taken from UNHCR's registration database (ProGres). Data subjects were interviewed over the phone. This dataset is the anonymous version of the original data.
South Africa
Individuals and Households
Persons of concern (PoCs) to UNHCR in South Africa.
Sample survey data [ssd]
UNHCR's registration database (ProGres) was used as a sample frame to identify households of interest. Households were stratified by registration group and sex of head of household. A total of 1,050 households were contacted for an interview via telephone. Enumerators were able to secure 388 interviews, of which 385 were completed. Non-response was due to three main factors: inability to reach the respondent with the phone number provided (about 50%), no answer (about 30%), and households reached that were not persons of concern to UNHCR, most likely because of a wrong number (about 6%).
Computer Assisted Telephone Interview [cati]
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey education data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection is not comprehensive, but allows a first-level characterization of educational attainment by grade level and sex (for all persons 25 years and older), plus enrollment estimates at key educational levels (for the universe of all persons 3+ years old). The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or housing data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time.
The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The replication package contains the data files, Stata do-files, and R script necessary to replicate the illustrations that appear in "Taking Distributions Seriously: On the Interpretation of the Estimates of Interactive Nonlinear Models" by Andrei Zhirnov, Mert Moral, and Evgeny Sedashov. Abstract: In recent decades, political science literature has experienced significant growth in the popularity of non-linear models with multiplicative interaction terms. When one or more constitutive variables are not binary, most studies report the marginal effect of the variable of interest at its sample mean while allowing the other constitutive variable/s to vary along its range and holding all other covariates constant at their means, modes, or medians. In this article, we argue that this conventional approach is not always the most suitable since the marginal effect of a variable at its sample mean might not be sufficiently representative of its prevalent effect at a specific value of the conditioning variable and might produce excessively model-dependent predictions. We propose two procedures to help researchers gain a better understanding of how the typical effect of the variable of interest varies as a function of the conditioning variable: (1) computing and plotting the marginal effects at all in-sample combinations of the values of the constitutive variables, and (2) computing and plotting what we call the "Distribution-Weighted Average Marginal Effect" over the values of the conditioning variable.
The NPS 2019/20 with sex-disaggregated data (NPS-SDD 2019/20) is an off-shoot survey undertaken by following the entire NPS 2014/15 “Extended Panel” sample. The NPS-SDD 2019/20 is the first Extended Panel with sex-disaggregated data survey, collecting information on a wide range of topics including agricultural production, non-farm income generating activities, individual rights to plots, consumption expenditures, and a wealth of other socioeconomic characteristics.
Designed for analysis of key indicators at the national level.
Households; Individuals;
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Sample survey data [ssd]
The sample design for the NPS-SDD 2019/20 targeted the sub-sample of households from the initial NPS cohort originating in 2008/09 and subsequently surveyed in all four consecutive rounds, considered the “Extended Panel”. This consisted of 989 households from the NPS 2014/15 sample to be tracked and interviewed in the NPS-SDD 2019/20.
It is worth mentioning that the sample design included complete households that could not be interviewed in NPS 2014/15, excluding those households that had refused to be interviewed in NPS 2014/15. This constituted an additional 8 households. Individuals meeting the eligibility requirement that were interviewed as part of the NPS 2012/13, but were not located and interviewed during the NPS 2014/15, were also included in this round if located. Additionally, individuals from NPS 2014/15 who moved into another This constituted an additional 158 individuals assigned to their last known associated household.
The eligibility requirement for inclusion in the NPS is defined as any household member aged 15 years and above, excluding live-in servants. Households with at least one eligible member were completely interviewed, including any non-eligible members present in the household. Any household or eligible members that had either moved or split away from a primary household were tracked and interviewed in their new location.
Additionally, the final sample for NPS-SDD 2019/20 included any resulting split-off households identified during data collection (i.e. a previous NPS member who had moved or started another household). Ultimately, the final sample size for NPS-SDD 2019/20 was 5,587 individuals in 1,184 households.
Computer Assisted Personal Interview [capi]
The NPS-SDD 2019/20 consists of four survey instruments: a Household Questionnaire, Agriculture Questionnaire, Livestock Questionnaire, and a Community Questionnaire.
The Household Questionnaire comprises thematic sections. This questionnaire allows for the construction of a full consumption-based welfare measure, permitting distributional and incidence analysis. Data within the household instrument is structured around a household panel survey and adds a further living standards measure in the form of sex-disaggregated data. This additional level of information adds value to the analysis of intra-household dynamics and reveals a more refined picture of welfare in Tanzania. To protect the confidentiality of respondents, sensitive information has been masked in or removed from the public household data files.
The NPS Extended Panel also includes a robust instrument on household agriculture activities. It offers an essential data source to understand the dynamic role of agriculture to household welfare. Agriculture information is collected at both the plot and crop level on inputs, production and sales, consistent with key phases in the agricultural value chain. The NPS Extended Panel likewise recognizes the importance of livestock activities to many households. As with the integrated instrument on agriculture, the NPS contains a robust instrument to capture details on these activities. The Livestock Questionnaire is administered to all households participating in these activities and asks about the inputs, outputs, labour, and sales related to these activities. Table 3 provides a more comprehensive list of the sections found within the Livestock Questionnaire.
The Community Questionnaire collects information on physical and economic infrastructure and events in surveyed communities. Responses to the community questionnaire are provided through a group discussion among key informants within the community.
Each of the NPS questionnaires was developed in collaboration with line ministries and donor partners, including the Technical Committee, over a period of several months. The NBS solicited feedback from various stakeholders regarding survey content and design, paying due consideration to comparability with previous panel rounds.
Additional data cleaning was conducted as the final stage of the data processing. Further adjustment of the data post-entry was conducted under the principle of absolute certainty where adjustments must be evidence-based and correction values true beyond a reasonable doubt. As such, the resulting final data files may still contain some inconsistencies and outliers. Handling of these values is thus left entirely to the data user. Throughout the data processing system, versions of the data are archived at all key steps and all checking and cleaning syntax documented and archived.
As with most panel surveys a certain portion of panel respondents are not able to be re-interviewed over time. This attrition of panel respondents can lead to attrition bias where respondents drop out of the survey non-randomly and where the attrition is correlated with variables of interest. The Tanzania NPS has fortunately maintained low attrition over the rounds, thus minimizing the potential for attrition bias within the datasets.
By the end of data collection, 974 of the 989 households had been located and 908 households were successfully re-interviewed for a total household attrition rate of 9.2 percent. At the individual level, 2,621 of the 3,188 eligible household members (over the age of 15 years and not a household servant) were successfully re-interviewed during the NPS-SDD 2019/20, equating to an individual attrition rate of roughly 17.7 percent between the NPS 2014/15 and the NPS-SDD 2019/20 (for extended panel households).
The sample of households selected in the NPS-SDD 2019/2020 is only one of many samples that could have been selected from the same population. Each alternative sample would yield results slightly different from those of the selected sample. Sampling errors are a measure of the variability between all possible samples and, although the degree of variability cannot be directly observed, it can be estimated from the survey results and statistically evaluated. A sampling error can be measured in terms of the standard error for a particular statistic. Sampling errors for the NPS-SDD 2019/2020 were calculated in Stata using estat effects. In addition to the standard error, Stata computed the design effect (DEFF) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFF value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates an increase in the sampling error due to the use of a more complex and less statistically efficient (but perhaps more logistically efficient) design. Stata also computed the relative error and confidence limits for the estimates. Sampling errors for the NPS-SDD 2019/2020 are calculated for selected variables considered to be of primary interest at the household and individual levels. For each variable of interest, the value of the statistic (R), its standard error (SE), the number of cases, the design effect (DEFF), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE) are provided in Tables 1-10 in the BID. The DEFF is considered undefined when the standard error in a simple random sample is zero (when the estimate is close to 0 or 1).
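To make the quantities listed above concrete, the sketch below derives the relative standard error (SE/R), the approximate 95 percent confidence limits (R±2SE), and the ratio of the design-based standard error to a simple-random-sample standard error for a proportion. It is not the Stata routine used for the survey: the function name is hypothetical, the statistic is assumed to be a proportion, and the simple-random-sample standard error is assumed to be sqrt(R(1-R)/n) with no finite population correction. Note that some software reports the ratio of variances (the square of the ratio computed here) under the name DEFF, so the published tables should be checked before comparing values.

```python
import math


def sampling_error_summary(estimate: float, se: float, n: int) -> dict:
    """Summarise sampling error for an estimated proportion.

    estimate: value of the statistic R (assumed to be a proportion in [0, 1])
    se:       design-based standard error of R
    n:        number of cases
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= estimate <= 1.0:
        raise ValueError("estimate is assumed to be a proportion")
    # Standard error under simple random sampling (assumed formula, no FPC).
    srs_se = math.sqrt(estimate * (1.0 - estimate) / n)
    return {
        "relative_se": se / estimate if estimate != 0 else None,
        "ci_low": estimate - 2 * se,
        "ci_high": estimate + 2 * se,
        # Undefined when the simple-random-sample standard error is zero.
        "se_ratio_to_srs": se / srs_se if srs_se > 0 else None,
    }


if __name__ == "__main__":
    # Made-up inputs for illustration; these are not survey results.
    print(sampling_error_summary(estimate=0.5, se=0.05, n=400))
```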
The COVID-19 pandemic is first and foremost a health shock, but the secondary economic shock is equally formidable. Access to timely, policy-relevant information on the awareness of, responses to and impacts of the health situation and related restrictions is critical to effectively design, target and evaluate programme and policy interventions. This research project investigates the main socioeconomic impacts of the pandemic on UNHCR people of concern (PoC) – and nationals where possible – in terms of access to information, services and livelihoods opportunities. Three geographic regions were taken into consideration: Southern Mexico, Mexico City and the Northern and Central Industrial Corridor. Two rounds of data collection took place for this survey, with the purpose of following up with the respondents.
Southern Mexico, Mexico City, Northern and Central Mexico
Household
Sample survey data [ssd]
The ProGres database served as the sampling frame due to the unavailability of other reliable sources. The sample was stratified by location and by population group based on country of origin, helping to account for the different economic realities from one part of the country to another, as well as differences between nationalities. Following discussion with the UNHCR country team and regional bureau, three geographic regions were presented for consideration: a) Southern Mexico; b) Mexico City; and c) the Northern and Central Industrial Corridor. Additionally, partners expressed interest in the Venezuelan community as a separate group, primarily residing in Mexico City, Monterrey and Cancun. The population of the four groups represents 67% of the active registered refugees in Mexico. Out of the 35,140 refugee households in the four regions, 26,688 families have at least one phone number, representing an overall high rate of phone penetration. Across regions of interest, Hondurans make up the single largest group of PoC in Southern Mexico (38%), and the Northern and Central Industrial Corridor (43%), whereas Venezuelans make up over half of the PoC population in Mexico City (52%). Based on the above, a sampling strategy based on four separate strata was proposed in order to adequately represent the regions and sub-groups of interest:
1. Southern Mexico – Honduran and El Salvadoran PoC population
2. Mexico City – Honduran, El Salvadoran and Cuban PoC population
3. Northern and Central Industrial Corridor – Honduran and El Salvadoran PoC population
4. Venezuelan Population – Mexico City, Monterrey (Nuevo Leon) and Cancun (Quintana Roo)
A comparable sub-sample of the national population in the same locations where PoC were sampled was also generated using random digit dialing (RDD). This was made possible through the inclusion of location-based area codes in the list of phone numbers; however, selected participants were also asked about their current location as a first filter before proceeding with the phone survey, to ensure a comparable national sub-sample.
Computer Assisted Telephone Interview [cati]
Questionnaire contained the following sections: consent, knowledge, behaviour, access, employment, income, food security, concerns, resilience, networks, demographics
The FAO has developed a monitoring system in 26 food crisis countries to better understand the impacts of various shocks on agricultural livelihoods, food security and local value chains. The Monitoring System consists of primary data collected from households on a periodic basis (more or less every four months, depending on seasonality). The FAO launched round 3 of data collection in Colombia between 22 July and 22 August 2022. Data were collected through face-to-face interviews in ten departments of Colombia: Antioquia, Arauca, Bolívar, Boyacá, Cesar, Chocó, Córdoba, La Guajira, Nariño and Putumayo. A total of 3240 households were surveyed, 324 rural households in each department. For more information, please go to https://data-in-emergencies.fao.org/pages/monitoring
National coverage
Households
Sample survey data [ssd]
Two-stage sampling was applied – cluster sampling based on the geostatistical sample frame provided by the government of Colombia, followed by simple random sampling to ensure that all households in the targeted cluster had an equal chance of being selected. A quota was not applied to sub-groups of interest at the regional level and no weights were needed for population sub-groups by activity type. The surveyed agricultural households were not represented in the sample. Therefore, the crop and livestock sections should be considered descriptive, not representative.
Face-to-face paper [f2f]
A link to the questionnaire has been provided in the documentations tab.
The datasets have been edited and processed for analysis by the Needs Assessment team at the Office of Emergency and Resilience, FAO, with some dashboards and visualizations produced. For more information, see https://data-in-emergencies.fao.org/pages/countries.
The data was collected using the High Frequency Survey (HFS). The survey allows populations of interest to be reached more effectively through remote modalities (phone interviews and self-administered surveys online) and improved sampling guidance and strategies. It includes a set of standardized regional core questions while allowing for operation-specific customizations. The core questions revolve around populations of interest's demographic profile, difficulties during their journey, specific protection needs, access to documentation & regularization, health access, coverage of basic needs, coping capacity & negative mechanisms used, and well-being & local integration. The data collected has been used by countries in their protection monitoring analysis and vulnerability analysis.
Household
Sample survey data [ssd]
In the absence of a well-developed sampling-frame for forcibly displaced populations in the Americas, the High Frequency Survey employed a multi-frame sampling strategy where respondents entered the sample through one of three channels: (i) those who opt-in to complete an online self-administered version of the questionnaire which was widely circulated through refugee social media; (ii) persons identified through UNHCR and partner databases who were remotely-interviewed by phone; and (iii) random selection from the cases approaching UNHCR for registration or assistance.
Computer Assisted Personal Interview [capi]
Questionnaire contained the following sections: Household Demographics, vulnerability, basic Needs, coping capacity, well-being.