100+ datasets found
  1. Demographic distribution of the target population and the study sample.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Feb 20, 2012
    Cite
    Kungu, Stella; Musyimi, Robert; Tigoi, Caroline C.; Scott, J. Anthony G.; Abdullahi, Osman; Mugo, Daisy; Karani, Angela; Jomo, Jane; Wanjiru, Eva; Lipsitch, Marc (2012). Demographic distribution of the target population and the study sample. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001156064
    Authors
    Kungu, Stella; Musyimi, Robert; Tigoi, Caroline C.; Scott, J. Anthony G.; Abdullahi, Osman; Mugo, Daisy; Karani, Angela; Jomo, Jane; Wanjiru, Eva; Lipsitch, Marc
    Description

    Demographic distribution of the target population and the study sample.

  2. World Health Survey 2003 - Belgium

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    • +2 more
    Updated Nov 22, 2025
    Cite
    World Health Organization (WHO) (2025). World Health Survey 2003 - Belgium [Dataset]. https://datacatalog.ihsn.org/catalog/5200
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    Belgium
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness, and gather information on modes and extents of payment for health encounters through a nationally representative, population-based community survey. In addition, specific additional modules address areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions, and the coverage of health interventions.

    The objectives of the survey programme are to:

    1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems;
    2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes;
    3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include due to reasons such as accessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital), the interviewer must return to the household once the individual has come back in order to interview him/her. If the randomly selected individual is institutionalized long-term (e.g. has been in a nursing home for the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (if this situation is to occur, consult WHO sampling experts for assistance), and best avoided by proper planning before data collection begins.
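
    The planning arithmetic described above can be sketched as follows. This is a minimal illustration, not part of the WHS guidelines: the function name and the 20% non-response and 5% empty-household rates are hypothetical, and only the target of 5000 comes from the text.

```python
import math


def units_to_draw(target_completes, nonresponse_rate, vacancy_rate=0.0):
    """Units to draw so that the expected number of completed interviews
    reaches the target, given assumed non-response and empty-household rates."""
    if not (0 <= nonresponse_rate < 1 and 0 <= vacancy_rate < 1):
        raise ValueError("rates must lie in [0, 1)")
    # Share of drawn units expected to yield a completed interview.
    expected_yield = (1 - nonresponse_rate) * (1 - vacancy_rate)
    return math.ceil(target_completes / expected_yield)


# Target from the text: 5000 persons. Both rates are hypothetical.
n_draw = units_to_draw(5000, nonresponse_rate=0.20, vacancy_rate=0.05)
print(n_draw)
```

    Using the highest plausible rates, as the guidelines advise, makes the draw larger and reduces the risk of having to select additional persons after fieldwork has started.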

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
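
    As a rough illustration of sampling separately within each stratum, the sketch below builds strata from the combination of two stratifying variables and draws a simple random sample in each. The frame, the variable values, and the use of proportional allocation are all assumptions made for the example; the guidelines do not prescribe an allocation rule.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n_total, seed=0):
    """Draw a simple random sample separately within each stratum.

    Proportional allocation is assumed here. Because of rounding, the
    realised total can differ slightly from n_total.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    selected = {}
    for name in sorted(strata):
        members = strata[name]
        n_h = round(n_total * len(members) / len(frame))
        n_h = min(max(n_h, 1), len(members))  # at least one unit per stratum
        selected[name] = rng.sample(members, n_h)
    return selected


# Hypothetical frame: 600 units, two stratifying variables, six strata.
regions = ["North", "Central", "South"]
areas = ["urban", "rural"]
frame = [(i, regions[i % 3], areas[i % 2]) for i in range(600)]

sample = stratified_sample(frame, lambda u: (u[1], u[2]), n_total=60)
for stratum, units in sample.items():
    print(stratum, len(units))
```

    Each stratum here is a combination of categories (for example, North and urban), which mirrors the distinction the text draws between stratifying variables and strata.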

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen clusters or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling in which a hierarchy of clusters is chosen, going from larger to smaller.
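
    A two-stage version of this procedure might look like the sketch below, which also records each selected household's overall inclusion probability (the "known and non-zero" chance that a probability design requires). The cluster identifiers, cluster sizes, and equal-probability selection at both stages are assumptions for illustration only.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Select clusters at random, then households at random within each.

    clusters: dict mapping a cluster id to its list of households.
    Returns (cluster_id, household, inclusion_probability) tuples.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    p_stage1 = n_clusters / len(clusters)  # chance that a cluster is selected

    rows = []
    for cid in chosen:
        households = clusters[cid]
        take = min(n_per_cluster, len(households))
        p_stage2 = take / len(households)  # chance within a selected cluster
        for hh in rng.sample(households, take):
            rows.append((cid, hh, p_stage1 * p_stage2))
    return rows


# Hypothetical frame: 20 enumeration areas with 50 households each.
clusters = {f"EA{c:02d}": [f"EA{c:02d}-HH{h:02d}" for h in range(50)]
            for c in range(20)}

rows = two_stage_sample(clusters, n_clusters=5, n_per_cluster=10)
print(len(rows), rows[0])
```

    The inverse of the recorded probability is the base design weight for that household, which is one of the quantities the guidelines ask survey teams to report for each stage of selection.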

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  3. NSDUH 2018 Sample Design Report

    • healthdata.gov
    • data.virginia.gov
    • +2 more
    csv, xlsx, xml
    Updated Jul 14, 2025
    Cite
    (2025). NSDUH 2018 Sample Design Report [Dataset]. https://healthdata.gov/SAMHSA/NSDUH-2018-Sample-Design-Report/e6hy-grmx
    Description

    This report details the 2018 sample design for the NSDUH and covers the design overview, target population, and stages of sample selection. The design overview describes how the sample design remains consistent with NSDUH’s designs since 1991 and has extended coverage of the sample since then to include additional resident populations. The 2018 target population for this report comprises a civilian, noninstitutionalized population aged 12 years or older residing within the 50 states and the District of Columbia. There are three stages of sample selection that are explained in terms of how to select and aggregate the appropriate census tracts and segments based on state sampling regions (SSRs) for each state.

  4. Table of target populations vs analytical sample demographics.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 21, 2023
    Cite
    Pereira, Snehal M. Pinto; Newlands, Fiona; Heyman, Isobel; Dalrymple, Emma; Segal, Terry; Ford, Tamsin; Ladhani, Shamez N.; Semple, Malcolm G.; Stephenson, Terence; Buszewicz, Marta; Nugawela, Manjula D.; Chalder, Trudie; McOwat, Kelsey; Simmons, Ruth; Shafran, Roz (2023). Table of target populations vs analytical sample demographics. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001055187
    Authors
    Pereira, Snehal M. Pinto; Newlands, Fiona; Heyman, Isobel; Dalrymple, Emma; Segal, Terry; Ford, Tamsin; Ladhani, Shamez N.; Semple, Malcolm G.; Stephenson, Terence; Buszewicz, Marta; Nugawela, Manjula D.; Chalder, Trudie; McOwat, Kelsey; Simmons, Ruth; Shafran, Roz
    Description

    Table of target populations vs analytical sample demographics.

  5. European Union Statistics on Income and Living Conditions 2007 -...

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Eurostat (2019). European Union Statistics on Income and Living Conditions 2007 - Cross-Sectional User Database - Hungary [Dataset]. https://catalog.ihsn.org/index.php/catalog/5675
    Dataset authored and provided by
    Eurostat (https://ec.europa.eu/eurostat)
    Time period covered
    2007
    Area covered
    Hungary
    Description

    Abstract

    In 2007, the EU-SILC instrument covered all EU Member States plus Iceland, Turkey, Norway, Switzerland and Croatia. EU-SILC has become the EU reference source for comparative statistics on income distribution and social exclusion at European level, particularly in the context of the "Program of Community action to encourage cooperation between Member States to combat social exclusion" and for producing structural indicators on social cohesion for the annual spring report to the European Council. The first priority is to be given to the delivery of comparable, timely and high quality cross-sectional data.

    There are two types of datasets: 1) Cross-sectional data pertaining to fixed time periods, with variables on income, poverty, social exclusion and living conditions. 2) Longitudinal data pertaining to individual-level changes over time, observed periodically - usually over four years.

    Social exclusion and housing-condition information is collected at household level. Income at a detailed component level is collected at personal level, with some components included in the "Household" section. Labor, education and health observations only apply to persons aged 16 and over. EU-SILC was established to provide data on structural indicators of social cohesion (at-risk-of-poverty rate, S80/S20 and gender pay gap) and to provide relevant data for the two 'open methods of coordination' in the field of social inclusion and pensions in Europe.

    The sixth revision of the 2007 Cross-Sectional User Database is documented here.

    Geographic coverage

    National

    Analysis unit

    • Households;
    • Individuals 16 years and older.

    Universe

    The survey covered all household members aged 16 and over. Persons living in collective households and in institutions are generally excluded from the target population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    On the basis of various statistical and practical considerations and the precision requirements for the most critical variables, the minimum effective sample sizes to be achieved were defined. Sample size for the longitudinal component refers, for any pair of consecutive years, to the number of households successfully interviewed in the first year in which all or at least a majority of the household members aged 16 or over are successfully interviewed in both the years.

    For the cross-sectional component, the plans are to achieve the minimum effective sample size of around 131.000 households in the EU as a whole (137.000 including Iceland and Norway). The allocation of the EU sample among countries represents a compromise between two objectives: the production of results at the level of individual countries, and production for the EU as a whole. Requirements for the longitudinal data will be less important. For this component, an effective sample size of around 98.000 households (103.000 including Iceland and Norway) is planned.

    Member States using registers for income and other data may use a sample of persons (selected respondents) rather than a sample of complete households in the interview survey. The minimum effective sample size in terms of the number of persons aged 16 or over to be interviewed in detail is in this case taken as 75 % of the figures shown in columns 3 and 4 of the table I, for the cross-sectional and longitudinal components respectively.

    The reference is to the effective sample size, which is the size required if the survey were based on simple random sampling (design effect in relation to the 'risk of poverty rate' variable = 1.0). The actual sample sizes will have to be larger to the extent that the design effects exceed 1.0 and to compensate for all kinds of non-response. Furthermore, the sample size refers to the number of valid households which are households for which, and for all members of which, all or nearly all the required information has been obtained. For countries with a sample of persons design, information on income and other data shall be collected for the household of each selected respondent and for all its members.
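
    The relationship between effective and actual sample size can be sketched as below. The formula (actual size equals effective size times design effect, divided by the response rate) is the standard planning approximation rather than something quoted from the EU-SILC documentation, and the input figures are hypothetical.

```python
import math


def actual_sample_size(effective_n, design_effect, response_rate):
    """Households to select so that, after design effect and non-response,
    the achieved sample is worth effective_n under simple random sampling."""
    if design_effect <= 0 or not (0 < response_rate <= 1):
        raise ValueError("design_effect must be > 0 and response_rate in (0, 1]")
    return math.ceil(effective_n * design_effect / response_rate)


# Hypothetical country: effective size 4000, design effect 1.5, 75% response.
n_actual = actual_sample_size(4000, design_effect=1.5, response_rate=0.75)
print(n_actual)

# With a design effect of 1.0 and full response, the two sizes coincide.
print(actual_sample_size(4000, design_effect=1.0, response_rate=1.0))
```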

    At the beginning, a cross-sectional representative sample of households is selected. It is divided into say 4 sub-samples, each by itself representative of the whole population and similar in structure to the whole sample. One sub-sample is purely cross-sectional and is not followed up after the first round. Respondents in the second sub-sample are requested to participate in the panel for 2 years, in the third sub-sample for 3 years, and in the fourth for 4 years. From year 2 onwards, one new panel is introduced each year, with request for participation for 4 years. In any one year, the sample consists of 4 sub-samples, which together constitute the cross-sectional sample. In year 1 they are all new samples; in all subsequent years, only one is new sample. In year 2, three are panels in the second year; in year 3, one is a panel in the second year and two in the third year; in subsequent years, one is a panel for the second year, one for the third year, and one for the fourth (final) year.
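
    The rotation described above can be traced year by year with a small simulation. This is only a sketch of the scheme as worded in the paragraph (four sub-samples, one new four-year panel per year from year 2); the function name and the output layout are assumptions.

```python
def rotation_schedule(n_years, n_subsamples=4):
    """For each survey year, list the wave number of every sub-sample in the field.

    Year 1 starts n_subsamples sub-samples that stay for 1, 2, ..., n_subsamples
    years. From year 2 onward, one new panel enters per year for the full duration.
    """
    # Each panel is (first_year, number_of_years_in_the_survey).
    panels = [(1, duration) for duration in range(1, n_subsamples + 1)]
    panels += [(year, n_subsamples) for year in range(2, n_years + 1)]

    schedule = {}
    for year in range(1, n_years + 1):
        schedule[year] = sorted(
            year - first + 1
            for first, duration in panels
            if first <= year < first + duration
        )
    return schedule


schedule = rotation_schedule(n_years=5)
for year, waves in schedule.items():
    print(f"year {year}: waves in the field = {waves}")
```

    In the output, a wave of 1 marks a new sub-sample. Year 2 shows three sub-samples in their second year, year 3 shows one in its second and two in their third, and from year 4 onward the pattern settles at one sub-sample in each of waves 1 to 4.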

    According to the Commission Regulation on sampling and tracing rules, the selection of the sample will be drawn according to the following requirements:

    1. For all components of EU-SILC (whether survey or register based), the cross-sectional and longitudinal (initial sample) data shall be based on a nationally representative probability sample of the population residing in private households within the country, irrespective of language, nationality or legal residence status. All private households and all persons aged 16 and over within the household are eligible for the operation.
    2. Representative probability samples shall be achieved both for households, which form the basic units of sampling, data collection and data analysis, and for individual persons in the target population.
    3. The sampling frame and methods of sample selection shall ensure that every individual and household in the target population is assigned a known and non-zero probability of selection.
    4. By way of exception, paragraphs 1 to 3 shall apply in Germany exclusively to the part of the sample based on probability sampling according to Article 8 of the Regulation of the European Parliament and of the Council (EC) No 1177/2003 concerning Community Statistics on Income and Living Conditions.

    Article 8 of the EU-SILC Regulation of the European Parliament and of the Council mentions: 1. The cross-sectional and longitudinal data shall be based on nationally representative probability samples. 2. By way of exception to paragraph 1, Germany shall supply cross-sectional data based on a nationally representative probability sample for the first time for the year 2008. For the year 2005, Germany shall supply data for one fourth based on probability sampling and for three fourths based on quota samples, the latter to be progressively replaced by random selection so as to achieve fully representative probability sampling by 2008. For the longitudinal component, Germany shall supply for the year 2006 one third of longitudinal data (data for year 2005 and 2006) based on probability sampling and two thirds based on quota samples. For the year 2007, half of the longitudinal data relating to years 2005, 2006 and 2007 shall be based on probability sampling and half on quota sample. After 2007 all of the longitudinal data shall be based on probability sampling.

    Detailed information about sampling is available in Quality Reports in Documentation.

    Mode of data collection

    Mixed

  6. Number of rice sample of varieties and production areas.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Hungyen Chen; Hirohisa Kishino (2023). Number of rice sample of varieties and production areas. [Dataset]. http://doi.org/10.1371/journal.pone.0141117.t004
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Hungyen Chen; Hirohisa Kishino
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Aic, Aichinokaori; Aki, Akitakomachi; Don, Dontokoi; Hae, Haenuki; Han, Hanaechizen; Hin, Hinohikari; Hit, Hitomebore; Hos, Hoshinoyume; Kin, Kinuhikari; Kir, Kirara397; Kos, Koshihikari; Mas, Masshigura; Tsu, Tsugaruroman; Yum, Yumeakar.

    Number of rice sample of varieties and production areas.

  7. Living Standards Measurement Survey 2003 (General Population, Wave 2 Panel)...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jan 30, 2020
    Cite
    Ministry of Social Affairs (2020). Living Standards Measurement Survey 2003 (General Population, Wave 2 Panel) and Roma Settlement Survey 2003 - Serbia and Montenegro [Dataset]. https://microdata.worldbank.org/index.php/catalog/81
    Dataset provided by
    Strategic Marketing & Media Research Institute Group (SMMRI)
    Ministry of Social Affairs
    Time period covered
    2003
    Area covered
    Serbia and Montenegro
    Description

    Abstract

    The study included four separate surveys:

    1. The LSMS survey of general population of Serbia in 2002
    2. The survey of Family Income Support (MOP in Serbian) recipients in 2002

    These two datasets are published together, separately from the 2003 datasets.

    3. The LSMS survey of general population of Serbia in 2003 (panel survey)
    4. The survey of Roma from Roma settlements in 2003

    These two datasets are published together.

    Objectives

    The LSMS is a multi-topic study of household living standards, based on international experience in designing and conducting this type of research. The basic survey was carried out in 2002 on a representative sample of households in Serbia (without Kosovo and Metohija). Its goal was to establish a poverty profile from comprehensive data on household welfare and to identify vulnerable groups. It also aimed to assess the targeting of safety net programs by collecting detailed information from individuals on participation in specific government social programs. This study was used as the basic document in developing the Poverty Reduction Strategy (PRS) in Serbia, which was adopted by the Government of the Republic of Serbia in October 2003.

    The survey was repeated in 2003 on a panel sample (the households which participated in 2002 survey were re-interviewed).

    Analysis of the take-up and profile of the population in 2003 was the first step towards formulating the system of monitoring in the Poverty Reduction Strategy (PRS). The survey was conducted in accordance with the same methodological principles used in 2002 survey, with necessary changes referring only to the content of certain modules and the reduction in sample size. The aim of the repeated survey was to obtain panel data to enable monitoring of the change in the living standard within a period of one year, thus indicating whether there had been a decrease or increase in poverty in Serbia in the course of 2003. [Note: Panel data are the data obtained on the sample of households which participated in the both surveys. These data made possible tracking of living standard of the same persons in the period of one year.]

    Along with these two comprehensive surveys, conducted on national and regional representative samples which were to give a picture of the general population, there were also two surveys with particular emphasis on vulnerable groups. In 2002, it was the survey of living standard of Family Income Support recipients with an aim to validate this state supported program of social welfare. In 2003 the survey of Roma from Roma settlements was conducted. Since all present experiences indicated that this was one of the most vulnerable groups on the territory of Serbia and Montenegro, but with no ample research of poverty of Roma population made, the aim of the survey was to compare poverty of this group with poverty of basic population and to establish which categories of Roma population were at the greatest risk of poverty in 2003. However, it is necessary to stress that the LSMS of the Roma population comprised potentially most imperilled Roma, while the Roma integrated in the main population were not included in this study.

    Geographic coverage

    The surveys were conducted on the whole territory of Serbia (without Kosovo and Metohija).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample frame for both surveys of the general population (LSMS) in 2002 and 2003 consisted of all permanent residents of Serbia, without the population of Kosovo and Metohija, according to the definition of permanently resident population contained in the UN Recommendations for Population Censuses, which were applied in the 2002 Census of Population in the Republic of Serbia. Therefore, permanent residents were all persons living in the territory of Serbia for longer than one year, with the exception of diplomatic and consular staff.

    The sample frame for the survey of Family Income Support recipients included all current recipients of this program on the territory of Serbia based on the official list of recipients given by Ministry of Social affairs.

    Defining the Roma population from Roma settlements faced obstacles, since precise data on the total number of Roma in Serbia are not available. According to the last population Census, from 2002, there were 108,000 Roma citizens, but the Census data are thought to significantly underestimate the total Roma population. However, since no more precise data were available, this number was taken as the basis for the estimate of the Roma population from Roma settlements. Using the 2002 Census, settlements in which at least 7% of the total population declared themselves as belonging to Roma nationality were selected. A total of 83%, or 90,000, of self-declared Roma lived in the settlements defined in this way, and this number was taken as the sample frame for Roma from Roma settlements.

    Planned sample: In 2002 the planned sample of the general population comprised 6,500 households. The sample was both nationally and regionally representative (representative of each individual stratum). In 2003 the planned panel sample size was 3,000 households. In order to preserve the representativeness of the sample, every other census block unit of the large sample realized in 2002 was kept. This preserved the identical allocation by strata. In each selected census block unit, the same households were interviewed as in the basic survey in 2002. The planned sample of Family Income Support recipients in 2002 and of Roma from Roma settlements in 2003 was 500 households for each group.

    Sample type: In both national surveys the implemented sample was a two-stage stratified sample. Units of the first stage were enumeration districts, and units of the second stage were the households. In the basic 2002 survey, enumeration districts were selected with probability proportional to number of households, so that the enumeration districts with bigger number of households have a higher probability of selection. In the repeated survey in 2003, first-stage units (census block units) were selected from the basic sample obtained in 2002 by including only even numbered census block units. In practice this meant that every second census block unit from the previous survey was included in the sample. In each selected enumeration district the same households interviewed in the previous round were included and interviewed. On finishing the survey in 2003 the cases were merged both on the level of households and members.
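
    The two selection steps in this paragraph can be sketched as follows: a first-stage draw with probability proportional to the number of households, then a panel subsample that keeps every second selected unit. Systematic PPS is used here as one common way to implement proportional-to-size selection; the source does not say which PPS method was applied, and the district identifiers and sizes are invented for the example.

```python
import random


def pps_systematic(sizes, n, seed=0):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: list of (unit_id, number_of_households).
    A unit larger than the sampling interval could be hit more than once;
    the example below keeps every unit smaller than the interval.
    """
    rng = random.Random(seed)
    total = sum(size for _, size in sizes)
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    targets = [start + k * step for k in range(n)]

    chosen, cumulative, t = [], 0, 0
    for unit_id, size in sizes:
        cumulative += size
        while t < n and targets[t] < cumulative:
            chosen.append(unit_id)
            t += 1
    return chosen


# Hypothetical frame: 200 enumeration districts with 50 to 200 households.
districts = [(f"ED{i:03d}", 50 + (i % 7) * 25) for i in range(200)]

first_round = pps_systematic(districts, n=40)
# Repeat survey: keep only the even-numbered units (2nd, 4th, ...) of round one.
panel_round = first_round[1::2]
print(len(first_round), len(panel_round))
```

    Keeping every second unit of an ordered first-stage sample halves the sample while leaving its spread across the frame, and therefore across strata, roughly unchanged, which is the property the survey description relies on.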

    Stratification: Municipalities are stratified into the following six territorial strata: Vojvodina, Belgrade, Western Serbia, Central Serbia (Šumadija and Pomoravlje), Eastern Serbia and South-east Serbia. Primary units of selection are further stratified into enumeration districts which belong to urban type of settlements and enumeration districts which belong to rural type of settlement.

    The sample of Family Income Support recipients represented the cases chosen randomly from the official list of recipients provided by Ministry of Social Affairs. The sample of Roma from Roma settlements was, as in the national survey, a two-staged stratified sample, but the units in the first stage were settlements where Roma population was represented in the percentage over 7%, and the units of the second stage were Roma households. Settlements are stratified in three territorial strata: Vojvodina, Beograd and Central Serbia.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    In all surveys the same questionnaire with minimal changes was used. It included different modules, topically separate areas which had an aim of perceiving the living standard of households from different angles. Topic areas were the following:

    1. Roster with demography.
    2. Housing conditions and durables module with information on the age of durables owned by a household with a special block focused on collecting information on energy billing, payments, and usage.
    3. Diary of food expenditures (weekly), including home production, gifts and transfers in kind.
    4. Questionnaire of main expenditure-based recall periods sufficient to enable construction of annual consumption at the household level, including home production, gifts and transfers in kind.
    5. Agricultural production for all households which cultivate 10+ acres of land or who breed cattle.
    6. Participation and social transfers module with detailed breakdown by programs.
    7. Labour Market module in line with a simplified version of the Labour Force Survey (LFS), with special additional questions to capture various informal sector activities, and providing information on earnings.
    8. Health with a focus on utilization of services and expenditures (including informal payments).
    9. Education module, which incorporated pre-school, compulsory primary education, secondary education and university education.
    10. Special income block, focusing on sources of income not covered in other parts (with a focus on remittances).

    Response rate

    During field work, interviewers kept a precise diary of interviews, recording both successful and unsuccessful visits. Particular attention was paid to reasons why some households were not interviewed. Separate marks were given for households which were not interviewed due to refusal and for cases when a given household could not be found on the territory of the chosen census block.

    In 2002 a total of 7,491 households were contacted. Of this number a total of 6,386 households in 621 census rounds were interviewed. Interviewers did not manage to collect the data for 1,106 or 14.8% of selected households. Out of this number 634 households

  8. Comparison of target population to analytic sample; and characteristics of...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 6, 2023
    Cite
    Pereira, Snehal M. Pinto; Rojas, Natalia K.; Dalrymple, Emma; Shafran, Roz; Ford, Tamsin; Swann, Olivia V.; McOwat, Kelsey; Nugawela, Manjula D.; Fox-Smith, Lana; Chalder, Trudie; Ladhani, Shamez N.; Stephenson, Terence; Simmons, Ruth; Heyman, Isobel (2023). Comparison of target population to analytic sample; and characteristics of children and young people in analytic sample by baseline PCR-test result: N (%). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000977863
    Dataset updated
    Mar 6, 2023
    Authors
    Pereira, Snehal M. Pinto; Rojas, Natalia K.; Dalrymple, Emma; Shafran, Roz; Ford, Tamsin; Swann, Olivia V.; McOwat, Kelsey; Nugawela, Manjula D.; Fox-Smith, Lana; Chalder, Trudie; Ladhani, Shamez N.; Stephenson, Terence; Simmons, Ruth; Heyman, Isobel
    Description

    Comparison of target population to analytic sample; and characteristics of children and young people in analytic sample by baseline PCR-test result: N (%).

  9. NSDUH 2019 Sample Design Report

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 6, 2025
    Cite
    Substance Abuse and Mental Health Services Administration (2025). NSDUH 2019 Sample Design Report [Dataset]. https://catalog.data.gov/dataset/nsduh-2019-sample-design-report
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    This report details the 2019 sample design for the NSDUH and covers the design overview, target population, and stages of sample selection. The design overview describes how the sample design remains consistent with NSDUH’s designs since 1991 and has extended coverage of the sample since then to include additional resident populations. The 2019 target population for this report comprises a civilian, noninstitutionalized population aged 12 years or older residing within the 50 states and the District of Columbia. There are three stages of sample selection that are explained in terms of how to select and aggregate the appropriate census tracts and segments based on state sampling regions (SSRs) for each state.

  10. Progress in International Reading and Literacy Study 2006 - United Arab...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 14, 2022
    Cite
    International Association for Educational Attainment (2022). Progress in International Reading and Literacy Study 2006 - United Arab Emirates, United Arab Emirates, Argentina...and 59 more [Dataset]. https://datacatalog.ihsn.org/catalog/7658
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    International Study Centre
    International Association for Educational Attainment
    Time period covered
    2005 - 2006
    Area covered
    Argentina, United Arab Emirates
    Description

    Abstract

    The PIRLS 2006 aimed to generate a database of student achievement data in addition to information on student, parent, teacher, and school background data for the 47 areas that participated in PIRLS 2006.

    Geographic coverage

    Nationally representative

    Analysis unit

    Units of analysis in the study are schools, students, parents and teachers.

    Universe

    PIRLS is a study of student achievement in reading comprehension in primary school, and is targeted at the grade level in which students are at the transition from learning to read to reading to learn, which is the fourth grade in most countries. The formal definition of the PIRLS target population makes use of UNESCO's International Standard Classification of Education (ISCED) in identifying the appropriate target grade:

    "…all students enrolled in the grade that represents four years of schooling, counting from the first year of ISCED Level 1, providing the mean age at the time of testing is at least 9.5 years. For most countries, the target grade should be the fourth grade, or its national equivalent."

    ISCED Level 1 corresponds to primary education or the first stage of basic education, and should mark the beginning of "systematic apprenticeship of reading, writing, and mathematics" (UNESCO, 1999). By the fourth year of Level 1, students have had 4 years of formal instruction in reading, and are in the process of becoming independent readers. In IEA studies, the above definition corresponds to what is known as the international desired target population. Each participating country was expected to define its national desired population to correspond as closely as possible to this definition (i.e., its fourth grade of primary school). In order to measure trends, it was critical that countries that participated in PIRLS 2001, the previous cycle of PIRLS, choose the same target grade for PIRLS 2006 that was used in PIRLS 2001. Information about the target grade in each country is provided in Chapter 9 of the PIRLS 2006 Technical Report.

    Although countries were expected to include all students in the target grade in their definition of the population, sometimes it was not possible to include all students who fell under the definition of the international desired target population. Consequently, occasionally a country's national desired target population excluded some section of the population, based on geographic or linguistic constraints. For example, Lithuania's national desired target population included only students in Lithuanian-speaking schools, representing approximately 93 percent of the international desired population of students in the country. PIRLS participants were expected to ensure that the national defined population included at least 95 percent of the national desired population of students. Exclusions (which had to be kept to a minimum) could occur at the school level, within the sampled schools, or both. Although countries were expected to do everything possible to maximize coverage of the national desired population, school-level exclusions sometimes were necessary. Keeping within the 95 percent limit, school-level exclusions could include schools that:

    • were geographically remote,
    • had very few students,
    • had a curriculum or structure different from the mainstream education system, or
    • were specifically for students with special needs.

    The difference between these school-level exclusions and those at the previous level is that these schools were included as part of the sampling frame (i.e., the list of schools to be sampled). They then were eliminated on an individual basis if it was not feasible to include them in the testing.

    In many education systems, students with special educational needs are included in ordinary classes. Due to this fact, another level of exclusions is necessary to reach an effective target population, that is, the population of students who ultimately will be tested. These are called within-school exclusions and pertain to students who are unable to be tested for a particular reason but are part of a regular classroom. There are three types of within-school exclusions.

    • Intellectually disabled students
    • Functionally disabled students
    • Non-native language speakers

    Students eligible for within-school exclusion were identified by staff at the schools and could still be administered the test if the school did not want the student to feel out of place during the assessment (though the data from these students were not included in any analyses). Again, it was important to ensure that this population was as close to the national desired target population as possible. If combined, school-level and within-school exclusions exceeded 5 percent of the national desired target population, results were annotated in the PIRLS 2006 International Report (Mullis, Martin, Kennedy, & Foy, 2007). Target population coverage and exclusion rates are displayed for each country in Chapter 9 of the PIRLS 2006 Technical Report. Descriptions of the countries' school-level and within-school exclusions can be found in Appendix B of the PIRLS 2006 Technical Report.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The basic sample design used in PIRLS 2006 is known as a two-stage stratified cluster design, with the first stage consisting of a sample of schools, and the second stage consisting of a sample of intact classrooms from the target grade in the sampled schools. While all participants adopted this basic two-stage design, four countries, with approval from the PIRLS sampling consultants, added an extra sampling stage. The Russian Federation and the United States introduced a preliminary sampling stage (first sampling regions in the case of the Russian Federation, and primary sampling units consisting of metropolitan areas and counties in the case of the United States). Morocco and Singapore added a third sampling stage, sub-sampling students within classrooms rather than selecting intact classes.

    For countries participating in PIRLS 2006, school stratification was used to enhance the precision of the survey results. Many participants employed explicit stratification, where the complete school sampling frame was divided into smaller sampling frames according to some criterion, such as region, to ensure a predetermined number of schools sampled for each stratum. For example, Austria divided its sampling frame into nine regions to ensure proportional representation by region (see Appendix B for stratification information for each country). Stratification also could be done implicitly, a procedure by which schools in a sampling frame were sorted according to a set of stratification variables prior to sampling. For example, Austria employed implicit stratification by district and school size within each regional stratum. Regardless of the other stratification variables used, all countries used implicit stratification by a measure of size (MOS) of the school.

    All countries used a systematic (random start, fixed interval) probability proportional-to-size (PPS) sampling approach to sample schools. Note that when this method is combined with an implicit stratification procedure, the allocation of schools in the sample is proportional to the size of the implicit strata. Within the sampled schools, classes were sampled using a systematic random method in all countries except Morocco and Singapore, where classes were sampled with probability proportional to size, and students within classes sampled with equal probability. The PIRLS 2006 sample designs were implemented in an acceptable manner by all participants.
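    The systematic (random start, fixed interval) PPS selection described above can be sketched roughly as follows. This is only an illustration, not the PIRLS implementation: the function name `systematic_pps`, the toy school frame, and the seed are assumptions made for the example. The frame is assumed to be already sorted by any implicit stratification variables.

```python
import random

def systematic_pps(frame, n, rng):
    """Systematic PPS selection (random start, fixed interval).

    frame: list of (unit_id, measure_of_size) pairs, in frame order.
    n:     number of selection points to place.
    rng:   a random.Random instance (hypothetical choice of generator).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    total = sum(size for _, size in frame)
    interval = total / n               # fixed sampling interval
    start = rng.random() * interval    # random start in [0, interval)

    selected = []
    cumulative = 0
    k = 0                              # index of the next selection point
    for unit_id, size in frame:
        cumulative += size
        # A unit is selected once for each selection point that falls
        # inside its cumulative size range; a unit larger than the
        # interval can therefore be hit more than once.
        while k < n and start + k * interval < cumulative:
            selected.append(unit_id)
            k += 1
    return selected

# Hypothetical frame: (school, enrolment in the target grade).
frame = [("school_A", 100), ("school_B", 300), ("school_C", 50), ("school_D", 550)]
print(systematic_pps(frame, n=2, rng=random.Random(2006)))
```

    Because the selection points are evenly spaced along the cumulated measure of size, sorting the frame by the implicit stratification variables first spreads the sample across those strata in proportion to their size.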

    Sampling deviation

    8 National Research Coordinators (NRCs) encountered organizational constraints in their systems that necessitated deviations from the sample design. In each case, the Statistics Canada sampling expert was consulted to ensure that the altered design remained compatible with the PIRLS standards.

    These country-specific deviations from the sample design are detailed in Appendix B of the PIRLS 2006 Technical Report (page 231), attached as Related Material.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    • PIRLS Background Questionnaires: By gathering information about children’s experiences together with reading achievement on the PIRLS test, it is possible to identify the factors or combinations of factors that relate to high reading literacy. An important part of the PIRLS design is a set of questionnaires targeting factors related to reading literacy. PIRLS administered four questionnaires: to the tested students, to their parents, to their reading teachers, and to their school principals.

    • Student Questionnaire: Each student taking the PIRLS reading assessment completes the student questionnaire. The questionnaire asks about aspects of students’ home and school experiences, including instructional experiences and reading for homework, self-perceptions and attitudes towards reading, out-of-school reading habits, computer use, home literacy resources, and basic demographic information.

    • Learning to Read (Home) Survey: The learning to read survey is completed by the parents or primary caregivers of each student taking the PIRLS reading assessment. It addresses child-parent literacy interactions, home literacy resources, parents’ reading habits and attitudes, home-school connections, and basic demographic and socioeconomic indicators.

    • Teacher Questionnaire: The reading teacher of each fourth-grade class sampled for PIRLS completes a questionnaire designed to gather information about classroom contexts for developing reading literacy. This questionnaire

  11. World Health Survey 2003 - Hungary

    • microdata.worldbank.org
    • apps.who.int
    Updated Oct 17, 2013
    Cite
    World Health Organization (WHO) (2013). World Health Survey 2003 - Hungary [Dataset]. https://microdata.worldbank.org/index.php/catalog/1719
    Dataset updated
    Oct 17, 2013
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    Hungary
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative population-based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.

    The objectives of the survey programme are to: 1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems. 2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes. 3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include due to reasons such as accessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital) the interviewer must return to the household when the individual will have come back to interview him/her. If the randomly selected individual is institutionalized long term (e.g. has been in a nursing home the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (if this situation is to occur, consult WHO sampling experts for assistance), and best avoided by proper planning before data collection begins.
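    The planning step described above, drawing enough units to absorb non-response and empty households, amounts to a small calculation. The sketch below is an illustration only: the function name `units_to_draw` and the example rates are assumptions, and it treats non-response and vacancy as independent, which the source does not state.

```python
import math

def units_to_draw(target_completes, nonresponse_rate, vacancy_rate=0.0):
    """Units to draw so that expected completed interviews reach the target.

    nonresponse_rate: expected share of occupied, eligible units that do
                      not respond (use the highest plausible estimate).
    vacancy_rate:     expected share of drawn units that are empty.
    """
    if not (0 <= nonresponse_rate < 1 and 0 <= vacancy_rate < 1):
        raise ValueError("rates must be in [0, 1)")
    # Assumption: the two losses act independently, so yields multiply.
    expected_yield = (1 - nonresponse_rate) * (1 - vacancy_rate)
    return math.ceil(target_completes / expected_yield)

# Hypothetical planning figures for a target of 5000 completed interviews.
print(units_to_draw(5000, nonresponse_rate=0.25, vacancy_rate=0.05))
```

    Using the highest plausible loss rates here is what keeps the survey from having to draw a supplementary sample after fieldwork.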

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
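    As a rough illustration of sampling separately within each stratum, the sketch below groups a frame by stratum and draws an independent simple random sample from each. Proportional allocation is an assumption made for the example (the guidelines above do not prescribe an allocation rule), and the names `stratified_sample` and `stratum_of` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, rng):
    """Draw an independent simple random sample within each stratum.

    units:      list of frame units.
    stratum_of: function mapping a unit to its stratum label.
    n:          approximate total sample size (proportional allocation;
                rounding means the total may differ slightly from n).
    """
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    total = len(units)
    sample = {}
    for label, members in strata.items():
        # Proportional allocation, with at least one unit per stratum.
        k = max(1, round(n * len(members) / total))
        k = min(k, len(members))
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical frame of (urbanization level, person id) pairs.
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
picked = stratified_sample(frame, lambda u: u[0], n=10, rng=random.Random(1))
print({label: len(members) for label, members in picked.items()})
```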

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, nonoverlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling where a hierarchy of clusters is chosen, going from larger to smaller.

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.
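    A two-stage version of this procedure might look like the sketch below: clusters are drawn first, and households are then drawn only from the enumerated lists of the selected clusters. It is a simplified illustration under stated assumptions: clusters are drawn with equal probability rather than in proportion to population size, and the names `two_stage_sample`, `EA0`, and so on are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Select clusters, then households within each selected cluster.

    clusters: dict mapping a cluster id to the enumerated list of
              household ids in that cluster. In practice only the
              selected clusters need a full household listing.
    """
    # Stage 1: equal-probability draw of clusters (an assumption here).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of households from each listing.
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical frame: 10 enumeration areas with 20 listed households each.
frame = {f"EA{i}": [f"EA{i}-H{j}" for j in range(20)] for i in range(10)}
print(two_stage_sample(frame, n_clusters=3, n_per_cluster=5, rng=random.Random(7)))
```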

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  12. General Population Census of 1982 - IPUMS Subset - France

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 1, 2025
    Cite
    INSEE (Institut National de la Statistique et des Etudes Economiques) (2025). General Population Census of 1982 - IPUMS Subset - France [Dataset]. https://microdata.worldbank.org/index.php/catalog/2145
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    INSEE (Institut National de la Statistique et des Etudes Economiques)
    IPUMS
    Time period covered
    1982
    Area covered
    France
    Description

    Analysis unit

    Persons, households, and dwellings

    UNITS IDENTIFIED:
    • Dwellings: yes*
    • Vacant Units: No
    • Households: yes
    • Individuals: yes
    • Group quarters: yes*

    UNIT DESCRIPTIONS:
    • Dwellings: no
    • Households: Yes
    • Group quarters: A collective household is a group of persons that does not live in an ordinary household, but lives in a collective establishment, sharing meal times.

    Universe

    Residents of France, of any nationality. Does not include French citizens living in other countries, foreign tourists, or people passing through. Reintegrated persons: Persons living in group quarters or without a fixed address but having a usual home elsewhere (i.e., enumerated away from their usual residence). During data processing, most of these people are reintegrated into their usual households, except in the case of persons in psychiatric hospitals and prisons. Legal population refers to de jure population plus population compte a part.

    Kind of data

    Population and Housing Census [hh/popcen]

    Sampling procedure

    MICRODATA SOURCE: INSEE (Institut National de la Statistique et des Etudes Economiques)

    SAMPLE SIZE (person records): 2631713.

    SAMPLE DESIGN: Systematic manual sorting into lots with different sample units according to target population. Lots divide the population into different samples (1/4 and 3/4). 1/20 sample is selected from 1/4 sample.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Separate forms for buildings, group quarters (collective households), group quarters (compte a part), private households, and boats. Four forms for individuals (living in group quarters and private dwellings; two different forms for people compte a part; living in boats).

  13. World Health Survey 2003 - Finland

    • microdata.worldbank.org
    • apps.who.int
    Updated Oct 17, 2013
    Cite
    World Health Organization (WHO) (2013). World Health Survey 2003 - Finland [Dataset]. https://microdata.worldbank.org/index.php/catalog/1711
    Dataset updated
    Oct 17, 2013
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    Finland
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative population-based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.

    The objectives of the survey programme are to: 1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems. 2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes. 3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include due to reasons such as accessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital) the interviewer must return to the household when the individual will have come back to interview him/her. If the randomly selected individual is institutionalized long term (e.g. has been in a nursing home the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a country-by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached, additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (should this situation occur, consult WHO sampling experts for assistance), and is best avoided by proper planning before data collection begins.
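
    The non-response allowance described above can be sketched as a small calculation. This is an illustrative sketch only: the function name `persons_to_draw` and the example non-response percentages are assumptions, not part of the WHS guidelines, which state only the target of 5000 and the need to plan for the highest plausible non-response.

```python
def persons_to_draw(target_completes: int, nonresponse_pct: int) -> int:
    """Persons to draw so the expected number of completed interviews meets the target.

    nonresponse_pct is the assumed share (whole percent, 0-99) of drawn persons
    who will not yield an interview (refusals, empty households, etc.).
    """
    if not 0 <= nonresponse_pct < 100:
        raise ValueError("nonresponse_pct must be between 0 and 99")
    # Ceiling of target / (1 - rate), done in integer arithmetic to avoid
    # floating-point rounding surprises.
    return -(-target_completes * 100 // (100 - nonresponse_pct))


TARGET = 5000  # WHS target sample size per country
for pct in (10, 20, 30):  # hypothetical non-response estimates
    print(f"{pct}% non-response -> draw {persons_to_draw(TARGET, pct)} persons")
```

    Planning with the highest of the candidate estimates gives the largest draw, which is the conservative choice the guidelines ask for.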

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
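
    As a rough illustration of sampling separately within each stratum, the sketch below splits a total sample across strata and draws a simple random sample in each. Proportional allocation is an assumption made here for simplicity; the guidelines do not prescribe an allocation rule. The function names and the urban/rural frame are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum population size."""
    total_pop = sum(stratum_sizes.values())
    quotas = {s: total_sample * n / total_pop for s, n in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    # Hand out units lost to rounding down, largest fractional remainder first.
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def stratified_sample(frame, total_sample, seed=0):
    """Draw an independent simple random sample inside every stratum."""
    rng = random.Random(seed)
    sizes = {s: len(units) for s, units in frame.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {s: rng.sample(frame[s], alloc[s]) for s in frame}


# Hypothetical frame: unit identifiers grouped by one stratifying variable.
frame = {"urban": list(range(600)), "rural": list(range(600, 1000))}
sample = stratified_sample(frame, total_sample=50)
print({stratum: len(units) for stratum, units in sample.items()})
```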

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling where a hierarchy of clusters is chosen, going from larger to smaller.

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.
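
    The following sketch shows one common way to use cluster population sizes in a two-stage design: clusters are selected with probability proportional to size (systematic PPS), then a fixed number of households is drawn from the household list of each selected cluster. The choice of PPS, the function names, and the cluster sizes are all assumptions for illustration; the guidelines require only that selection probabilities be known at every stage.

```python
import random


def pps_systematic(sizes, n_clusters, rng):
    """Pick n_clusters cluster indices with probability proportional to size.

    A cluster larger than the sampling step could be hit more than once;
    the example below keeps every cluster smaller than the step.
    """
    total = sum(sizes)
    step = total / n_clusters
    start = rng.random() * step  # random start in [0, step)
    points = [start + k * step for k in range(n_clusters)]
    chosen, cumulative, i = [], 0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        while i < len(points) and points[i] < cumulative:
            chosen.append(idx)
            i += 1
    return chosen


def two_stage_sample(cluster_sizes, n_clusters, households_per_cluster, seed=0):
    """Stage 1: PPS clusters. Stage 2: simple random sample of listed households."""
    rng = random.Random(seed)
    total = sum(cluster_sizes)
    result = []
    for idx in pps_systematic(cluster_sizes, n_clusters, rng):
        size = cluster_sizes[idx]
        p_cluster = n_clusters * size / total        # stage-1 selection probability
        p_household = households_per_cluster / size  # stage-2 selection probability
        prob = p_cluster * p_household
        households = rng.sample(range(size), households_per_cluster)
        result.append({"cluster": idx, "households": households,
                       "prob": prob, "weight": 1 / prob})
    return result


# Hypothetical household counts for eight enumeration areas.
cluster_sizes = [120, 80, 200, 150, 50, 100, 300, 100]
for row in two_stage_sample(cluster_sizes, n_clusters=2, households_per_cluster=20):
    print(row["cluster"], round(row["prob"], 4), round(row["weight"], 2))
```

    With PPS at the first stage and a fixed take at the second, every household ends up with the same overall selection probability, which is why the printed weights are identical across clusters.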

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  14. World Health Survey 2003 - Latvia

    • microdata.worldbank.org
    • apps.who.int
    Updated Oct 17, 2013
    World Health Organization (WHO) (2013). World Health Survey 2003 - Latvia [Dataset]. https://microdata.worldbank.org/index.php/catalog/1729
    Dataset updated
    Oct 17, 2013
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    Latvia
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative, population-based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.

    The objectives of the survey programme are to: 1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems. 2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes. 3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include for reasons such as inaccessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example, if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital), the interviewer must return to the household once the individual has come back in order to interview him/her. If the randomly selected individual is institutionalized long-term (e.g. has been in a nursing home for the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a country-by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached, additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (should this situation occur, consult WHO sampling experts for assistance), and is best avoided by proper planning before data collection begins.

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling where a hierarchy of clusters is chosen, going from larger to smaller.

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  15. Data from: Population Assessment of Tobacco and Health (PATH) Study [United...

    • icpsr.umich.edu
    Updated Sep 30, 2025
    Inter-university Consortium for Political and Social Research [distributor] (2025). Population Assessment of Tobacco and Health (PATH) Study [United States] Restricted-Use Files [Dataset]. http://doi.org/10.3886/ICPSR36231.v43
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/36231/terms

    Area covered
    United States
    Description

    The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population (CNP) at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Unit (PSU)s and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the CNP at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the CNP at the time of Wave 7. 
This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohorts who were at least age 15 and in the CNP at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Dataset 0002 (DS0002) contains the data from the State Design Data. This file contains 7 variables and 82,139 cases. The state identifier in the State Design file reflects the participant's state of residence at the time of selection and recruitment for the PATH Study. Dataset 1011 (DS1011) contains the data from the Wave 1 Adult Questionnaire. This data file contains 2,021 variables and 32,320 cases. Each of the cases represents a single, completed interview. Dataset 1012 (DS1012) contains the data from the Wave 1 Youth and Parent Questionnaire. This file contains 1,431 variables and 13,651 cases. Dataset 1411 (DS1411) contains the Wave 1 State Identifier data for Adults and has 5 variables and 32,320 cases. Dataset 1412 (DS1412) contains the Wave 1 State Identifier data for Youth (and Parents) and has 5 variables and 13,651 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state Federal Information Processing System (FIPS), state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 1, which is also their state of residence at the time of recruitment. 
Dataset 1611 (DS1611) contains the Tobacco Universal Product Code (UPC) data from Wave 1. This data file contains 32 variables and 8,601 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 1. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used

  16. World Health Survey 2003 - United Kingdom

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    World Health Organization (WHO) (2019). World Health Survey 2003 - United Kingdom [Dataset]. https://datacatalog.ihsn.org/catalog/3826
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2003
    Area covered
    United Kingdom
    Description

    Abstract

    Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.

    The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.

    The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative, population-based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.

    The objectives of the survey programme are to: 1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems. 2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes. 3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.

    Geographic coverage

    The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.

    There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include for reasons such as inaccessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example, if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.

    Analysis unit

    Households and individuals

    Universe

    The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.

    If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital), the interviewer must return to the household once the individual has come back in order to interview him/her. If the randomly selected individual is institutionalized long-term (e.g. has been in a nursing home for the last 8 years), the interviewer must travel to that institution to interview him/her.

    The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SAMPLING GUIDELINES FOR WHS

    Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.

    The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.

    The sample size of the WHS in each country is 5000 persons (exceptions considered on a country-by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached, additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (should this situation occur, consult WHO sampling experts for assistance), and is best avoided by proper planning before data collection begins.

    All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.

    STRATIFICATION

    Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.

    Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).

    Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.

    MULTI-STAGE CLUSTER SELECTION

    A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.

    In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling where a hierarchy of clusters is chosen, going from larger to smaller.

    In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.

    It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which

  17. Data from: Respondent-Driven Sampling and Total Population Data from a Rural...

    • datacatalogue.ukdataservice.ac.uk
    Updated Aug 8, 2022
    Cite
    White, R., London School of Hygiene & Tropical Medicine (2022). Respondent-Driven Sampling and Total Population Data from a Rural Ugandan Cohort, 2010: Special Licence Access [Dataset]. http://doi.org/10.5255/UKDA-SN-7462-1
    Explore at:
    Dataset updated
    Aug 8, 2022
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    White, R., London School of Hygiene & Tropical Medicine
    Area covered
    Uganda
    Description

    This is a mixed-methods data collection. This study used Respondent Driven Sampling (RDS) methodology, which is a sampling method designed to generate unbiased estimates of population characteristics for populations where a sampling frame is not available. It is a form of snowball or link-tracing sampling, where respondents are given coupons to recruit other members of the target population, and where respondents are rewarded for both participating and for recruiting others. In addition to variables of interest, data are collected on the number of members of the target population each participant knows. Estimation methods are then applied to account for the non-random sample selection in an attempt to generate unbiased estimates for the target population.
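    One way to see how reported network sizes enter the estimation is the inverse-degree weighted proportion, often called the RDS-II or Volz-Heckathorn estimator. The sketch below assumes that estimator purely for illustration; the study documentation does not say which estimators were applied, and the function name and example values are hypothetical.

```python
def rds_ii_proportion(degrees, has_trait):
    """Inverse-degree weighted proportion (RDS-II / Volz-Heckathorn form).

    degrees: reported number of target-population members each respondent knows.
    has_trait: booleans marking respondents with the characteristic of interest.
    """
    if len(degrees) != len(has_trait):
        raise ValueError("degrees and has_trait must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every reported network size must be positive")
    # Respondents with large networks are more likely to be recruited,
    # so each one is down-weighted by 1 / degree.
    weights = [1.0 / d for d in degrees]
    numerator = sum(w for w, t in zip(weights, has_trait) if t)
    return numerator / sum(weights)

# Hypothetical respondents: network sizes and trait indicators.
degrees = [2, 4, 4, 8]
has_trait = [True, False, True, False]
estimate = rds_ii_proportion(degrees, has_trait)   # weighted estimate
naive = sum(has_trait) / len(has_trait)            # unweighted sample share
```

    In this toy input the weighted estimate (2/3) is higher than the unweighted share (1/2) because the respondents with the trait report smaller networks.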

    In 2010, the researchers conducted an RDS study in a rural Ugandan population where total population data were available. The aim of this study was to evaluate whether RDS could generate representative data on a rural Ugandan population by comparing estimates from an RDS survey with total-population data. The data used to define the target population (male household heads) were available from an ongoing general population cohort of 25 villages in rural Masaka, Uganda covering an area of approximately 38km. Annually, households in the study villages are mapped and after obtaining consent, a total-population household census and an individual questionnaire are administered and blood taken for HIV-1 testing. A random sample of eligible men in the target population who were not recruited during the RDS study were also interviewed, using the same RDS questionnaire. Finally, 49 qualitative interviews (of which summaries have been deposited) were conducted with a range of people (men and women) including RDS participants and non-participants, and RDS interviewers. These data can be used to evaluate the RDS sampling method, and to test new RDS estimators.

    Further information may be found in the documentation and in the journal articles listed in the Publications section.

    Special Licence access and geographic data
    This data collection is subject to Special Licence access conditions (see Access section for details). Data are analysable at individual village level, and GPS point data are available for the villages and interview sites. Finer detail geographic variables may be available for certain research questions. If these are required, users should request this when making their Special Licence application.

  18. target-audience

    • huggingface.co
    Updated Sep 6, 2025
    Cite
    Alan Tseng (2025). target-audience [Dataset]. https://huggingface.co/datasets/agentlans/target-audience
    Explore at:
    Dataset updated
    Sep 6, 2025
    Authors
    Alan Tseng
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    Target Audience

    This dataset contains text samples from the agentlans/high-quality-text collection (sample_k10000 config). It uses the google/gemma-3-12b-it model to identify the target audience of each text and to rewrite the content to better appeal to that audience.

    text: the original text from the dataset
    audience: a detailed description of the text’s intended target audience
    revised: the original text rewritten to better engage the identified audience

    { "text": "The… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/target-audience.

  19. Population Assessment of Tobacco and Health (PATH) Study [United States]...

    • icpsr.umich.edu
    ascii, delimited, r +3
    Updated Jun 27, 2025
    + more versions
    Cite
    Inter-university Consortium for Political and Social Research [distributor] (2025). Population Assessment of Tobacco and Health (PATH) Study [United States] Special Collection Public-Use Files [Dataset]. http://doi.org/10.3886/ICPSR37786.v9
    Explore at:
    Available download formats: sas, r, delimited, stata, spss, ascii
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/37786/terms

    Area covered
    United States
    Description

    The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who do and do not use tobacco.

    45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent.

    At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population (CNP) at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the CNP at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort.

    At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the CNP at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohorts who were at least age 15 and in the CNP at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Public-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts.

    Wave 4.5 was a special data collection for youth only who were aged 12 to 17 at the time of the Wave 4.5 interview. Wave 4.5 was the fourth annual follow-up wave for those who were members of the Wave 1 Cohort. For those who were sampled at Wave 4, Wave 4.5 was the first annual follow-up wave.

    Wave 5.5, conducted in 2020, was a special data collection for Wave 4 Cohort youth and young adults ages 13 to 19 at the time of the Wave 5.5 interview. Also in 2020, a subsample of Wave 4 Cohort adults ages 20 and older were interviewed via the PATH Study Adult Telephone Survey (PATH-ATS).

    Wave 7.5 was a special collection for Wave 4 and Wave 7 Cohort youth and young adults ages 12 to 22 at the time of the Wave 7.5 interview. For those who were sampled at Wave 7, Wave 7.5 was the first annual follow-up wave.

    Dataset 1002 (DS1002) contains the data from the Wave 4.5 Youth and Parent Questionnaire. This file contains 1,395 variables and 13,131 cases. Of these cases, 11,378 are continuing youth having completed a prior Youth Interview. The other 1,753 cases are "aged-up youth" having previously been sampled as "shadow youth." Datasets 1112, 1212, and 1222, (DS1112, DS1212, and DS1222) are data files comprising the weight variables for Wave 4.5. The "all-waves" weight file contains weights for participants in the Wave 1 Cohort who completed a Wave 4.5 Youth Interview and completed interviews (if old enough to do so) or verified their information with the study (if not old enough to be interviewed) in Waves 1, 2, 3, and 4. There are two separate files with "single wave" weights: one for the Wave 1 Cohort and one for the Wave 4 Cohort. The "single-wave" weight file for the Wave 1 Cohort contains weights for youth who completed an interview in Wave 1 an

  20. Integrated Biological and Behavioural Surveillance Survey 2007 - Nigeria

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Federal Ministry of Health (FMOH) (2019). Integrated Biological and Behavioural Surveillance Survey 2007 - Nigeria [Dataset]. https://datacatalog.ihsn.org/catalog/3909
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Federal Ministry of Health (FMOH)
    Time period covered
    2007
    Area covered
    Nigeria
    Description

    Abstract

    The main objectives of the study were to assess the knowledge and beliefs of high-risk groups about STI and HIV, determine the prevalence of HIV infection and syphilis among these groups and obtain baseline data that will permit comparisons of risk behaviours, HIV infection and syphilis over time.

    Geographic coverage

    Six selected states

    Analysis unit

    State, group, individual

    Universe

    The Integrated Biological and Behavioural Surveillance Survey 2007 covered only males and females aged 15-49 years among seven sub-populations at risk of HIV in six selected states of Nigeria, namely female sex workers (FSW, both brothel- and non-brothel-based), men who have sex with men (MSM), injecting drug users (IDU), members of the armed forces, police, and transport workers (TW).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    In order to reach a representative sample of all groups involved in the 2007 IBBSS, a number of different sampling techniques were used depending on the group in question, including simple random sampling (SRS), cluster sampling (probability proportionate to size (PPS) for fixed populations), time-location sampling (TLS) and respondent-driven sampling (RDS). For MSM and IDU, the RDS method was used, while a TLS technique was used to select non-brothel-based FSW and TW. The brothel-based FSW, armed forces, and police were selected using a two-stage cluster sampling technique. The take all (TA) sampling method was used when the desired sample size was not attainable based on the results of target population mapping.
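    The probability-proportionate-to-size (PPS) step mentioned above can be sketched with the systematic PPS method, in which clusters are laid end to end by size and selected at a fixed interval from a random start. This is a minimal illustration under assumed inputs; the survey documentation does not state which PPS variant was used, and the cluster labels and sizes below are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(sizes, n, seed=0):
    """Pick n clusters with probability proportional to size (systematic PPS).

    sizes: dict mapping cluster id -> measure of size (e.g. mapped population).
    A cluster larger than the sampling interval can be selected more than once.
    """
    rng = random.Random(seed)
    ids = list(sizes)
    cumulative = list(accumulate(sizes[i] for i in ids))
    interval = cumulative[-1] / n
    start = rng.uniform(0, interval)
    picks = []
    for k in range(n):
        point = start + k * interval
        # First cluster whose cumulative size exceeds the selection point.
        idx = bisect_right(cumulative, point)
        picks.append(ids[min(idx, len(ids) - 1)])
    return picks

# Hypothetical mapped sizes for five clusters (total 800, interval 800 / 3).
sizes = {"A": 120, "B": 40, "C": 300, "D": 90, "E": 250}
picks = pps_systematic(sizes, n=3)
```

    Cluster "C" is larger than the sampling interval, so it is selected at least once whatever the random start; smaller clusters are selected with probability proportional to their size.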

    TLS is a form of cluster sampling that contains both time and location dimensions. TLS provides the opportunity to reach members of a target population who access certain locations at any point in time. The process starts by creating time-location PSUs (PSUs that have both a time and a location dimension), from which a random sample is selected. At the second stage, all population members who appear at the site during a designated time interval of fixed length (for example, 4 hours), or a randomly selected sub-sample of them, are interviewed. To the extent that all members of a target population access the locations at some point in time, TLS is a probability sampling method because: (i) all population members have a non-zero chance of selection as long as the TLS frame is complete; and (ii) the selection probabilities can be calculated by taking the time dimension as well as the space dimension into account.
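    The first TLS stage can be sketched as building a frame of venue-by-time-slot units and drawing a random sample from it. The venue names, time slots, and function name below are hypothetical, and the sketch assumes every venue is active in every slot, which a real mapping exercise would need to verify.

```python
import random
from itertools import product

def sample_venue_time_units(venues, time_slots, n_units, seed=0):
    """Build the time-location frame and draw a simple random sample of units."""
    rng = random.Random(seed)
    # Each PSU pairs one location with one fixed-length time interval.
    frame = list(product(venues, time_slots))
    return rng.sample(frame, n_units)

# Hypothetical venues and 4-hour slots.
venues = ["motor park A", "bar B", "junction C"]
time_slots = ["Fri 18:00-22:00", "Sat 14:00-18:00", "Sat 18:00-22:00", "Sun 14:00-18:00"]
units = sample_venue_time_units(venues, time_slots, n_units=5)
```

    The second stage, interviewing all or a random sub-sample of the people present during each selected unit, happens in the field and is not modelled here.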

    RDS is a method that combines "snowball sampling" with a mathematical model that weights the sample to compensate for the fact that the sample was collected in a non-random way. Characterized by long referral chains (to ensure that all members of the target population can be reached) and a statistical theory of the sampling process which controls for bias including the effects of choice of seeds and differences in network size, RDS overcomes the shortcomings of institutional sampling (coverage) and snow-ball type methods (statistical validity). By making chain-referral into a probability sampling method and consequently resolving the dilemma of a choice between coverage and statistical validity, RDS has become the most appropriate method for reaching the hard-to-reach population groups. The RDS process starts with the recruitment of the initial seeds each of whom recruits a maximum of two to three members from their population group.
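    The coupon-limited recruitment process can be illustrated with a small simulation on a hypothetical contact network. This is only a sketch of chain referral with a cap of three coupons per participant; the network, the seed choice, and the target size are invented for illustration and do not describe the survey's fieldwork.

```python
import random
from collections import deque

def simulate_rds(network, seeds, max_coupons=3, target_size=10, seed=0):
    """Chain referral: each participant recruits up to max_coupons new peers."""
    rng = random.Random(seed)
    recruited = list(seeds)
    recruiter_of = {s: None for s in seeds}  # seeds have no recruiter
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        # Only peers who have not yet taken part can redeem a coupon.
        eligible = [p for p in network[person] if p not in recruiter_of]
        rng.shuffle(eligible)
        for peer in eligible[:max_coupons]:
            if len(recruited) >= target_size:
                break
            recruiter_of[peer] = person
            recruited.append(peer)
            queue.append(peer)
    return recruited, recruiter_of

# Hypothetical network: 12 people in a ring, each knowing 4 neighbours.
network = {i: [(i + d) % 12 for d in (-2, -1, 1, 2)] for i in range(12)}
recruited, recruiter_of = simulate_rds(network, seeds=[0])
```

    The `recruiter_of` mapping records the referral chains, which is the information an RDS analysis needs alongside each respondent's reported network size.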

    Sampling deviation

    Cluster samples were chosen randomly based on sampling frames developed through the mapping process. This process was to identify places where potential subjects could be reached and sampled. Field work for the mapping exercise was performed over one week. Due to the limited period some hidden populations may not be adequately represented in sampling frames.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire was designed in collaboration with FMOH, SFH, CDC, WHO, UNAIDS and other stakeholders. At both central- and state-level trainings, each question in the questionnaire was reviewed and role-played and possible challenges were identified and addressed. The questionnaire of the Integrated Biological and Behavioural Surveillance Survey 2007 was grouped into fifteen sections:

    Section 0: Identification particulars
    Section 1: Background characteristics
    Section 2: Marriage and partnerships
    Section 3: Sexual history: numbers and types of partners
    Section 4: Sexual history: regular partners (for those with spouse/live-in sexual partners only; for MSM, female spouse/live-in sexual partners only)
    Section 5: Sexual history: boyfriends/girlfriends (for those with boyfriend/girlfriend sexual partners only; for MSM, female boyfriend/girlfriend sexual partners only)
    Section 6: Sexual history: purchasing sex (male only) (for those with commercial sex partners only; for MSM, female commercial sex partners only)
    Section 7: Sexual history: casual, non-regular, non-paying sexual partners (for those with casual sexual partners only; for MSM, female casual sexual partners only)
    Section 8: Selling sex (for female populations only)
    Section 9: Social habits (all groups)
    Section 10: Drug use/needle sharing (all populations reporting drug injection in the past 12 months)
    Section 11: MSM (men who have sex with men) (ask all respondents)
    Section 12: STIs (ask all respondents)
    Section 13: Knowledge, opinions, and attitudes towards HIV/AIDS (ask all respondents)
    Section 14: Exposure to interventions

    Cleaning operations

    After data entry, the data were cleaned using STATA 10. Frequency counts were carried out to check consistency and assess the cleanliness of the database. The data cleaning also included the following:

    Searching for ages outside the age range criteria; Cross-checking all corresponding skips to the questionnaire; Reviewing the cluster allocations; Cross-checking the questionnaire completion responses from the interviewers in the database with the records in the supervisors log to ensure they matched; Tallying the supervisors log of blood samples collected to ensure that recorded numbers of samples collected matched the results recorded in the database; and Consistency checks involving cross-checking answers to related questions.
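    The first of these checks, searching for ages outside the eligibility range, can be sketched as follows. The survey team did this in STATA 10; the Python below is a substitute illustration, with the 15-49 range taken from the survey universe and with hypothetical record ids and field names.

```python
def out_of_range_ages(records, low=15, high=49):
    """Return ids of records whose age is missing or outside [low, high]."""
    flagged = []
    for rec in records:
        age = rec.get("age")
        if age is None or not (low <= age <= high):
            flagged.append(rec["id"])
    return flagged

# Hypothetical entered records.
records = [
    {"id": "R001", "age": 23},
    {"id": "R002", "age": 14},
    {"id": "R003", "age": 49},
    {"id": "R004", "age": None},
    {"id": "R005", "age": 52},
]
flagged = out_of_range_ages(records)  # ids to send back for review
```

    Flagged records would then be checked against the paper questionnaires instead of being silently dropped.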

    Response rate

    There were 11,175 individuals selected for this study, of whom 0.8% and 8.1% refused to participate in the behavioural and biological components of the study respectively.

    Non-brothel based FSW had the highest refusal rate of 2.7% and 19.4% for behavioural and biological components respectively, followed by brothel-based FSW at 2.2% and 13.1% respectively. Refusal rates for the behavioural component were less than 0.5% for other groups.

    For the biological component, refusal rates were 3% for police, 0.8% for the armed forces, 1.2% for TW, 4.6% for MSM, and 3.3% for IDU.

    Sampling error estimates

    No sampling error estimate

    Data appraisal

    A template for the questionnaire was designed with pre-programmed consistency checks for cross-checking answers, including skips and eligibility criteria. Laboratory data forms were collected on a periodic basis from the central laboratories and brought to the same centralized location for data entry. At least 25% of the questionnaires entered daily by each data entry clerk had the behavioural and other non-biological data entered a second time, while 100% double data entry was achieved for the biological data for quality control purposes. The data entry clerks were supervised by three supervisors who reviewed and validated all questionnaires entered.
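    Double data entry is usually verified by comparing the two keyings field by field and reviewing every disagreement. The sketch below shows that comparison under assumed data structures; the form ids, field names, and values are hypothetical and are not taken from the survey database.

```python
def entry_mismatches(first_pass, second_pass):
    """Compare two keyings of the same forms.

    Each argument maps form id -> {field: value}. Returns a list of
    (form_id, field, first_value, second_value) for every disagreement
    on forms present in both passes.
    """
    mismatches = []
    for form_id in sorted(first_pass.keys() & second_pass.keys()):
        a, b = first_pass[form_id], second_pass[form_id]
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                mismatches.append((form_id, field, a.get(field), b.get(field)))
    return mismatches

# Hypothetical laboratory forms keyed twice by different clerks.
first_pass = {
    "F1": {"test_a": "neg", "test_b": "neg"},
    "F2": {"test_a": "pos", "test_b": "neg"},
}
second_pass = {
    "F1": {"test_a": "neg", "test_b": "neg"},
    "F2": {"test_a": "neg", "test_b": "neg"},
}
mismatches = entry_mismatches(first_pass, second_pass)
```

    Each mismatch would be resolved by a supervisor against the original paper form.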
