10 datasets found
  1. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  2. Population and Family Health Survey 2017-2018 - Jordan

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    • +2more
    Updated Mar 29, 2019
    Cite
    Department of Statistics (DoS) (2019). Population and Family Health Survey 2017-2018 - Jordan [Dataset]. https://catalog.ihsn.org/catalog/8005
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Department of Statistics (DoS)
    Time period covered
    2017 - 2018
    Area covered
    Jordan
    Description

    Abstract

    The primary objective of the 2017-18 Jordan Population and Family Health Survey (JPFHS) is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the 2017-18 JPFHS:

    • Collected data at the national level that allowed calculation of key demographic indicators
    • Explored the direct and indirect factors that determine levels of and trends in fertility and childhood mortality
    • Measured levels of contraceptive knowledge and practice
    • Collected data on key aspects of family health, including immunisation coverage among children, the prevalence and treatment of diarrhoea and other diseases among children under age 5, and maternity care indicators such as antenatal visits and assistance at delivery among ever-married women
    • Obtained data on child feeding practices, including breastfeeding, and conducted anthropometric measurements to assess the nutritional status of children under age 5 and ever-married women age 15-49
    • Conducted haemoglobin testing on children age 6-59 months and ever-married women age 15-49 to provide information on the prevalence of anaemia among these groups
    • Collected data on knowledge and attitudes of ever-married women and men about sexually transmitted infections (STIs) and HIV/AIDS
    • Obtained data on ever-married women’s experience of emotional, physical, and sexual violence
    • Obtained data on household health expenditures

    Geographic coverage

    National coverage

    Analysis unit

    • Household
    • Individual
    • Children age 0-5
    • Woman age 15-49
    • Man age 15-59

    Universe

    The survey covered all de jure household members (usual residents), children age 0-5 years, women age 15-49 years and men age 15-59 years resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling frame used for the 2017-18 JPFHS is based on Jordan's Population and Housing Census (JPHC) frame for 2015. The current survey is designed to produce results representative of the country as a whole, of urban and rural areas separately, of three regions, of 12 administrative governorates, and of three national groups: Jordanians, Syrians, and a group combined from various other nationalities.

    The sample for the 2017-18 JPFHS is a stratified sample selected in two stages from the 2015 census frame. Stratification was achieved by separating each governorate into urban and rural areas. Each of the Syrian camps in the governorates of Zarqa and Mafraq formed its own sampling stratum. In total, 26 sampling strata were constructed. Samples were selected independently in each sampling stratum, through a two-stage selection process, according to the sample allocation. Before the sample selection, the sampling frame was sorted by district and sub-district within each sampling stratum. By using a probability-proportional-to-size selection for the first stage of selection, an implicit stratification and proportional allocation were achieved at each of the lower administrative levels.

    In the first stage, 970 clusters were selected with probability proportional to cluster size, with the cluster size being the number of residential households enumerated in the 2015 JPHC. The sample allocation took into account the precision consideration at the governorate level and at the level of each of the three special domains. After selection of PSUs and clusters, a household listing operation was carried out in all selected clusters. The resulting household lists served as the sampling frame for selecting households in the second stage. A fixed number of 20 households per cluster were selected with an equal probability systematic selection from the newly created household listing.
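
    The two selection stages described above can be sketched roughly as follows. This is a minimal illustration only, not the survey's actual procedure or software: the function names, the random cluster sizes, and the seed are assumptions made up for the example, and the real design also involved stratification and sorting of the frame.

```python
import random


def pps_systematic(sizes, n, rng):
    """Stage 1 sketch: pick n units with probability proportional to size.

    Walks the cumulative sizes with a fixed step and a random start.
    A unit larger than the step could be hit more than once.
    """
    step = sum(sizes) / n
    start = rng.random() * step          # random start in [0, step)
    targets = [start + k * step for k in range(n)]
    chosen, cumulative, t = [], 0.0, 0
    for unit, size in enumerate(sizes):
        cumulative += size
        while t < n and targets[t] < cumulative:
            chosen.append(unit)
            t += 1
    return chosen


def systematic_equal(n_listed, n_select, rng):
    """Stage 2 sketch: equal-probability systematic selection from a listing."""
    step = n_listed / n_select
    start = rng.random() * step
    return [int(start + k * step) for k in range(n_select)]


rng = random.Random(0)                                   # hypothetical seed
cluster_sizes = [rng.randint(50, 300) for _ in range(200)]  # made-up frame
clusters = pps_systematic(cluster_sizes, 10, rng)
households = systematic_equal(cluster_sizes[clusters[0]], 20, rng)
print(clusters)
print(households)
```

    The fixed take of 20 households per cluster mirrors the second-stage rule in the text; everything else in the sketch is illustrative.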

    For further details on sample design, see Appendix A of the final report.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Four questionnaires were used for the 2017-18 JPFHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. These questionnaires, based on The DHS Program’s standard Demographic and Health Survey questionnaires, were adapted to reflect population and health issues relevant to Jordan. After all questionnaires were finalised in English, they were translated into Arabic.

    Cleaning operations

    All electronic data files for the 2017-18 JPFHS were transferred via IFSS to the DoS central office in Amman, where they were stored on a password-protected computer. The data processing operation included secondary editing, which required resolution of computer-identified inconsistencies and coding of open-ended questions. Data editing was accomplished using CSPro software. Throughout fieldwork, tables were generated to check various data quality parameters, and specific feedback was given to the teams to improve performance. Secondary editing and data processing were initiated in October 2017 and completed in February 2018.

    Response rate

    A total of 19,384 households were selected for the sample, of which 19,136 were found to be occupied at the time of the fieldwork. Of the occupied households, 18,802 were successfully interviewed, yielding a response rate of 98%.

    In the interviewed households, 14,870 women were identified as eligible for an individual interview; interviews were completed with 14,689 women, yielding a response rate of 99%. A total of 6,640 eligible men were identified in the sampled households and 6,429 were successfully interviewed, yielding a response rate of 97%. Response rates for both women and men were similar across urban and rural areas.
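
    As a quick check, the rounded response rates quoted above can be reproduced from the reported counts. The sketch below assumes the simple definition "completed interviews divided by eligible (or occupied) units"; the helper name is hypothetical.

```python
def response_rate(completed, eligible):
    """Response rate as a percentage of eligible units."""
    return 100.0 * completed / eligible


# (completed, eligible) counts as reported in the text above.
counts = {
    "households": (18802, 19136),
    "women": (14689, 14870),
    "men": (6429, 6640),
}

rates = {group: round(response_rate(done, elig)) for group, (done, elig) in counts.items()}
for group, rate in rates.items():
    print(f"{group}: {rate}%")
```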

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2017-18 Jordan Population and Family Health Survey (JPFHS) to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2017-18 JPFHS is only one of many samples that could have been selected from the same population, using the same design and sample size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

    Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
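
    The "plus or minus two standard errors" rule can be written as a one-line calculation. The sketch below uses made-up numbers purely for illustration; they are not estimates from the survey, and the function name is an assumption.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) as estimate -/+ multiplier * standard_error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Hypothetical values for illustration only (not survey results).
lower, upper = confidence_interval(estimate=0.52, standard_error=0.012)
print(f"{lower:.3f} to {upper:.3f}")
```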

    If the sample of respondents had been selected by simple random sampling, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2017-18 JPFHS sample was the result of a multi-stage stratified design, and, consequently, it was necessary to use more complex formulas. Sampling errors are computed using SAS programmes developed by ICF International. These programmes use the Taylor linearisation method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.

    The Taylor linearisation method treats any percentage or average as a ratio estimate, r = y/x, where y represents the total sample value for variable y, and x represents the total number of cases in the group or subgroup under consideration.
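
    A minimal sketch of this idea follows. It assumes the commonly used linearised variance for a ratio in a stratified cluster design, var(r) = (1/x^2) * sum over strata of [m/(m-1)] * (sum of z_i^2 - (sum of z_i)^2 / m), with z_i = y_i - r*x_i for each cluster and m clusters in the stratum, and it ignores any finite population correction. That formula is not spelled out in the text above, the function name is hypothetical, and the cluster totals are made up; the final report's Appendix B remains the authoritative description.

```python
from math import sqrt


def ratio_with_taylor_se(strata):
    """Ratio r = y/x and its linearised standard error.

    `strata` maps a stratum label to a list of (y, x) cluster totals.
    Each stratum needs at least two clusters.
    """
    y = sum(yi for clusters in strata.values() for yi, _ in clusters)
    x = sum(xi for clusters in strata.values() for _, xi in clusters)
    r = y / x

    variance_sum = 0.0
    for label, clusters in strata.items():
        m = len(clusters)
        if m < 2:
            raise ValueError(f"stratum {label!r} needs at least two clusters")
        z = [yi - r * xi for yi, xi in clusters]   # linearised residuals
        z_total = sum(z)
        variance_sum += m / (m - 1) * (sum(v * v for v in z) - z_total ** 2 / m)

    return r, sqrt(variance_sum) / x


# Made-up cluster totals: (cases with the attribute, cases in the group).
example = {
    "urban": [(12, 20), (9, 20), (11, 20)],
    "rural": [(5, 20), (7, 20)],
}
r, se = ratio_with_taylor_se(example)
print(f"r = {r:.3f}, SE = {se:.4f}")
```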

    A more detailed description of the estimates of sampling errors is presented in Appendix B of the survey final report.

    Data appraisal

    Data Quality Tables

    • Household age distribution
    • Age distribution of eligible and interviewed women
    • Age distribution of eligible and interviewed men
    • Completeness of reporting
    • Births by calendar years
    • Reporting of age at death in days
    • Reporting of age at death in months

    See details of the data quality tables in Appendix C of the survey final report.

  3. Data from: Robust inference under r-size-biased sampling without replacement...

    • tandf.figshare.com
    xlsx
    Updated Nov 28, 2023
    Cite
    P. Economou; G. Tzavelas; A. Batsidis (2023). Robust inference under r-size-biased sampling without replacement from finite population [Dataset]. http://doi.org/10.6084/m9.figshare.11542974.v1
    Available download formats: xlsx
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    P. Economou; G. Tzavelas; A. Batsidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The case of size-biased sampling of known order from a finite population without replacement is considered. The behavior of such a sampling scheme is studied with respect to the sampling fraction. Based on a simulation study, it is concluded that such a sample cannot be treated either as a random sample from the parent distribution or as a random sample from the corresponding r-size weighted distribution. As the sampling fraction increases, the bias in the sample decreases, resulting in a transition from an r-size-biased sample to a random sample. A modified version of a likelihood-free method is adopted for making statistical inference for the unknown population parameters, as well as for the size of the population when it is unknown. A simulation study, which takes the sampling fraction into consideration, demonstrates that the proposed method presents better and more robust behavior compared to the approaches that treat the r-size-biased sample either as a random sample from the parent distribution or as a random sample from the corresponding r-size weighted distribution. Finally, a numerical example that motivates this study illustrates our results.

  4. Minimum Credence Test

    • figshare.com
    txt
    Updated Feb 25, 2017
    Cite
    Ilya Fastovets (2017). Minimum Credence Test [Dataset]. http://doi.org/10.6084/m9.figshare.4696168.v1
    Available download formats: txt
    Dataset updated
    Feb 25, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ilya Fastovets
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    R script for the Minimum Credence statistical test for comparing mixed samples. The test evaluates the maximum possible population standard deviation from a reference sample and then applies this estimate to mixed samples for comparisons. If the reference sample size is less than 8, a chi-squared approximation is used to find the confidence interval for the population standard deviation. Otherwise, the test bootstraps the corrected median absolute deviation of the reference sample to obtain a bias-corrected accelerated confidence interval for the population standard deviation. This script is adapted for RFA soil analysis, but can be used elsewhere if method=0. If method=1, the test extracts GOST metrological data (between-lab analytical error) from the excel file, and must be corrected for every specific analysis. The test assumes homogeneity of variances among groups, and the studied parameter must average additively (i.e. the arithmetic mean of individual samples is equal to the mixed sample) in order to use mixed samples. For any questions/remarks please contact me at: fastovetsilya@yandex.ru

  5. Table_1_Determination of sample size for a multinomial model coupled with...

    • frontiersin.figshare.com
    docx
    Updated Jul 2, 2024
    Cite
    Martyna Lukaszewicz; Brian Dennis (2024). Table_1_Determination of sample size for a multinomial model coupled with the phenology model.DOCX [Dataset]. http://doi.org/10.3389/fams.2024.1374832.s002
    Available download formats: docx
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    Frontiers
    Authors
    Martyna Lukaszewicz; Brian Dennis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting the timing of phenological events is important in agriculture, especially for high-revenue products. A project sponsored by USDA-ARS had the objective of adapting a previously developed model for estimating proportions of insects in different development stages as a function of temperature (degree) and time (days) for predicting bloom in almond orchards. Data for the model normally form a two-way table of counts, with rows corresponding to sample percentages of different development stages and columns to sampling times. In this study, we report a technique developed to estimate sample sizes of multinomial and product multinomial models using a method of moments and determine the empirical coverage of sample size. This study aims to determine an appropriate sample size for data collection. This involves establishing a sampling distribution for the Pearson statistic, defined as the product of the sample size and the deviance of empirical proportions from population proportions. The intended outcome is to predict the optimal timing for harvesting crops at desired development stages when coupled with the phenology model, for which variability of the maximum likelihood estimates of the phenology model depends on sample size.

  6. Expert opinions of demographic rates of Argentine black and white tegus in...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Expert opinions of demographic rates of Argentine black and white tegus in South Florida [Dataset]. https://catalog.data.gov/dataset/expert-opinions-of-demographic-rates-of-argentine-black-and-white-tegus-in-south-florida
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    South Florida, Florida
    Description

    We illustrate the utility of expert elicitation, explicit recognition of uncertainty, and the value of information for directing management and research efforts for invasive species, using tegu lizards (Salvator merianae) in southern Florida as a case study. We posited a post-birth-pulse matrix model, which was parameterized using a 3-point process to elicit estimates of tegu demographic rates from herpetology experts. We fit statistical distributions for each parameter and for each expert, then drew and pooled a large number of replicate samples from these to form a distribution for each demographic parameter. Using these distributions, we generated a large sample of matrix models to infer how the tegu population might respond to control efforts. We used the concepts of Pareto efficiency and stochastic dominance to conclude that targeting older age classes at relatively high rates appears to have the best chance of minimizing tegu abundance and control costs. Expert opinion combined with an explicit consideration of uncertainty can be valuable for conducting an initial assessment of the effort needed to control the invader. The value of information can be used to focus research in a way that not only helps increase the efficacy of control, but minimizes costs as well.

  7. Data from: A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical...

    • tandf.figshare.com
    xlsx
    Updated Jun 28, 2024
    Cite
    William M. Goodman; Susan E. Spruill; Eugene Komaroff (2024). A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting its Use [Dataset]. http://doi.org/10.6084/m9.figshare.7871960.v1
    Available download formats: xlsx
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    William M. Goodman; Susan E. Spruill; Eugene Komaroff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When the editors of Basic and Applied Social Psychology effectively banned the use of null hypothesis significance testing (NHST) from articles published in their journal, it set off a fire-storm of discussions both supporting the decision and defending the utility of NHST in scientific research. At the heart of NHST is the p-value which is the probability of obtaining an effect equal to or more extreme than the one observed in the sample data, given the null hypothesis and other model assumptions. Although this is conceptually different from the probability of the null hypothesis being true, given the sample, p-values nonetheless can provide evidential information, toward making an inference about a parameter. Applying a 10,000-case simulation described in this article, the authors found that p-values’ inferential signals to either reject or not reject a null hypothesis about the mean (α = 0.05) were consistent for almost 70% of the cases with the parameter’s true location for the sampled-from population. Success increases if a hybrid decision criterion, minimum effect size plus p-value (MESP), is used. Here, rejecting the null also requires the difference of the observed statistic from the exact null to be meaningfully large or practically significant, in the researcher’s judgment and experience. The simulation compares performances of several methods: from p-value and/or effect size-based, to confidence-interval based, under various conditions of true location of the mean, test power, and comparative sizes of the meaningful distance and population variability. For any inference procedure that outputs a binary indicator, like flagging whether a p-value is significant, the output of one single experiment is not sufficient evidence for a definitive conclusion. Yet, if a tool like MESP generates a relatively reliable signal and is used knowledgeably as part of a research process, it can provide useful information.

  8. Descriptive Statistics of Sample (N = 6091).

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Ann M. Swartz; Young Cho; Whitney A. Welch; Scott J. Strath (2023). Descriptive Statistics of Sample (N = 6091). [Dataset]. http://doi.org/10.1371/journal.pone.0150325.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ann M. Swartz; Young Cho; Whitney A. Welch; Scott J. Strath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive Statistics of Sample (N = 6091).

  9. Reproductive Health Survey 2005 - Georgia

    • dev.ihsn.org
    • catalog.ihsn.org
    • +1more
    Updated Apr 25, 2019
    Cite
    Centers for Disease Control and Prevention (CDC) (2019). Reproductive Health Survey 2005 - Georgia [Dataset]. https://dev.ihsn.org/nada//catalog/72922
    Dataset updated
    Apr 25, 2019
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Georgian Ministry of Health (MoLHSA)
    Georgian Centers for Disease Control (NCDC)
    Time period covered
    2005
    Area covered
    Georgia
    Description

    Geographic coverage

    National, with the exception of the separatist regions of Abkhazia and South Ossetia.

    Analysis unit

    Women aged 15-44 years

    Universe

    Because the survey collected information from a representative sample of Georgian women aged 15-44 years, the data can be used to estimate percentages, averages, and other measures for the entire population of women of reproductive age residing in Georgian households in 2005.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Similar to the 1999 RHS survey, the GERHS05 was a population-based probability survey consisting of face to face interviews with women of reproductive age (15-44 years) at their homes. The survey was designed to collect information from a representative sample of approximately 6,000 women of reproductive age throughout Georgia (excluding the separatist regions of Abkhazia and South Ossetia). The population from which the respondents were selected included all females between the ages of 15 and 44 years, regardless of marital status, who were living in households in Georgia during the survey period.

    The current survey used a stratified multistage sampling design that used the 2002 Georgia census as the sampling frame (State Department for Statistics, 2003). To better assist key stakeholders in assessing the baseline situation at a sub-national level, the sample was designed to produce estimates for 11 regions of the country. Census sectors were grouped into 11 strata, corresponding to Georgia’s administrative regions; three small regions, Racha-Lechkhumi, Kvemo Svaneti, and Zemo Svaneti, were included in one stratum, identified as the Racha-Svaneti stratum. Data are also representative for the urban-rural distribution of the population at the national level.

    The first stage of the three-stage sample design was selection of census sectors, with probability of selection proportional to the number of households in each of the 11 regional sectors. The first stage was accomplished by using a systematic sampling process with a random starting point in each stratum. During the first stage, 310 census sectors were selected as primary sampling units (PSUs).

    The overall sample consisted of 310 PSUs, and the target number of completed interviews was 6,200 for the entire sample, with an average of 20 completed interviews per PSU. The minimum acceptable number of interviews per stratum was set at 400, so that the minimum number of PSUs per stratum was set at 20. With these criteria, 20 PSUs were allocated to each stratum, which accounted for 220 of the available PSUs. The remaining 80 PSUs were distributed in the largest regions in order to obtain a distribution of PSUs approximately proportional to the distribution of households in the 2002 census. An additional 10 PSUs were added to the smallest stratum, Racha-Svaneti, to compensate for the considerable sparseness of women of reproductive age in this stratum.
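
    The allocation arithmetic in this paragraph can be checked with a few lines. The sketch below only restates the figures given above; the variable names are assumptions chosen for the example.

```python
N_STRATA = 11              # regional strata
MIN_PSUS_PER_STRATUM = 20  # minimum PSUs allocated to every stratum
INTERVIEWS_PER_PSU = 20    # average completed interviews per PSU

base_psus = N_STRATA * MIN_PSUS_PER_STRATUM     # equal minimum allocation
proportional_psus = 80                          # spread over the largest regions
extra_racha_svaneti = 10                        # added to the smallest stratum

total_psus = base_psus + proportional_psus + extra_racha_svaneti
target_interviews = total_psus * INTERVIEWS_PER_PSU
min_interviews_per_stratum = MIN_PSUS_PER_STRATUM * INTERVIEWS_PER_PSU

print(base_psus, total_psus, target_interviews, min_interviews_per_stratum)
```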

    Unlike the 1999 survey, a separate sample of internally displaced persons was not selected for the 2005 survey.

    The sampling fraction ranges from 1 in 16 households in the Racha-Svaneti stratum (the least populated stratum) to 1 in 146 in Adjara. If the ratio of households in the census to households in the sample is above 100.0, the region has been under-sampled, whereas if the ratio is less than 100.0, the region has been oversampled.

    In the second stage of sampling, clusters of households were randomly selected from each census sector chosen in the first stage. Determination of cluster size was based on the number of households required to obtain an average of 20 completed interviews per cluster. The total number of households in each cluster took into account estimates of unoccupied households, average number of women aged 15–44 years per household, the interview of only one respondent per household, and an estimated response rate of 98%. In the case of households with more than one woman between the ages of 15 and 44, one woman was selected at random to be interviewed.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire, already refined during the first RHS in Georgia in 1999, was revised carefully and reviewed by a panel of Georgian experts; in subsequent meetings and informal consultations, CDC sought advice on how to design a more effective and useful survey instrument. As a result, the content of the questionnaire was expanded substantially and made more relevant for programmatic needs.

    The questionnaire was designed to collect information on the following:

    • Demographic characteristics
    • Household assets (durable goods and dwelling characteristics)
    • Fertility and child mortality
    • Family planning and reproduction preferences
    • Use of reproductive and child health care services
    • Range and quality of maternity care services
    • Use of preventive and curative health care services
    • Reproductive health care expenditures
    • Perceptions of health service quality
    • Risky health behaviors (smoking and alcohol use)
    • Young adult health education and behaviors
    • Intimate partner violence
    • HIV/AIDS and other STDs

    The questionnaire was tested extensively, both before and during the pretest and prior to beginning the field work. Testing included practice field interviews and simulated interviews conducted by both CDC and NCDC staff. The questionnaire was translated into Georgian and Russian and back-translated into English.

    The inclusion of life histories (marital history and pregnancy history) and the five-year month-by-month calendar of pregnancy, contraceptive use, and union status helped respondents accurately recall the dates of one event in relation to the dates of others they had already recorded.

    Cleaning operations

    Legal ranges, pre-coded variables, consistency checks, and skips were programmed into the data entry software, so that data entry supervisors would notice errors or inconsistencies and could send problematic interviews back to the field for follow-up visits.

    Response rate

    Of the 12,338 households selected in the household sample, 6,402 included at least one eligible woman (aged 15–44 years). Of these identified respondents, 6,376 women were successfully interviewed, yielding a response rate of 99%. Virtually all respondents who were selected to participate and who could be reached agreed to be interviewed and were very cooperative. Response rates did not vary significantly by geographical location.

    Sampling error estimates

    The estimates for a sample survey are affected by two types of errors: non-sampling error and sampling error. Non-sampling error is the result of mistakes made in carrying out data collection and data processing, including the failure to locate and interview the right household, errors in the way questions are asked or understood, and data entry errors. Although intensive quality-control efforts were made during the implementation of the GERHS05 to minimize this type of error, non-sampling errors are impossible to avoid altogether and difficult to evaluate statistically.

    Sampling error is a measure of the variability between an estimate and the true value of the population parameter intended to be estimated, which can be attributed to the fact that a sample rather than a complete enumeration was used to produce it. In other words, sampling error is the difference between the expected value for any variable measured in a survey and the value estimated by the survey. This sample is only one of the many probability samples that could have been selected from the female population aged 15–44 using the same sample design and projected sample size. Each of these samples would have yielded slightly different results from the actual sample selected. Because the statistics presented here are based on a sample, they may differ by chance variations from the statistics that would result if all women 15–44 years of age in Georgia had been interviewed.

    Sampling error is usually measured in terms of the variance and standard error (square root of the variance) for a particular statistic (mean, proportion, or ratio). The standard error (SE) can be used to calculate confidence intervals (CI) of the estimates within which we can say with a given level of certainty that the true value of population parameter lies. For example, for any given statistic calculated from the survey sample, there is a 95 percent probability that the true value of that statistic will lie within a range of plus or minus two SE of the survey estimate. The chances are about 68 out of 100 (about two out of three) that a sample estimate would fall within one standard error of a statistic based on a complete count of the population. The estimated sampling errors for 95% confidence intervals (1.96 x SE) for selected proportions and sample sizes are shown in Table A.1 of the Final Report. The estimates in Table A.1 can be used to estimate 95% confidence intervals for the estimated proportions shown for each sample size. The sampling error estimates include an average design effect of 1.6, needed because the GERHS05 did not employ a simple random sample but included clusters of elements in the second stage of the sample selection.
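
    As a rough illustration of how a design effect widens a confidence interval, the sketch below uses the textbook simple-random-sample standard error of a proportion, sqrt(p(1 - p) / n), and inflates it by the square root of the design effect. The function name and the example proportion and sample size are hypothetical; only the 1.96 multiplier and the average design effect of 1.6 come from the text, and the values in Table A.1 may have been computed differently.

```python
import math


def proportion_ci(p, n, deff=1.0, z=1.96):
    """Confidence interval for a proportion, adjusted for a design effect.

    Returns (lower, upper, standard_error). The design effect multiplies the
    variance, so the standard error grows by sqrt(deff).
    """
    se_srs = math.sqrt(p * (1.0 - p) / n)  # simple random sample SE
    se = se_srs * math.sqrt(deff)          # SE under the clustered design
    half_width = z * se
    return max(0.0, p - half_width), min(1.0, p + half_width), se


# Hypothetical estimate: 50% among 1,000 respondents, design effect 1.6.
lower, upper, se = proportion_ci(0.5, 1000, deff=1.6)
print(f"SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

    With a design effect of 1.6 the interval is about 26% wider than the same calculation would give for a simple random sample, since sqrt(1.6) is roughly 1.26.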

    The selection of clusters is generally characterized by some homogeneity that tends to increase the variance of the sample. Thus, the variance in the sample for the GERHS05 is greater than a simple random sample would be due to the effect of clustering. The design effect represents the ratio of the two variance estimates: the variance of the complex design using clusters, divided by the variance of a simple random sample

  10. i

    Household Budget Survey 2010 - Estonia

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Statistics Estonia (2019). Household Budget Survey 2010 - Estonia [Dataset]. https://datacatalog.ihsn.org/catalog/4509
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Statistics Estonia
    Time period covered
    2010
    Area covered
    Estonia
    Description

    Abstract

    The aim of the 2010 Estonia Household Budget Survey is to obtain reliable information on the expenditures and consumption of households. Besides data about household composition, the survey also provides information on household members’ main demographic and social indicators (marital status, employment, education), as well as on living conditions and ownership of durable goods. The survey data are widely used by ministries and research institutions.

    Since 2000, the HBS has consisted of four parts and has been rather voluminous. The Household Picture covers general background data on household members, such as sex, age, marital status, education, coping, employment, etc. The Post-Interview is intended for registering the changes entered during the survey. The Diary Book for Food Expenditure reflects the expenditure made by the household during half a month. The Diary Book for Income, Taxes and Expenditure contains data about monetary and non-monetary income received by the household as well as expenditures on all commodities and services.

    Geographic coverage

    National

    Analysis unit

    • Households;
    • Individuals.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The population of the Household Budget Survey was made up of all permanent residents of the Republic of Estonia aged 15 or older as of 1 January 2010 who live in private households, excluding those residing in institutions on a long-term basis (for at least a year). The Estonian Population Register, administered by the Ministry of Internal Affairs, was used as a sampling frame representing the survey population.

    The HBS is a sample survey, i.e. the population is evaluated on the basis of the data collected from the sample. The survey sample was drawn from among the persons registered in the Population Register who were 15 years of age or older as at 1 January 2009. The person included in the sample (the address person) brought his/her household into the sample.

    Sample persons were drawn from the Population Register by stratified non-proportional systematic sampling. In this procedure, the population is divided into non-overlapping subpopulations, or strata, and independent subsamples are drawn separately from each stratum by systematic sampling, with different inclusion probabilities applied in different strata. The population was stratified by the county of the address person's place of residence. The stratification followed the principles worked out for and applied to the Estonian Social Survey, which has been carried out annually since 2004, so three strata were formed according to the number of inhabitants in the respective county. Hiiu county, being smaller than the other counties, formed a separate stratum; the remaining counties were divided into two strata, larger and smaller counties. Counties with a population of less than 60,000 (as at 1 January of the survey year) belonged to the stratum of smaller counties.

    To ensure an even distribution of the sample and preclude several address persons living at the same address from falling into the sample, records in the strata were sorted by address: first by the county code; within the county, by the rural municipality code; within the rural municipality, by the name of village; next, by the street name; and finally, by the house number.
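
    The procedure described above (stratify, sort each stratum by address, then take every k-th record from a random start) can be sketched as follows. This is a minimal illustration under stated assumptions, not Statistics Estonia's implementation: the function names, the record fields, the toy population, and the per-stratum sample sizes are all hypothetical.

```python
import random


def systematic_sample(records, n, rng):
    """Pick n records from an ordered list using a fixed step and a random start."""
    if n <= 0 or not records:
        return []
    n = min(n, len(records))
    step = len(records) / n
    start = rng.random() * step
    last = len(records) - 1
    return [records[min(int(start + i * step), last)] for i in range(n)]


def stratified_systematic_sample(population, stratum_of, sort_key, sample_sizes, seed=0):
    """Draw an independent systematic sample inside every stratum."""
    rng = random.Random(seed)
    strata = {}
    for record in population:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for name, records in strata.items():
        # Sorting by address spreads the sample geographically and keeps
        # neighbouring records from being picked together.
        records.sort(key=sort_key)
        sample.extend(systematic_sample(records, sample_sizes.get(name, 0), rng))
    return sample


# Toy population: three strata of very different sizes (all values invented).
population = [
    {"stratum": s, "municipality": m, "street": f"street {k % 7}", "house": k}
    for s, size in [("hiiu", 50), ("small", 200), ("large", 1000)]
    for m in ("a", "b")
    for k in range(size // 2)
]

# Different inclusion probabilities per stratum: 10/50, 20/200 and 50/1000.
sample = stratified_systematic_sample(
    population,
    stratum_of=lambda r: r["stratum"],
    sort_key=lambda r: (r["municipality"], r["street"], r["house"]),
    sample_sizes={"hiiu": 10, "small": 20, "large": 50},
)
print(len(sample))
```

    Because the inclusion probabilities differ between strata, any estimate computed from such a sample has to weight each record by the inverse of its stratum's inclusion probability.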

    The original sample included 8,100 persons. In order not to put an excessive burden on the respondents, those who had participated in Statistics Estonia's surveys before were excluded. The final size of the sample was 7,803 persons.

    Although the inclusion probability is smaller in the stratum of larger counties than in other strata, the result gives a relatively large sample for Tallinn. This is necessary for the purpose of analysis, because in Tallinn the response probability is the lowest, but the diversity of households is the largest. Thus, a larger sample size from other (more homogenous) regions guarantees a required accuracy of estimates.

    Mode of data collection

    Face-to-face [f2f]

    Sampling error estimates

    Only a part of the population can be surveyed by sample survey. Because of that, the indicators calculated on the basis of sample data are always somewhat different from the actual value of the estimated population parameter. Such a difference is called the random error or sampling error of estimation. It is not possible to specify the sampling error exactly, but it can be estimated statistically by taking the variability or dispersion of the statistic that is used for parameter estimation as the basis for the sample design used in the survey. In addition to the sample design, the sampling error depends on the sample size. A smaller sampling error can be expected in case of larger sample sizes.

    An important group of quality indicators consists of the accuracy estimations of parameters calculated on the basis of the survey. The accuracy estimations provided by Statistics Estonia are estimates of the sampling error i.e. these estimations do not reflect other possible error sources. Estimates of sampling errors are calculated for more important indicators.

    Standard error is the main sampling error estimate. Standard error is a mathematical value that describes the variance of parameter estimates given on the basis of the sample. As the sample is selected randomly, the parameter estimate is also a random variable and variance can be calculated for it. The smaller the variance, the more exact is the parameter estimate. The variance of estimate depends on the sample size and sample design.

    Relative standard error shows the proportion that the estimate’s standard error forms of the estimated value. As a rule, it is presented as a percentage. Relative standard error is independent of measurement units, so it allows different parameter estimates to be compared with each other irrespective of their units. Relative standard error is a practical tool for getting a quick overview of the accuracy of estimates.
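
    The relative standard error described above is simply the standard error divided by the estimate, expressed as a percentage. The sketch below uses a hypothetical function name and invented figures to show why it does not depend on the unit of measurement.

```python
def relative_standard_error(estimate, standard_error):
    """Standard error as a percentage of the estimate."""
    if estimate == 0:
        raise ValueError("relative standard error is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


# The same invented expenditure estimate expressed in euros and in cents.
rse_euros = relative_standard_error(1200, 36)
rse_cents = relative_standard_error(120000, 3600)
print(rse_euros, rse_cents)
```

    Both calls give the same percentage, because changing the unit rescales the estimate and its standard error by the same factor.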

Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD

Current Population Survey (CPS)

Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description

analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
• download the fixed-width file containing household, family, and person records
• import by separating this file into three tables, then merge 'em together at the person-level
• download the fixed-width file containing the person-level replicate weights
• merge the rectangular person-level file with the replicate weights, then store it in a sql database
• create a new variable - one - in the data table

2012 asec - analysis examples.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• perform a boatload of analysis examples

replicate census estimates - 2011.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
• the census bureau's current population survey page
• the bureau of labor statistics' current population survey page
• the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
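
for readers wondering what "using the replicate weights" amounts to, here is a rough sketch of the idea in plain python rather than r. it is not the code from the scripts above: the function names and toy numbers are made up, and the default scaling factor of 4 divided by the number of replicates is an assumption about how the asec replicate weights are meant to be combined, so check the census usage document before relying on it.

```python
import math


def weighted_total(values, weights):
    """Weighted sum of a variable, i.e. an estimated population total."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_standard_error(values, full_weights, replicate_weights, scale=None):
    """Standard error from replicate weights.

    The statistic is recomputed once per replicate weight column; the squared
    deviations from the full-sample estimate are summed and scaled.
    """
    if scale is None:
        # Assumed scaling factor (4 / number of replicates); verify against
        # the official replicate weight documentation.
        scale = 4.0 / len(replicate_weights)
    full_estimate = weighted_total(values, full_weights)
    squared_deviations = sum(
        (weighted_total(values, rw) - full_estimate) ** 2
        for rw in replicate_weights
    )
    return math.sqrt(scale * squared_deviations)


# Toy data: three people, an indicator variable, and two replicate columns.
values = [1, 0, 1]
full_weights = [100, 200, 300]
replicate_weights = [[110, 190, 300], [90, 210, 290]]

se = replicate_standard_error(values, full_weights, replicate_weights)
print(weighted_total(values, full_weights), se)
```

the real file carries far more replicate columns than this toy example, but the loop is the same: one re-estimate per column, then one scaled sum of squared differences.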
