17 datasets found
  1. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite puts everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
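    here's a minimal python sketch of what "rectangular" means: one row per person, with the household- and family-level fields copied onto every person. this is not the nber or parse.SAScii code; the linking keys `hh_id` and `fam_id`, the field names, and the toy records are all hypothetical placeholders, not the real cps-asec layout.

```python
from typing import Any

Record = dict[str, Any]


def rectangularize(households: list[Record], families: list[Record],
                   persons: list[Record]) -> list[Record]:
    """Build a person-level (rectangular) file.

    Household- and family-level fields are copied onto every person record.
    The linking keys 'hh_id' and 'fam_id' are hypothetical placeholders.
    """
    hh_by_id = {h["hh_id"]: h for h in households}
    fam_by_id = {(f["hh_id"], f["fam_id"]): f for f in families}
    rows = []
    for p in persons:
        row = dict(hh_by_id[p["hh_id"]])                  # household fields
        row.update(fam_by_id[(p["hh_id"], p["fam_id"])])  # family fields
        row.update(p)                                     # person fields win on name clashes
        rows.append(row)
    return rows


# Toy records (made up): one household, one family, two persons.
households = [{"hh_id": 1, "hh_income": 52000}]
families = [{"hh_id": 1, "fam_id": 1, "fam_size": 2}]
persons = [
    {"hh_id": 1, "fam_id": 1, "person_id": 1, "age": 40},
    {"hh_id": 1, "fam_id": 1, "person_id": 2, "age": 38},
]
rect = rectangularize(households, families, persons)
print(len(rect), rect[0]["hh_income"], rect[1]["fam_size"])  # 2 52000 2
```

    the household income shows up on both person rows, which is exactly why the rectangular file has no separate household or family records.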
    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
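    replicate weights get you a standard error by re-computing the statistic once per replicate weight and measuring how far the replicates scatter around the full-sample estimate. a minimal python sketch, assuming the successive difference replication formula with a 4/R multiplier (4/160 for 160 replicate weights) that census documentation describes for the cps-asec; double-check the constants against the census usage instructions before trusting any numbers. the toy data and the 4 replicates (instead of 160) are made up.

```python
import math


def weighted_total(values, weights):
    """Weighted total: sum of value * weight over all records."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(values, full_weights, replicate_weights, multiplier=None):
    """Full-sample estimate and replicate-weight standard error of a weighted total.

    Assumes a successive-difference-replication style variance,
        var = multiplier * sum_r (theta_r - theta_0) ** 2,
    with multiplier defaulting to 4 / R (i.e. 4/160 for 160 replicates).
    """
    theta_0 = weighted_total(values, full_weights)
    if multiplier is None:
        multiplier = 4 / len(replicate_weights)
    squared_devs = sum(
        (weighted_total(values, rw) - theta_0) ** 2 for rw in replicate_weights
    )
    return theta_0, math.sqrt(multiplier * squared_devs)


# Made-up example: three persons, 'one' = 1 for everyone, so the weighted
# total is a population count; only 4 toy replicates instead of 160.
one = [1, 1, 1]
full = [100, 200, 300]
replicates = [
    [110, 190, 300],
    [90, 210, 310],
    [100, 200, 290],
    [105, 205, 300],
]
estimate, se = replicate_se(one, full, replicates)
print(estimate, f"{se:.2f}")  # 600 17.32
```

    summing the variable `one` with the weights is why the download script bothers to create it: the weighted total of `one` is a population count.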
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  2. 2020 ACS Demographic & Socio-Economic Data Of Oklahoma At Census Tract Level...

    • one-health-data-hub-osu-geog.hub.arcgis.com
    Updated May 22, 2024
    Cite
    snakka_OSU_GEOG (2024). 2020 ACS Demographic & Socio-Economic Data Of Oklahoma At Census Tract Level [Dataset]. https://one-health-data-hub-osu-geog.hub.arcgis.com/items/cf38f8a63cc649779740f403a6552081
    Dataset updated
    May 22, 2024
    Dataset authored and provided by
    snakka_OSU_GEOG
    Description

    We utilized data from two main sources: the United States Census Bureau's American Community Survey (ACS) and the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry (CDC/ATSDR) Social Vulnerability Index (SVI).

    American Community Survey (ACS): Conducted by the U.S. Census Bureau, the ACS is an ongoing survey that provides detailed demographic and socio-economic data on the population and housing characteristics of the United States. The survey collects information on topics such as income, education, employment, health insurance coverage, and housing costs and conditions. It offers more frequent and up-to-date information than the decennial census, with annual estimates produced from a rolling sample of households. ACS data are essential for policymakers, researchers, and communities to make informed decisions and address the evolving needs of the population.

    CDC/ATSDR Social Vulnerability Index (SVI): Created by ATSDR’s Geospatial Research, Analysis & Services Program (GRASP) and used by the CDC, the SVI is designed to identify and map communities that are most likely to need support before, during, and after hazardous events. SVI ranks U.S. Census tracts on 15 social factors, including unemployment, minority status, and disability, and groups them into four related themes. Each tract receives rankings for each Census variable and for each theme, as well as an overall ranking, indicating its relative vulnerability. SVI data provide insights into the social vulnerability of communities at both the tract and county levels, helping public health officials and emergency response planners allocate resources effectively.

    In our use of these sources, we likely integrated data from both the ACS and the SVI to analyze and understand various socio-economic and demographic indicators at the state, county, and possibly tract levels.
    This integrated data would have been valuable for research, policymaking, and community planning purposes, allowing for a comprehensive understanding of social and economic dynamics across different geographical areas in the United States.

    Note: Due to limitations in the ArcGIS Pro environment, the data variable names may be truncated. Refer to the table below for a clear understanding of the variables.

    CSV Variable Name | Shapefile Variable Name | Description
    StateName | StateName | Name of the state
    StateFips | StateFips | State-level FIPS code
    State name | StateName | Name of the state
    CountyName | CountyName | Name of the county
    CensusFips | CensusFips | County-level FIPS code
    State abbreviation | StateFips | State abbreviation
    CountyFips | CountyFips | County-level FIPS code
    CensusFips | CensusFips | County-level FIPS code
    County name | CountyName | Name of the county
    AREA_SQMI | AREA_SQMI | Tract area in square miles
    E_TOTPOP | E_TOTPOP | Population estimates, 2014-2018 ACS
    EP_POV | EP_POV | Percentage of persons below poverty estimate
    EP_UNEMP | EP_UNEMP | Unemployment Rate estimate
    EP_HBURD | EP_HBURD | Housing cost burdened occupied housing units with annual income less than $75,000
    EP_UNINSUR | EP_UNINSUR | Uninsured in the total civilian noninstitutionalized population estimate, 2015-2019 ACS
    EP_PCI | EP_PCI | Per capita income estimate, 2015-2019 ACS
    EP_DISABL | EP_DISABL | Percentage of civilian noninstitutionalized population with a disability estimate, 2015-2019 ACS
    EP_SNGPNT | EP_SNGPNT | Percentage of single parent households with children under 18 estimate, 2015-2019 ACS
    EP_MINRTY | EP_MINRTY | Percentage minority (all persons except white, non-Hispanic) estimate, 2015-2019 ACS
    EP_LIMENG | EP_LIMENG | Percentage of persons (age 5+) who speak English "less than well" estimate, 2015-2019 ACS
    EP_MUNIT | EP_MUNIT | Percentage of housing in structures with 10 or more units estimate
    EP_MOBILE | EP_MOBILE | Percentage of mobile homes estimate
    EP_CROWD | EP_CROWD | Percentage of occupied housing units with more people than rooms estimate
    EP_NOVEH | EP_NOVEH | Percentage of households with no vehicle available estimate
    EP_GROUPQ | EP_GROUPQ | Percentage of persons in group quarters estimate, 2014-2018 ACS
    Below_5_yr | Below_5_yr | Under 5 years: Percentage of Total population
    Below_18_yr | Below_18_yr | Under 18 years: Percentage of Total population
    18-39_yr | 18_39_yr | 18-39 years: Percentage of Total population
    40-64_yr | 40_64_yr | 40-64 years: Percentage of Total population
    Above_65_yr | Above_65_yr | Above 65 years: Percentage of Total population
    Pop_male | Pop_male | Percentage of total population male
    Pop_female | Pop_female | Percentage of total population female
    White | white | Percentage population of white alone
    Black | black | Percentage population of black or African American alone
    American_indian | american_i | Percentage population of American Indian and Alaska native alone
    Asian | asian | Percentage population of Asian alone
    Hawaiian_pacific_islander | hawaiian_p | Percentage population of Native Hawaiian and Other Pacific Islander alone
    Some_other | some_other | Percentage population of some other race alone
    Median_tot_households | median_tot | Median household income in the past 12 months (in 2019 inflation-adjusted dollars) by household size – total households
    Less_than_high_school | Less_than_ | Percentage of Educational attainment for the population less than 9th grade and 9th to 12th grade, no diploma estimate
    High_school | High_schoo | Percentage of Educational attainment for the population of High school graduate (includes equivalency)
    Some_college | Some_colle | Percentage of Educational attainment for the population of Some college, no degree
    Associates_degree | Associates | Percentage of Educational attainment for the population of associate degree
    Bachelor’s_degree | Bachelor_s | Percentage of Educational attainment for the population of Bachelor’s degree
    Master’s_degree | Master_s_d | Percentage of Educational attainment for the population of Graduate or professional degree
    comp_devices | comp_devic | Percentage of Household having one or more types of computing devices
    Internet | Internet | Percentage of Household with an Internet subscription
    Broadband | Broadband | Percentage of Household having Broadband of any type
    Satelite_internet | Satelite_i | Percentage of Household having Satellite Internet service
    No_internet | No_interne | Percentage of Household having No Internet access
    No_computer | No_compute | Percentage of Household having No computer

    This table provides a mapping between the CSV variable names and the shapefile variable names, along with a brief description of each variable.
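    Because the shapefile carries truncated field names, a lookup built from the mapping table can restore the CSV names after the attributes are loaded. A minimal Python sketch: only the rows whose names differ between the two files are included, and the `rename_fields` helper and the sample record values are hypothetical. The truncated names all fit in 10 characters, which we assume reflects the dBASE field-name limit of the shapefile format rather than anything specific to this dataset.

```python
# Subset of the mapping table: truncated shapefile name -> CSV name.
SHAPEFILE_TO_CSV = {
    "18_39_yr": "18-39_yr",
    "40_64_yr": "40-64_yr",
    "white": "White",
    "black": "Black",
    "american_i": "American_indian",
    "asian": "Asian",
    "hawaiian_p": "Hawaiian_pacific_islander",
    "some_other": "Some_other",
    "median_tot": "Median_tot_households",
    "Less_than_": "Less_than_high_school",
    "High_schoo": "High_school",
    "Some_colle": "Some_college",
    "Associates": "Associates_degree",
    "Bachelor_s": "Bachelor’s_degree",
    "Master_s_d": "Master’s_degree",
    "comp_devic": "comp_devices",
    "Satelite_i": "Satelite_internet",
    "No_interne": "No_internet",
    "No_compute": "No_computer",
}


def rename_fields(record: dict) -> dict:
    """Return a copy of a shapefile attribute record keyed by CSV variable names.

    Names that are identical in both files (e.g. EP_POV) pass through unchanged.
    """
    return {SHAPEFILE_TO_CSV.get(name, name): value for name, value in record.items()}


# Every truncated name fits the assumed 10-character field-name limit.
assert all(len(name) <= 10 for name in SHAPEFILE_TO_CSV)

row = {"EP_POV": 18.2, "hawaiian_p": 0.1, "median_tot": 51424}  # made-up values
print(rename_fields(row))
# {'EP_POV': 18.2, 'Hawaiian_pacific_islander': 0.1, 'Median_tot_households': 51424}
```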

  3. Household Budget Survey 2010 - Estonia

    • catalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Statistics Estonia (2019). Household Budget Survey 2010 - Estonia [Dataset]. https://catalog.ihsn.org/catalog/4509
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Statistics Estonia
    Time period covered
    2010
    Area covered
    Estonia
    Description

    Abstract

    The aim of the 2010 Estonia Household Budget Survey is to obtain reliable information on household expenditure and consumption. Besides data on household composition, the survey also provides information on household members’ main demographic and social indicators (marital status, employment, education), as well as on living conditions and ownership of durable goods. The survey data are widely used by ministries and research institutions.

    Since 2000, the HBS has consisted of four parts and has been rather voluminous. The Household Picture covers general background data on the household and its members, such as sex, age, marital status, education, coping, employment, etc. The Post-Interview is intended for recording changes made during the survey. The Diary Book for Food Expenditure records the household's food expenditure over half a month. The Diary Book for Income, Taxes and Expenditure contains data on the monetary and non-monetary income received by the household, as well as its expenditure on all commodities and services.

    Geographic coverage

    National

    Analysis unit

    • Households;
    • Individuals.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The population of the Household Budget Survey consisted of all permanent residents of the Republic of Estonia aged 15 or older as of 1 January 2010 who lived in private households, excluding those residing in institutions on a long-term basis (at least a year). The Estonian Population Register, administered by the Ministry of Internal Affairs, was used as the sampling frame representing the survey population.

    The HBS is a sample survey, i.e. population characteristics are estimated from the data collected from the sample. The survey sample was drawn from the persons registered in the Population Register who were 15 years of age or older as at 1 January 2009. Each person included in the sample (the address person) brought his/her household into the sample.

    Sample persons were drawn from the Population Register using stratified disproportionate systematic sampling. Under this procedure, the population is divided into non-overlapping subpopulations, or strata, and independent subsamples are drawn separately from each stratum by systematic sampling, with different inclusion probabilities in different strata. The population was stratified by the county of the address person's place of residence. The stratification principles developed for the Estonian Social Survey, which has been carried out annually since 2004, were applied, and three strata were formed according to the number of inhabitants in each county. Hiiu county, being smaller than the other counties, formed a separate stratum; the remaining counties were divided into two strata, larger and smaller counties. Counties with fewer than 60,000 inhabitants (as at 1 January of the survey year) belonged to the stratum of smaller counties.

    To ensure an even distribution of the sample and to prevent several address persons living at the same address from falling into the sample, records in each stratum were sorted by address: first by county code; within the county, by rural municipality code; within the rural municipality, by village name; then by street name; and finally by house number.
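    The selection described above, sorting each stratum by address and then drawing a systematic sample with a stratum-specific sampling interval, can be sketched in Python. This is a minimal illustration, not Statistics Estonia's procedure: the strata, sample sizes and frame records are hypothetical, and the random start with a possibly fractional interval is one common way to implement systematic sampling.

```python
import random


def systematic_sample(ordered_units: list, n: int, rng: random.Random) -> list:
    """Random start in [0, k), then every k-th unit, with k = N / n (may be fractional)."""
    k = len(ordered_units) / n
    start = rng.random() * k
    last = len(ordered_units) - 1
    return [ordered_units[min(int(start + i * k), last)] for i in range(n)]


def stratified_systematic(frame: list[dict], n_by_stratum: dict[str, int],
                          seed: int = 0) -> list[dict]:
    """Sort each stratum by address, then draw an independent systematic sample from it."""
    rng = random.Random(seed)
    sample = []
    for stratum, n in n_by_stratum.items():
        units = sorted((u for u in frame if u["stratum"] == stratum),
                       key=lambda u: u["address"])
        sample.extend(systematic_sample(units, n, rng))
    return sample


# Hypothetical frame; address = (county, municipality, village, street, house number).
frame = [
    {"id": f"{s}-{i}", "stratum": s,
     "address": (f"county {i % 3}", i % 7, "village", f"street {i % 5}", i)}
    for s, size in [("larger", 100), ("smaller", 40), ("Hiiu", 10)]
    for i in range(size)
]
# Different inclusion probabilities per stratum: 10/100, 8/40, 5/10.
sample = stratified_systematic(frame, {"larger": 10, "smaller": 8, "Hiiu": 5})
print(len(sample))  # 23
```

    Sorting before the systematic draw spreads each stratum's sample evenly along the address order, which is what keeps several address persons at the same address from being selected together.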

    The original sample included 8,100 persons. To avoid placing an excessive burden on respondents, persons who had previously participated in Statistics Estonia's surveys were excluded. The final sample size was 7,803 persons.

    Although the inclusion probability in the stratum of larger counties is smaller than in the other strata, the design still yields a relatively large sample for Tallinn. This is necessary for analysis, because Tallinn has the lowest response probability but the greatest diversity of households. Thus, a larger sample size from other (more homogeneous) regions guarantees the required accuracy of estimates.

    Mode of data collection

    Face-to-face [f2f]

    Sampling error estimates

    Only part of the population is surveyed in a sample survey. Because of that, indicators calculated from sample data always differ somewhat from the actual value of the estimated population parameter. This difference is called the random error, or sampling error, of the estimate. The sampling error cannot be determined exactly, but it can be estimated statistically from the variability of the statistic used for parameter estimation, taking into account the sample design used in the survey. Besides the sample design, the sampling error depends on the sample size: a smaller sampling error can be expected with larger samples.

    An important group of quality indicators consists of accuracy estimates for the parameters calculated from the survey. The accuracy estimates provided by Statistics Estonia are estimates of the sampling error, i.e. they do not reflect other possible sources of error. Sampling errors are estimated for the more important indicators.

    The standard error is the main estimate of sampling error. It is the square root of the variance of a parameter estimate obtained from the sample. Because the sample is selected randomly, the parameter estimate is itself a random variable, and its variance can be calculated. The smaller the variance, the more precise the parameter estimate. The variance of an estimate depends on the sample size and the sample design.

    The relative standard error is the standard error expressed as a proportion of the estimated value; as a rule, it is presented as a percentage. Because it does not depend on measurement units, it allows different parameter estimates to be compared with each other regardless of their units. The relative standard error is a convenient tool for a quick overview of the accuracy of estimates.
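    As a quick illustration of the relative standard error definition above, here is a minimal Python sketch. The estimates and standard errors are invented, and are deliberately given in different units (euros and percentage points) to show why the RSE makes them comparable.

```python
def relative_standard_error(estimate: float, standard_error: float) -> float:
    """RSE in percent: 100 * SE / |estimate| (undefined for a zero estimate)."""
    if estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    return 100 * standard_error / abs(estimate)


# Hypothetical figures in different units.
print(relative_standard_error(450.0, 9.0))  # mean monthly expenditure (EUR): 2.0
print(relative_standard_error(35.0, 1.4))   # share of households (%): 4.0
```

    Although the two standard errors (9.0 euros and 1.4 percentage points) cannot be compared directly, the RSEs show that the second estimate is relatively less precise.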

  4. 2020 ACS Demographic & Socio-Economic Data Of Oklahoma At Zip Code Level

    • one-health-data-hub-osu-geog.hub.arcgis.com
    Updated May 22, 2024
    Cite
    snakka_OSU_GEOG (2024). 2020 ACS Demographic & Socio-Economic Data Of Oklahoma At Zip Code Level [Dataset]. https://one-health-data-hub-osu-geog.hub.arcgis.com/items/5175de388f27415caf6087afafa1cc52
    Dataset updated
    May 22, 2024
    Dataset authored and provided by
    snakka_OSU_GEOG
    Description

    We utilized data from two main sources: the United States Census Bureau's American Community Survey (ACS) and the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry (CDC/ATSDR) Social Vulnerability Index (SVI).

    American Community Survey (ACS): Conducted by the U.S. Census Bureau, the ACS is an ongoing survey that provides detailed demographic and socio-economic data on the population and housing characteristics of the United States. The survey collects information on various topics such as income, education, employment, health insurance coverage, and housing costs and conditions. It offers more frequent and up-to-date information compared to the decennial census, with annual estimates produced based on a rolling sample of households. The ACS data is essential for policymakers, researchers, and communities to make informed decisions and address the evolving needs of the population.

    CDC/ATSDR Social Vulnerability Index (SVI): Created by ATSDR’s Geospatial Research, Analysis & Services Program (GRASP) and utilized by the CDC, the SVI is designed to identify and map communities that are most likely to need support before, during, and after hazardous events.

  5. Household Income and Expenditure Survey 2022 - Tuvalu

    • microdata.pacificdata.org
    Updated May 15, 2025
    Cite
    Central Statistics Division (2025). Household Income and Expenditure Survey 2022 - Tuvalu [Dataset]. https://microdata.pacificdata.org/index.php/catalog/880
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Central Statistics Division
    Time period covered
    2022 - 2023
    Area covered
    Tuvalu
    Description

    Abstract

    The main purpose of the Household Income and Expenditure Survey (HIES) was to provide high-quality, representative national household data on income and expenditure in order to update the Consumer Price Index (CPI), improve National Accounts statistics and measure poverty within the country. These statistics are required for evidence-based policy-making to reduce poverty and to monitor progress under the national strategic plan in place.

    Geographic coverage

    Urban (Funafuti) and rural areas (outer islands).

    Analysis unit

    Household and Individual.

    Universe

    Private households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling design of the Tuvalu 2022 HIES consists of randomly selecting an appropriate number of households within each stratum (urban and rural), so that HIES results can be disaggregated at the stratum level as well as at the national level. The urban stratum is the island of Funafuti as a whole, and the rest of the country (all outer islands) forms the rural stratum. The statistical unit used for this sampling analysis is the household. The sampling procedure consists of the following steps:
    - Assess the accuracy of the previous 2015 HIES in terms of per capita total expenditure (the variable of interest), and check whether the sample size at that time was appropriate and correctly distributed between the two strata.
    - Update this assessment using the most recent population count to obtain the new sample size and distribution.
    - Randomly select households using this most recent population count.

    The sampling frame (the most recent household listing and population count) used for the update and the selection is the 2021 Tuvalu Household Listing conducted by the Central Statistics Division of Tuvalu. At the national level, the 2015 Tuvalu HIES achieved good accuracy for per capita total expenditure (less than 5%), but disaggregation by stratum showed lower-quality results for urban Tuvalu. The 2021 household listing provides the most recent distribution of households across all the islands of Tuvalu. This step consists of updating the accuracy assessment of the 2015 HIES with the recent household count and deriving the appropriate RSE by changing the sample size. Because of budget constraints, the total sample size could not be increased, so the only parameter that could be modified was the distribution of the sample across the strata.
    Stratum | Sample size | Households (2021 listing) | 2015 per capita mean total expenditure (AUD) | Relative Standard Error (RSE)
    Urban | 350 | 1,010 | 3,190 | 5.1%
    Rural | 310 | 835 | 2,780 | 4.1%
    National | 660 | 1,845 | 3,000 | 3.3%
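    Under simple random selection within a stratum, each household's base (design) weight is the inverse of its selection probability, N_h / n_h. A minimal Python sketch using the urban and rural figures above; it ignores the proportional allocation across rural islands, non-response and replacement, so these are illustrative base weights, not the survey's published weights.

```python
# 2021 listing counts (N_h) and allocated sample sizes (n_h) from the figures above.
HOUSEHOLDS_2021 = {"Urban": 1010, "Rural": 835}
SAMPLE_SIZE = {"Urban": 350, "Rural": 310}


def base_weights(population: dict[str, int], sample: dict[str, int]) -> dict[str, float]:
    """Inverse of the selection probability n_h / N_h under SRS within each stratum."""
    return {h: population[h] / sample[h] for h in population}


weights = base_weights(HOUSEHOLDS_2021, SAMPLE_SIZE)
for stratum, w in weights.items():
    fraction = SAMPLE_SIZE[stratum] / HOUSEHOLDS_2021[stratum]
    print(f"{stratum}: sampling fraction {fraction:.1%}, base weight {w:.2f}")
# Urban: sampling fraction 34.7%, base weight 2.89
# Rural: sampling fraction 37.1%, base weight 2.69
```

    Summing the base weights over a stratum's sample gives back that stratum's household count from the 2021 listing, which is a useful sanity check on any weighting step.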

    This new sample design results in a larger allocation to urban Funafuti, mainly because of:
    - the low quality of the 2015 HIES results;
    - the number of households in urban Tuvalu, which increased by more than 15% between 2015 and 2020.

    Households are selected by simple random sampling within each stratum:
    - The 350 households in Funafuti are selected with the same probability of selection across all villages of the island.
    - The 310 households in rural Tuvalu are distributed proportionally to the size of each rural island. This proportional allocation of the sample across the rural islands gives the best accuracy at the stratum level.
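    Proportional allocation gives each rural island a share of the rural interviews in proportion to its number of households. Because the shares rarely come out as whole numbers, some rounding rule is needed; the largest-remainder rule below is one common choice, not necessarily the one the Central Statistics Division used. The island names and household counts are hypothetical placeholders, since the per-island 2021 listing counts are not given here.

```python
import math


def proportional_allocation(sizes: dict[str, int], n: int) -> dict[str, int]:
    """Allocate n units proportionally to sizes, rounding with the largest-remainder rule."""
    total = sum(sizes.values())
    quotas = {k: n * s / total for k, s in sizes.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the islands with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - alloc[name], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical household counts for three islands, allocating 50 interviews.
result = proportional_allocation({"island A": 97, "island B": 61, "island C": 42}, 50)
print(result)  # {'island A': 24, 'island B': 15, 'island C': 11}
```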

    Distribution of sample across strata:
    Urban:
    - Funafuti: 350
    Rural:
    - Nanumea: 42
    - Nanumaga: 37
    - Niutao: 46
    - Nui: 39
    - Vaitupu: 75
    - Nukufetau: 45
    - Nukulaelae: 23
    - Niulakita: 4

    Non-response is a problem in surveys, so it is crucial that the field teams interview the selected households (the location on the map and the name of the household head help identify the selected households). During the first visit, interviewers must do their best to convince the household head to participate in the survey (and obtain his/her approval to proceed with the interview). In the field, the first visit may result in: I. a refusal: the household head shows no interest in the survey and is reluctant to participate; or II. an empty house: household members are away at the time of the visit.

    (I) Refusal: if the interviewer cannot convince the household head to participate, he or she must liaise with the survey management, and the supervisor will help in the discussion to convince the household head to respond. In this case, it is important to mention that all responses are kept confidential and to stress the importance of the survey for the benefit of the Tuvalu population. (II) Empty house: the interviewer must investigate (checking with neighbours) whether the house is still inhabited by the family:
    - If it is not, the dwelling is vacant and the replacement procedure must be activated.
    - If the dwelling is still occupied, the interviewer must come back later the same day or the next day at a different time.

    The replacement procedure may be activated only in extreme cases of persistent refusal or an empty house (household members away during the collection period). It consists of replacing the selected household with the closest available neighbour.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The 2022 Tuvalu Household Income and Expenditure Survey (HIES) questionnaire was developed in English and follows the Pacific Standard HIES questionnaire structure. It is administered by CAPI using Survey Solutions, and the diary is no longer part of the form: all transactions (food, non-food, home production and gifts) are collected through different recall sections during the same visit. The traditional 14-day diary is no longer recommended in the region. This new way of implementing the HIES offers several valuable advantages, such as cost savings, data quality, and less time needed for data processing and reporting. The 2022 HIES of Tuvalu was directly integrated into the census through a Long Form Census (LFC). The LFC was an experiment led by the World Bank and the Pacific Community to combine census and HIES data collection. All households were enumerated as usual during the 2022 Census, and households selected to participate in the HIES were then asked the HIES questions.

    Below is a list of all modules in this questionnaire:
    - Household ID
    - Demographic characteristics
    - Education
    - Health
    - Functional difficulties
    - Communication
    - Alcohol
    - Other individual expenses
    - Labour force
    - Fisheries
    - Handicraft and home-processed food
    - Dwelling characteristics
    - Assets
    - Home maintenance
    - Vehicles
    - International trips
    - Domestic trips
    - Household services
    - Financial support
    - Other household expenditure
    - Ceremonies
    - Remittances
    - Food insecurity
    - Financial inclusion
    - Livestock & aquaculture
    - Agriculture parcel
    - Agriculture vegetables
    - Agriculture rootcrops
    - Agriculture fruits

    The survey questionnaire can be found in this documentation.

    Cleaning operations

    Data was edited, cleaned and imputed using the software Stata.

    Response rate

    A total of 662 households were in the original sample selection. Of these, 592 were contacted and 528 accepted the interview. The number of valid households was 464, or 70% of households before replacement. After replacement, a further 54 households were considered valid, bringing the final completion rate to 78% (73% in urban and 85% in rural areas).
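    The completion rates above follow directly from the counts, with valid households divided by the 662 originally selected. A quick Python check of the national figures (the urban and rural counts behind 73% and 85% are not given, so they are not reproduced):

```python
def completion_rate(valid: int, selected: int) -> float:
    """Valid households as a percentage of the originally selected households."""
    return 100 * valid / selected


SELECTED = 662
VALID_BEFORE_REPLACEMENT = 464
VALID_FROM_REPLACEMENT = 54

before = completion_rate(VALID_BEFORE_REPLACEMENT, SELECTED)
final = completion_rate(VALID_BEFORE_REPLACEMENT + VALID_FROM_REPLACEMENT, SELECTED)
print(f"before replacement: {before:.0f}%, final: {final:.0f}%")
# before replacement: 70%, final: 78%
```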

  6. General Household Survey 2007 - South Africa

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Sep 21, 2021
    Cite
    Statistics South Africa (2021). General Household Survey 2007 - South Africa [Dataset]. https://microdata.worldbank.org/index.php/catalog/924
    Dataset updated
    Sep 21, 2021
    Dataset authored and provided by
    Statistics South Africa (http://www.statssa.gov.za/)
    Time period covered
    2007
    Area covered
    South Africa
    Description

    Abstract

    The GHS is an annual household survey, specifically designed to measure various aspects of the living circumstances of South African households. The key findings reported here focus on the five broad areas covered by the GHS, namely: education, health, activities related to work and unemployment, housing and household access to services and facilities.

    Geographic coverage

    The scope of the General Household Survey 2007 was national coverage.

    Analysis unit

    The units of analysis for the General Household Survey 2007 are individuals and households.

    Universe

    The survey covered all de jure household members (usual residents) of households in the nine provinces of South Africa and residents in workers' hostels. The survey does not cover collective living quarters such as students' hostels, old age homes, hospitals, prisons and military barracks.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample design for the GHS 2007 was based on a master sample (MS) that was designed during 2003 and used for the first time in 2004. This master sample was developed specifically for household sample surveys that were conducted by Statistics South Africa between 2004 and 2007. These include surveys such as the annual Labour Force Surveys (LFS), General Household Survey (GHS) and the Income and Expenditure Survey (IES).

    A multi-stage stratified area probability sample design was used. Stratification was done per province (nine provinces) and according to district council (DC) (53 DCs) within provinces. These stratification variables were mainly chosen to ensure better geographical coverage, and to enable analysts to disaggregate the data at DC level.

    The design included two stages of sampling. First, PSUs were systematically selected using Probability Proportional to Size (PPS) sampling techniques. In the second stage of sampling, Dwelling Units (DUs) were systematically selected as Secondary Sampling Units (SSUs). Census Enumeration Areas (EAs) as delineated for Census 2001 formed the basis of the PSUs. EAs were pooled when needed to form PSUs of adequate size (72 dwelling units or more) for the first stage of sampling. The following criteria were used for PSU formation:

    • No overlapping between any two PSUs;
    • Complete coverage of the sampling population;
    • Fully identifiable (e.g. in the case of a household survey, information on the geographical boundaries of the PSU should enable the exact location of the PSU);
    • Secondary sampling units (SSUs) must be clearly identifiable within PSUs;
    • Updated information on the number of SSUs within all the PSUs had to be available;
    • PSUs must be sufficiently large in respect of the number of SSUs included to enable the forming of a predetermined number of clusters of SSUs, with the size of a cluster equal to the sample take of SSUs within a PSU, taking all types of surveys into consideration; and
    • PSUs must also be sufficiently small to facilitate the listing and also regular updating of the SSUs within them.

    A PPS sample of PSUs was drawn in each stratum, with the measure of size being the number of households in the PSU. Altogether, approximately 3 000 PSUs were selected. In each selected PSU, a systematic sample of ten dwelling units was drawn, resulting in approximately 30 000 dwelling units. All households in the sampled dwelling units were enumerated.
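    The two-stage selection described above can be sketched in Python. This is a minimal illustration that assumes the common cumulative-size method for systematic PPS selection of PSUs and a fractional-interval systematic sample of dwelling units; the function names `systematic_pps` and `systematic_sample` are hypothetical, and the actual Stats SA selection software is not described in this text.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, start=None):
    """Systematic PPS selection of n PSUs within one stratum.

    sizes: measure of size per PSU (here, number of households).
    start: random start in [0, interval); drawn at random if omitted.
    Returns indices of selected PSUs; a PSU larger than the sampling
    interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        start = random.random() * interval  # uniform on [0, interval)
    cumulative = list(accumulate(sizes))
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the first PSU whose cumulative size exceeds the point
        # (the bound check guards against floating-point edge cases).
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


def systematic_sample(dwelling_units, take=10):
    """Second stage: systematic sample of `take` dwelling units from a PSU listing."""
    interval = len(dwelling_units) / take
    start = random.random() * interval
    return [dwelling_units[int(start + k * interval)] for k in range(take)]


if __name__ == "__main__":
    psu_households = [120, 95, 300, 80, 150, 210, 75, 180]  # illustrative sizes
    print("Selected PSUs:", systematic_pps(psu_households, n=3))
    listing = [f"DU-{i:03d}" for i in range(90)]  # a PSU with 90 dwelling units
    print("Selected DUs:", systematic_sample(listing))
```

    With roughly 3 000 PSUs and a take of ten dwelling units each, the second stage yields about 3 000 × 10 = 30 000 dwelling units, which matches the figure above.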

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The GHS 2007 questionnaire collected data on household characteristics (dwelling type, home ownership, access to water and sanitation facilities, access to services, transport, household assets, land ownership, agricultural production); individuals' characteristics (demographic characteristics, relationship to household head, marital status, language, education, employment, income, health, disability, access to social services, mortality); and women's characteristics (fertility).

    Response rate

    29 311 (84,0%) of the expected 34 902 interviews were successfully completed. This response rate is 2,0 percentage points lower than the 86,0% response rate reported in the GHS 2006 report. It was not possible to complete interviews in 5,1% of the sampled dwelling units for reasons such as refusals or absenteeism. A further 10,9% of all interviews were not conducted for other reasons, for example because the sampled dwelling units had become vacant or had changed status (e.g., they were being used as shops or small businesses at the time of enumeration but were originally listed as dwelling units).
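    The response figures above are internally consistent, which a few lines of Python can confirm: 29 311 of 34 902 rounds to 84,0%, and the three outcome categories (completed, refusals or absenteeism, other non-conducted interviews) sum to 100%. The function name `response_summary` is hypothetical.

```python
def response_summary(completed, expected, nonresponse_pct, not_conducted_pct):
    """Return the response rate (%) and the sum of all outcome shares (%)."""
    response_rate = round(100 * completed / expected, 1)
    accounted = round(response_rate + nonresponse_pct + not_conducted_pct, 1)
    return response_rate, accounted


rate, total = response_summary(29_311, 34_902, nonresponse_pct=5.1, not_conducted_pct=10.9)
print(f"Response rate: {rate}% (change vs. 2006: {rate - 86.0:+.1f} points)")
print(f"Outcome shares sum to {total}%")
```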

    Sampling error estimates

    Estimation and use of standard error

    The published results of the General Household Survey are based on representative probability samples drawn from the South African population, as discussed in the section on sample design. Consequently, all estimates are subject to sampling variability: sample estimates may differ from the figures that would have been produced if the entire South African population had been included in the survey. The measure usually used to indicate the probable difference between a sample estimate and the corresponding population figure is the standard error (SE), which measures the extent to which an estimate might have varied by chance because only a sample of the population was included. Two major factors influence the value of a standard error. The first is the sample size: generally speaking, the larger the sample, the more precise the estimate and the smaller the standard error. Consequently, in a national household survey such as the GHS, one expects more precise estimates at the national level than at the provincial level, because of the larger sample size involved. The second factor is the variability, between households, of the population characteristic being estimated, for example the number of unemployed persons in the household.
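    Both factors named above (sample size and between-household variability) appear in the textbook standard error of a mean, s / √n. This sketch assumes simple random sampling; the GHS uses a clustered, stratified design, so its actual standard errors are larger than this formula gives, and the example is an illustration only. Function names are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """SE of a sample mean under simple random sampling: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_inflation(n_subgroup, n_total):
    """Factor by which the SE grows when only a subgroup of the sample is used,
    assuming the same between-household variability in both."""
    return math.sqrt(n_total / n_subgroup)


# A province holding one ninth of the sample: its SE is about three times
# the national SE if variability were the same.
print(se_inflation(n_subgroup=3_000, n_total=27_000))  # prints 3.0
```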

  7.

    Variable description and sources of data.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Rajesh Sharma (2023). Variable description and sources of data. [Dataset]. http://doi.org/10.1371/journal.pone.0204940.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Rajesh Sharma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Variable description and sources of data.

  8.

    Growth of American families (GAF), 1960

    • dataverse-staging.rdmc.unc.edu
    • datasearch.gesis.org
    pdf, txt
    Updated Nov 30, 2007
    Cite
    Pascal K Whelpton; Arthur A. Campbell; John E. Patterson (2007). Growth of American families (GAF), 1960 [Dataset]. https://dataverse-staging.rdmc.unc.edu/dataset.xhtml?persistentId=hdl:1902.29/D-302
    Available download formats: pdf (6740060), txt (2699224)
    Dataset updated
    Nov 30, 2007
    Dataset provided by
    UNC Dataverse
    Authors
    Pascal K Whelpton; Arthur A. Campbell; John E. Patterson
    License

    https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=hdl:1902.29/D-302

    Time period covered
    May 1960 - Jul 1960
    Description

    The 1960 Growth of American Families (GAF) Study was conceived as a follow-up and extension of the first GAF Study. "The 1960 GAF Study provided, for the first time, an opportunity to evaluate the validity of women's fertility expectations; to determine whether the total number of children expected by such women changed significantly between 1955 and 1960; and to begin the analysis of time trends in the proportions using contraception, the level of family size desired, and the patterns of group differences in fertility. One important difference in the 1960 Study was the inclusion of nonwhite women in the sample, which provides the opportunity to estimate parameters for the total population. Factual information included: age; education; income; occupation; employment history; residence history; marriage history; religious preference and attendance; parent's occupation, religion, and nationality."

  9.

    National Sustainable Development Plan Baseline Survey 2019, Household Income...

    • microdata.pacificdata.org
    Updated Oct 9, 2020
    Cite
    Vanuatu National Statistics Office (2020). National Sustainable Development Plan Baseline Survey 2019, Household Income and Expenditure Survey 2019 - Vanuatu [Dataset]. https://microdata.pacificdata.org/index.php/catalog/742
    Dataset updated
    Oct 9, 2020
    Dataset authored and provided by
    Vanuatu National Statistics Office
    Time period covered
    2019 - 2020
    Area covered
    Vanuatu
    Description

    Abstract

    The National Sustainable Development Plan (NSDP) Baseline Survey 2019 is an expanded Household Income and Expenditure Survey (HIES) that also covers health, educational, cultural, and productive dimensions previously uncollected or in need of updating. The results of this survey will directly inform more than 30 key indicators listed in the NSDP M&E (Monitoring and Evaluation) Framework, as well as more than 40 of the indicators listed for the United Nations Sustainable Development Goals (SDGs). The NSDP Baseline Survey also presents an opportunity for Vanuatu to establish a comprehensive Melanesian Wellbeing baseline, as well as an updated baseline for calculating the Consumer Price Index (CPI) and revising the National Accounts.

    Geographic coverage

    National coverage, at the following levels: 1. National (Vanuatu); 2. Provinces (Torba, Sanma, Penama, Malampa, Shefa, Tafea); 4. Area Councils (from Torres Area Council through to Futuna & Aneityum Area Council); 5. Villages/Towns; 6. Urban/Rural.

    Analysis unit

    Household and Individual.

    Universe

    All de jure residents.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample size for this survey was determined using outputs of the previous 2010 Household Income and Expenditure Survey (HIES), especially per capita monthly total expenditure. The mean, standard deviation, and standard error of per capita expenditure were computed from the 2010 HIES, and the distribution of the population across the 6 provinces of Vanuatu from the 2016 Census was used as a base. Depending on the accuracy of this variable of interest within each province, the sample size per province was adjusted to obtain an expected sampling error of around 5% within each province. The sampling frame was the 2016 Vanuatu Census, which was used to compute the probabilities of selection of the Enumeration Areas (EAs); EAs were selected at random with probability proportional to size. Within each selected EA, 10 households were then randomly selected with uniform (equal) probability, after the team had updated the household listing for that EA.
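    A standard way to turn a mean and standard deviation from a previous survey into a per-province sample size for a target relative standard error (RSE) of about 5% is sketched below, using the approximation RSE ≈ √DEFF · CV / √n with CV = sd / mean. It is an assumption that the NSO used this exact approximation; the function name and the example values are hypothetical, not 2010 HIES figures.

```python
import math


def required_sample_size(mean, sd, target_rse, deff=1.0):
    """Households needed so that RSE(mean) is about target_rse.

    Uses RSE ~ sqrt(deff) * CV / sqrt(n), where CV = sd / mean, hence
    n = deff * (CV / target_rse) ** 2.
    """
    cv = sd / mean
    n = deff * (cv / target_rse) ** 2
    return math.ceil(n - 1e-9)  # tolerance guards against float noise such as 256.0000000001


# Illustrative values only: a CV of 0.8 and a 5% RSE target.
print(required_sample_size(mean=1000, sd=800, target_rse=0.05))            # 256 (simple random sample)
print(required_sample_size(mean=1000, sd=800, target_rse=0.05, deff=2.0))  # 512 (clustered design)
```

    The design effect enters multiplicatively, so a stratum with a high intracluster correlation needs proportionally more households for the same precision.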

    i) The only variable considered is per capita total household expenditure (the variable of interest): in addition to being one of the main indicators derived from the Household Income and Expenditure Survey (HIES), it is likely highly correlated with many other variables of interest (e.g., poverty). From the 2010 HIES dataset, a list of relevant indicators was calculated for this variable, providing information on: - (a) the status of the household expenditure distribution within each province; - (b) the efficiency provided by the 2010 HIES sample design; - (c) the accuracy of the estimates calculated from the 2010 HIES dataset (especially per capita household expenditure, our variable of interest).

    ii) The original dataset was trimmed using the variable of interest: the lowest and highest percentiles (the 1% of households with the lowest and the 1% with the highest per capita total household expenditure) were removed from the analysis as outliers. The resulting dataset contains 4,289 households (out of 4,377 completed households).

    iii) The 2010 Vanuatu HIES sample was based on a stratified multi-stage selection: - Stratification: geographical provinces (by urban/rural location) - First stage of selection: Enumeration Areas (EAs), with probability of selection proportional to size - Second stage: households, with uniform probability of selection within the EAs

    iv) The mean and standard deviation describe the variable of interest within each stratum. The intracluster correlation (ρ) and the design effect (DEFF) indicate the efficiency of the sampling strategy, and the standard error and relative standard error (SE/RSE) of the variable of interest show its accuracy.

    v) The purpose of this analysis is to draw insights from the 2010 HIES sample design in order to improve the 2019 survey. There is no point in increasing the sample size in strata where the sample is not efficient (the gain in accuracy would be minor compared to the related cost).

    vi) The challenges in the 2019 Vanuatu baseline survey were to: - meet precision targets in each stratum (provincial level), including Penama, where Ambae island had been evacuated at the time of the sample design; - keep the sample size acceptable (due to budget constraints); - follow international recommendations (12 months of field operation); - enhance the monitoring and supervision of the field staff and simplify the management of logistics in the field.

    ==> Optimize the variance/cost ratio of the survey design.

    vii) Table 1 from the Document Sample Design (provided as External Resources) presents the Vanuatu 2010 HIES survey specifications, efficiency, and accuracy in each stratum (for the variable of interest). It shows that some improvements can be made in Torba and Shefa rural (where the RSE is higher than 5%), and it shows a high intraclass correlation in Malampa, Shefa rural, and Tafea (which leads to a high design effect in those strata). In Torba, the high design effect comes from the high number of households interviewed in each selected EA (on average, 33 households were interviewed per selected EA in this stratum).
    - Torba: the sample size is good; there is just a need to reduce the number of households interviewed within each EA (and, to keep a similar sample size, the number of EAs selected in the province will be increased).
    - Malampa: given the high intracluster correlation in this province, a higher number of EAs must be selected (with the same number of households interviewed per EA).
    - Shefa rural: keep the same number of households to interview within each EA, and increase the number of EAs selected (this will lead to a higher sample size).
    - Tafea: as in Malampa, the high intraclass correlation indicates that the number of EAs selected has to be increased (and therefore the sample size as well).
    The sample size has to be increased in Malampa, Shefa rural, and Tafea; for the rest, the 2019 design will have to be similar to that of 2010 (in order to provide at least the same level of accuracy).

    viii) The 2019 Vanuatu baseline survey follows the international recommendations in terms of data collection schedule (12-month coverage) and provides for better management and supervision of the field staff. In this context, the field staff will work in teams, where:
    - a team is made up of 1 supervisor (team leader) and 2 or 3 interviewers;
    - each interviewer is responsible for 5 interviews per round;
    - a round of the survey is a 1-week period;
    - 1 EA is covered during 1 round; after the round is completed, the team moves to the next EA for the next round;
    - a team completes 32 rounds during the 12-month field operation period (roughly, every 2 rounds (2 weeks) of work are followed by 1 round (1 week) of rest).

    ix) Table 3 from the Document Sample Design (provided as External Resources) presents a survey schedule starting in February 2019 and ending in February 2020. Over this period, the teams will spend 32 working weeks in the field (corresponding to 32 different selected EAs), with a 3-week rest period over Christmas.

    x) The number of interviewers per team and the number of teams per province will determine the total sample size within each province. A team of 3 interviewers can achieve 480 households over the period, while a team of 2 interviewers can achieve only 320.

    xi) The intraclass correlation is used to calculate the precision loss due to clustering. Like the standard deviation, the intracluster correlation is considered to be a true population parameter, and therefore transferable between designs. We have to assume that this correlation has not changed over the period 2010-2019, so that it can be used to predict DEFF and RSE for the next survey under an adjusted design (based on the conclusions drawn from the 2010 design). Table 2 from the Document Sample Design (provided as External Resources) predicts the design effect and sampling error of the variable of interest under the new sample design, which is based on: - the sample size within each stratum - the number of teams within each stratum - the number of interviewers per team. To allow more flexibility in the sample size, it is preferable to set up some teams of 3 interviewers, which can achieve 480 households (a good sample size for Torba and Sanma urban), and some teams of 2 interviewers, which will achieve 320 households each (2 teams will be required in other provinces).

    xii) The proposed design in Table 2 from the Document Sample Design (provided as External Resources) shows a total sample size of 4,640 households and a higher level of accuracy for the estimate of the variable of interest in all strata. Only Shefa rural shows an RSE higher than 5%, which is still acceptable. The high intraclass correlation in Shefa rural inflates the variance of the estimates; reducing it further would require either increasing the sample size or decreasing the number of households interviewed per EA, which is not recommended for logistical and financial reasons.
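    Three of the calculations in items ii), iv) and xi), and viii) and x) can be sketched in a few lines of Python: the 1% tail trimming (assuming the cut-off is rounded to the nearest whole household, which reproduces 4,289 of 4,377), the Kish approximation DEFF = 1 + (m - 1)ρ for m households per EA, and a team's fieldwork capacity. The function names and the example value ρ = 0.05 are hypothetical, not figures from the survey documents.

```python
def trim_tails(records, key, share=0.01):
    """Drop the `share` of records with the lowest and the highest key values."""
    n = len(records)
    k = round(n * share)  # records removed from each tail
    ranked = sorted(records, key=key)
    return ranked[k:n - k]


def design_effect(households_per_ea, icc):
    """Kish approximation for equal-sized clusters: DEFF = 1 + (m - 1) * rho."""
    return 1 + (households_per_ea - 1) * icc


def effective_sample_size(n, deff):
    """Number of simple-random-sample households giving the same precision."""
    return n / deff


def team_workload(interviewers, rounds=32, interviews_per_round=5):
    """Households per EA (one EA per round) and total households per team."""
    per_ea = interviewers * interviews_per_round
    return per_ea, per_ea * rounds


if __name__ == "__main__":
    kept = trim_tails(list(range(4377)), key=lambda h: h)
    print(len(kept))  # 4289
    rho = 0.05  # hypothetical intracluster correlation
    for m in (33, 15, 10):
        print(m, round(design_effect(m, rho), 2))
    for team in (3, 2):
        print(team, team_workload(team))
```

    Under the hypothetical ρ, cutting Torba's 33 households per EA to 15 would lower DEFF from about 2.6 to 1.7, which is the reasoning behind selecting more EAs with fewer households each.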

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The questionnaire was developed in English using the World Bank software Survey Solutions. This questionnaire is divided into 18 modules that are detailed below.

    - Introduction (geographic areas, list of household members)
    - Module 1: Demographic characteristics: ethnicity, marital status
    - Module 2: Wellbeing: culture

  10.

    Hyper-parameter values.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 16, 2023
    Cite
    Stephan Dietrich; Aline Meysonnat; Francisco Rosales; Victor Cebotari; Franziska Gassmann (2023). Hyper-parameter values. [Dataset]. http://doi.org/10.1371/journal.pone.0271373.t004
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Stephan Dietrich; Aline Meysonnat; Francisco Rosales; Victor Cebotari; Franziska Gassmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hyper-parameter values.

  11.

    Data from: 2010 County and City-Level Water-Use Data and Associated...

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 1, 2025
    Cite
    U.S. Geological Survey (2025). 2010 County and City-Level Water-Use Data and Associated Explanatory Variables [Dataset]. https://catalog.data.gov/dataset/2010-county-and-city-level-water-use-data-and-associated-explanatory-variables
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    U.S. Geological Survey
    Description

    This data release contains the input-data files and R scripts associated with the analysis presented in [citation of manuscript]. The spatial extent of the data is the contiguous U.S. The input-data files include one comma-separated values (csv) file of county-level data and one csv file of city-level data. The county-level csv (“county_data.csv”) contains data for 3,109 counties. These data include two measures of water use, descriptive information about each county, three grouping variables (climate region, urban class, and economic dependency), and 18 explanatory variables: proportion of population growth from 2000-2010, fraction of withdrawals from surface water, average daily water yield, mean annual maximum temperature from 1970-2010, 2005-2010 maximum temperature departure from the 40-year maximum, mean annual precipitation from 1970-2010, 2005-2010 mean precipitation departure from the 40-year mean, Gini income disparity index, percent of county population with at least some college education, Cook Partisan Voting Index, housing density, median household income, average number of people per household, median age of structures, percent of renters, percent of single family homes, percent apartments, and a numeric version of urban class. The city-level csv (“city_data.csv”) contains data for 83 cities. These data include descriptive information for each city, water-use measures, one grouping variable (climate region), and 6 explanatory variables: type of water bill (increasing block rate, decreasing block rate, or uniform), average price of water bill, number of requirement-oriented water conservation policies, number of rebate-oriented water conservation policies, aridity index, and regional price parity. The R scripts construct fixed-effects and Bayesian hierarchical regression models. The primary difference between these models relates to how they handle possible clustering in the observations that define unique water-use settings.
Fixed-effects models address possible clustering in one of two ways. In a "fully pooled" fixed-effects model, any clustering by group is ignored, and a single, fixed estimate of the coefficient for each covariate is developed using all of the observations. Conversely, in an unpooled fixed-effects model, separate coefficient estimates are developed only using the observations in each group. A hierarchical model provides a compromise between these two extremes. Hierarchical models extend single-level regression to data with a nested structure, whereby the model parameters vary at different levels in the model, including a lower level that describes the actual data and an upper level that influences the values taken by parameters in the lower level. The county-level models were compared using the Watanabe-Akaike information criterion (WAIC) which is derived from the log pointwise predictive density of the models and can be shown to approximate out-of-sample predictive performance. All script files are intended to be used with R statistical software (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org) and Stan probabilistic modeling software (Stan Development Team. 2017. RStan: the R interface to Stan. R package version 2.16.2. http://mc-stan.org).
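    The contrast between complete pooling, no pooling, and a hierarchical compromise, and the WAIC comparison, can be illustrated without Stan. This Python sketch is a simplification and not the release's R/Stan scripts: partial pooling here uses the normal-normal shrinkage formula with the within-group SD `sigma` and the between-group SD `tau` treated as known, whereas a full Bayesian hierarchical model estimates them. `waic` follows the standard definition, computed from a matrix of pointwise log-likelihoods over posterior draws. All names and numbers are hypothetical.

```python
import math
import statistics


def complete_pooling(groups):
    """Fully pooled estimate: one mean from all observations, ignoring groups."""
    return statistics.fmean([y for ys in groups.values() for y in ys])


def no_pooling(groups):
    """Unpooled estimates: a separate mean from each group's own observations."""
    return {g: statistics.fmean(ys) for g, ys in groups.items()}


def partial_pooling(groups, sigma, tau):
    """Shrink each group mean toward the pooled mean (sigma, tau assumed known)."""
    mu = complete_pooling(groups)
    estimates = {}
    for g, ys in groups.items():
        w_data = len(ys) / sigma ** 2  # precision of the group mean
        w_prior = 1 / tau ** 2         # precision of the between-group distribution
        estimates[g] = (w_data * statistics.fmean(ys) + w_prior * mu) / (w_data + w_prior)
    return estimates


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | theta_s); lower is better."""
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd = p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        m = max(column)  # log-sum-exp for a stable log of the mean density
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        p_waic += statistics.variance(column)  # effective number of parameters
    return -2 * (lppd - p_waic)


if __name__ == "__main__":
    # Hypothetical per-county water use by climate region.
    regions = {"arid": [180.0, 210.0, 195.0], "humid": [90.0, 110.0], "cold": [140.0]}
    print(complete_pooling(regions))
    print(no_pooling(regions))
    print(partial_pooling(regions, sigma=20.0, tau=30.0))
```

    Groups with few observations (here "cold") are pulled most strongly toward the pooled mean, which is the compromise the hierarchical model provides.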

  12.

    How do young low-income university students deal with risk and time...

    • scielo.figshare.com
    jpeg
    Updated May 31, 2023
    Cite
    Érica Teixeira dos Santos; Marcelo Cabus Klotzle; Paulo Vitor Jordão da Gama Silva; Antonio Carlos Figueiredo Pinto (2023). How do young low-income university students deal with risk and time preferences in Brazil? [Dataset]. http://doi.org/10.6084/m9.figshare.20970276.v1
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO journals
    Authors
    Érica Teixeira dos Santos; Marcelo Cabus Klotzle; Paulo Vitor Jordão da Gama Silva; Antonio Carlos Figueiredo Pinto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    ABSTRACT: This article sought to understand the behavior of young low-income university students through an experiment based on prospect theory and hyperbolic discounting theory, examining risk and time preferences and their relationship with financial literacy with regard to choice probability distortions. There is a notable lack of studies that simultaneously address risk and time preferences in low-income urban groups and that relate experiments based on prospect theory to probability distortions in choice processes. This study opens the door to a better discussion of the relationship between poverty and risk and time preferences in Brazil, with the aim of providing evidence that supports national financial literacy plans. The study shows the importance of financial education as a means of reducing agents’ probability distortion. This is crucial, given that probability distortion is one of the pillars of prospect theory. The experiment used value, weighting, and quasi-hyperbolic discounting functions within a maximum likelihood methodology to estimate the risk and time parameters with sociodemographic variables, with the Financial Literacy Index as a moderating variable, in a private higher education institution (HEI), with 54 students and 5,940 lotteries. It was observed that low-income urban populations in emerging economies have risk and loss aversion parameters similar to those of rural populations in developing countries. Low-income students have a greater preference for the present, and a small increase in income appears to be associated with a higher level of patience, making decisions more rational. Better financial education could lead to smaller probability distortion.
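    The abstract names value, probability-weighting, and quasi-hyperbolic discounting functions estimated by maximum likelihood, but does not give their exact forms. The sketch below assumes common textbook choices: a power value function with loss aversion `lam`, the Tversky-Kahneman (1992) one-parameter weighting function, a beta-delta discount factor, and a logit choice rule with a `noise` parameter. It only evaluates the log-likelihood; maximizing it over the parameters would need an optimizer, which is not shown. All names are hypothetical.

```python
import math


def value(x, alpha, lam):
    """Power value function: concave for gains, steeper (loss aversion lam) for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha


def weight(p, gamma):
    """Tversky-Kahneman (1992) probability weighting; gamma < 1 overweights small p."""
    if p in (0.0, 1.0):
        return float(p)
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


def discount(t, beta_present, delta):
    """Quasi-hyperbolic (beta-delta) discount factor for a delay of t periods."""
    return 1.0 if t == 0 else beta_present * delta ** t


def lottery_utility(lottery, alpha, lam, gamma):
    """Separable (non-rank-dependent) utility of [(outcome, probability), ...]."""
    return sum(weight(p, gamma) * value(x, alpha, lam) for x, p in lottery)


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def choice_log_likelihood(choices, alpha, lam, gamma, noise):
    """Logit log-likelihood of choices between lottery pairs.

    choices: iterable of (lottery_a, lottery_b, chose_a) tuples.
    """
    total = 0.0
    for a, b, chose_a in choices:
        diff = (lottery_utility(a, alpha, lam, gamma)
                - lottery_utility(b, alpha, lam, gamma)) / noise
        total += log_sigmoid(diff if chose_a else -diff)  # 1 - sigmoid(d) = sigmoid(-d)
    return total


if __name__ == "__main__":
    safe = [(50, 1.0)]
    risky = [(120, 0.5), (-20, 0.5)]
    print(choice_log_likelihood([(safe, risky, True)], alpha=0.88, lam=2.25, gamma=0.61, noise=1.0))
```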

  13.

    National Economic Development and Disparities in Body Mass Index: A...

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Melissa Neuman; Ichiro Kawachi; Steven Gortmaker; SV. Subramanian (2023). National Economic Development and Disparities in Body Mass Index: A Cross-Sectional Study of Data from 38 Countries [Dataset]. http://doi.org/10.1371/journal.pone.0099327
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Melissa Neuman; Ichiro Kawachi; Steven Gortmaker; SV. Subramanian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Increases in body mass index (BMI) and the prevalence of overweight in low- and middle-income countries (LMICs) are often ascribed to changes in global trade patterns or increases in national income. These changes are likely to affect populations within LMICs differently based on their place of residence or socioeconomic status (SES).

    Objective: Using nationally representative survey data from 38 countries and national economic indicators from the World Bank and other international organizations, we estimated ecological and multilevel models to assess the association of BMI with national levels of gross domestic product (GDP), foreign direct investment (FDI), and mean tariffs.

    Design: We used linear regression to estimate the ecological association between average annual change in economic indicators and BMI, and multilevel linear or ordered multinomial models to estimate associations between national economic indicators and individual BMI or over- and underweight. We also included cross-level interaction terms to highlight differences in the association of BMI with national economic indicators by type of residence or SES.

    Results: There was a positive but non-significant association between GDP and mean BMI. This positive association of GDP and BMI was greater among rural residents and the poor. There were no significant ecological associations between measures of trade openness and mean BMI, but FDI was positively associated with BMI among the poorest respondents and in rural areas, and tariff levels were negatively associated with BMI among poor and rural respondents.

    Conclusion: Measures of national income and trade openness have different associations with BMI across populations within developing countries. These divergent findings underscore the complexity of the effects of development on health and the importance of considering how the health effects of “globalizing” economic and cultural trends are modified by individual-level wealth and residence.
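    The ecological part of the design (one observation per country, regressing the average annual change in mean BMI on the average annual change in an economic indicator) reduces to a simple linear regression, sketched below with hypothetical function names. The multilevel and ordered multinomial models are not reproduced here; they would need a dedicated mixed-models library.

```python
def average_annual_change(first, last, years):
    """Average annual change between two survey or indicator years."""
    return (last - first) / years


def ols_fit(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical countries: (GDP change per year, mean-BMI change per year).
    gdp_change = [average_annual_change(1000, 1300, 5), average_annual_change(800, 900, 5)]
    bmi_change = [average_annual_change(22.0, 23.0, 5), average_annual_change(21.5, 21.8, 5)]
    print(ols_fit(gdp_change, bmi_change))
```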

  14.

    Panel GMM results with LogGDP as dependent variable.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Rajesh Sharma (2023). Panel GMM results with LogGDP as dependent variable. [Dataset]. http://doi.org/10.1371/journal.pone.0204940.t005
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Rajesh Sharma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Panel GMM results with LogGDP as dependent variable.

  15.

    Estimated Displacement Risk - Overall Displacement

    • affh-data-resources-cahcd.hub.arcgis.com
    Updated Sep 27, 2022
    Cite
    Housing and Community Development (2022). Estimated Displacement Risk - Overall Displacement [Dataset]. https://affh-data-resources-cahcd.hub.arcgis.com/datasets/estimated-displacement-risk-overall-displacement
    Dataset updated
    Sep 27, 2022
    Dataset authored and provided by
    Housing and Community Development
    Description

    Urban Displacement Project’s (UDP) Estimated Displacement Risk (EDR) model for California identifies varying levels of displacement risk for low-income renter households in all census tracts in the state from 2015 to 2019 (1). The model uses machine learning to determine which variables are most strongly related to displacement at the household level and to predict tract-level displacement risk statewide while controlling for region. UDP considers a census tract at risk of displacement when its characteristics are, according to the model, strongly correlated with more low-income population loss than gain. In other words, the model estimates that more low-income households are leaving these neighborhoods than moving in.

    This map is a conservative estimate of low-income loss and should be considered a tool to help identify housing vulnerability. Displacement may occur because of investment, disinvestment, or disaster-driven forces. Because this risk assessment does not identify the causes of displacement, UDP does not recommend that the tool be used to assess vulnerability to investment such as new housing construction or infrastructure improvements. HCD recommends combining this map with on-the-ground accounts of displacement, as well as other related data such as overcrowding, cost burden, and income diversity, to achieve a full understanding of displacement risk.

    If you see a tract or area that does not seem right, please fill out this form to help UDP ground-truth the method and improve their model.

    How should I read the displacement map layers?

    The AFFH Data Viewer includes three separate displacement layers that were generated by the EDR model. The “50-80% AMI” layer shows the level of displacement risk for low-income (LI) households specifically.
    Since UDP has reason to believe that the data may not accurately capture extremely low-income (ELI) households, which are difficult to count, UDP combined ELI and very low-income (VLI) household predictions into one group, the “0-50% AMI” layer, by opting for the more “extreme” displacement scenario (e.g., if a tract was categorized as “Elevated” for VLI households but “Extreme” for ELI households, UDP assigned the tract to the “Extreme” category for the 0-50% layer). For these two layers, tracts are assigned to one of the following categories, with darker red colors representing higher displacement risk and lighter orange colors representing less risk:

    • Low Data Quality: the tract has fewer than 500 total households and/or the census margins of error were greater than 15% of the estimate (shaded gray).
    • Lower Displacement Risk: the model estimates that the loss of low-income households is less than the gain in low-income households. However, some of these areas may have small pockets of displacement within their boundaries.
    • At Risk of Displacement: the model estimates there is potential displacement or risk of displacement of the given population in these tracts.
    • Elevated Displacement: the model estimates there is a small amount of displacement (e.g., 10%) of the given population.
    • High Displacement: the model estimates there is a relatively high amount of displacement (e.g., 20%) of the given population.
    • Extreme Displacement: the model estimates there is an extreme level of displacement (e.g., greater than 20%) of the given population.

    The “Overall Displacement” layer shows the number of income groups experiencing any displacement risk. For example, in the dark red tracts (“2 income groups”), the model estimates displacement (Elevated, High, or Extreme) for both income groups.
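    The rule for building the 0-50% AMI layer and the count shown in the Overall Displacement layer can be sketched as follows. This is an illustration, not UDP's code: it assumes the categories are ordered from least to most severe as listed above, and that Low Data Quality tracts are handled before this step (the text does not say how they are combined). The names `combine_0_50` and `displaced_group_count` are hypothetical.

```python
RISK_ORDER = [
    "Lower Displacement Risk",
    "At Risk of Displacement",
    "Elevated Displacement",
    "High Displacement",
    "Extreme Displacement",
]
DISPLACED = {"Elevated Displacement", "High Displacement", "Extreme Displacement"}


def combine_0_50(eli_category, vli_category):
    """0-50% AMI layer: keep the more extreme of the ELI and VLI categories."""
    return max(eli_category, vli_category, key=RISK_ORDER.index)


def displaced_group_count(category_0_50, category_50_80):
    """Overall layer: number of income groups with Elevated, High, or Extreme displacement."""
    return sum(c in DISPLACED for c in (category_0_50, category_50_80))


if __name__ == "__main__":
    low_income_layer = combine_0_50("Extreme Displacement", "Elevated Displacement")
    print(low_income_layer, displaced_group_count(low_income_layer, "High Displacement"))
```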
    In the light orange tracts categorized as “At Risk of Displacement”, one or all three income groups had to have been categorized as “At Risk of Displacement”. Light yellow tracts in the “Overall Displacement” layer are not experiencing displacement under UDP’s definition, according to the model. Some of these yellow tracts may be majority low-income and experiencing small to significant growth in this population, while others may be high-income and exclusive (and therefore have few low-income residents to begin with). One major limitation of the model is that the migration data UDP uses likely does not capture some vulnerable populations, such as undocumented households. This means that some yellow tracts may be experiencing high rates of displacement among these types of households.

    Methodology

    The EDR is a first-of-its-kind model that uses machine learning and household-level data to predict displacement. To create the EDR, UDP first joined household-level data from Data Axle (formerly Infogroup) with tract-level data from the 2014 and 2019 5-year American Community Survey; Affirmatively Furthering Fair Housing (AFFH) data from various sources compiled by California Housing and Community Development; Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES) data; and the Environmental Protection Agency’s Smart Location Database.

    UDP then used a machine learning model to determine which variables are most strongly related to displacement at the household level and to predict tract-level displacement risk statewide while controlling for region. UDP modeled displacement risk as the net migration rate of three separate renter household income categories: extremely low-income (ELI), very low-income (VLI), and low-income (LI). These households have incomes between 0-30% of the Area Median Income (AMI), 30-50% AMI, and 50-80% AMI, respectively.
Tracts with a predicted net loss within these groups are considered to experience displacement in three degrees: elevated, high, and extreme. UDP also includes an “At Risk of Displacement” category for tracts that might be experiencing displacement.

What are the main limitations of this map?
1. Because the map uses 2019 data, it does not reflect more recent trends. The pandemic, which started in 2020, has exacerbated income inequality and increased housing costs, meaning that UDP’s map likely underestimates current displacement risk throughout the state.
2. The model examines displacement risk for renters only and does not account for the fact that many homeowners also face housing and gentrification pressures. As a result, the map generally highlights only areas with relatively high renter populations, and neighborhoods with higher homeownership rates that are known to be experiencing gentrification and displacement are less prominent than one might expect.
3. The model does not incorporate data on new housing construction or infrastructure projects. The map therefore does not capture the potential impacts of these developments on displacement risk; it accounts only for other characteristics, such as demographics and some features of the built environment. Two of UDP’s other studies, on new housing construction and green infrastructure, explore the relationships between these factors and displacement.

Variable Importance
Figures 1, 2, and 3 show the most important variables for each of the three models: ELI, VLI, and LI. The horizontal bars show the importance of each variable in predicting displacement for the respective group. All three models share a similar order of variable importance, with median rent, percent non-white, rent gap (i.e., rental market pressure calculated as the difference between nearby and local rents), percent renters, percent high-income households, and percent low-income households driving much of the displacement estimation.
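The rent gap variable mentioned above can be illustrated with a short sketch. The text only says it uses the difference between nearby and local rents, so the choices here are assumptions: "nearby" means the tracts listed as neighbors, nearby rents are averaged with a simple mean, and a positive gap means surrounding rents exceed local rents. The function name `rent_gap`, the tract IDs, and the rents are hypothetical.

```python
from statistics import mean


def rent_gap(median_rent: dict, neighbors: dict) -> dict:
    """Rent gap per tract: mean median rent of nearby tracts minus the tract's own.

    `median_rent` maps tract ID -> median rent; `neighbors` maps tract ID -> list of
    nearby tract IDs. A positive gap means surrounding rents are higher than local
    rents, i.e. more rental market pressure on the tract.
    """
    gaps = {}
    for tract, rent in median_rent.items():
        nearby = [median_rent[n] for n in neighbors.get(tract, []) if n in median_rent]
        if nearby:  # tracts without neighbor data get no gap value
            gaps[tract] = mean(nearby) - rent
    return gaps


# Hypothetical tracts "A", "B", "C" with made-up median rents.
rents = {"A": 1500.0, "B": 2100.0, "C": 2400.0}
adjacent = {"A": ["B", "C"], "B": ["A", "C"], "C": ["B"]}
print(rent_gap(rents, adjacent))  # {'A': 750.0, 'B': -150.0, 'C': -300.0}
```

Tract "A", the cheapest tract surrounded by pricier ones, gets the largest positive gap, which is the kind of pressure the variable is meant to capture.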
Other important variables include building types as well as economic and socio-demographic characteristics. For a full list of the variables included in the final models, ranked in descending order of importance, and their definitions, see all three tabs of this spreadsheet. “Importance” is defined in two ways:
1. % Inclusion: the average proportion of times this variable was included in the model’s decision tree as the most important or driving factor.
2. MeanRank: the average rank of importance for each variable across the numerous model runs, where higher numbers mean higher ranking.
Figures 1 through 3 below show each model’s variable rankings ordered by importance. The red lines represent Jenks breaks, which are designed to sort values into their most “natural” clusters. Variable importance for each model shows a substantial drop-off after about 10 variables, meaning that a relatively small number of variables account for much of the predictive power in UDP’s displacement model.

Figure 1. Variable Importance for Low Income Households
For a description of each variable and its source, see this spreadsheet.
Figure 2. Variable Importance for Very Low Income Households
For a description of each variable and its source, see this spreadsheet.
Figure 3. Variable Importance for Extremely Low Income Households
For a description of each variable and its source, see this spreadsheet.

Source: Chapple, K., Thomas, T., & Zuk, M. (2022). Urban Displacement Project website. Berkeley, CA: Urban Displacement Project.

(1) UDP used this time frame because (a) the 2020 census had a large non-response rate and implemented a new statistical modification that obscures and misrepresents racial and economic characteristics at the census tract level, and (b) pandemic mobility trends are still in flux, and UDP believes 2019 is more representative of “normal” or non-pandemic displacement trends.
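The two importance measures described above (% Inclusion and MeanRank) can be approximated from per-run importance scores. This is a hedged sketch, not UDP's procedure: it assumes every run scores the same set of variables, treats "most important factor" as the top-scoring variable in a run (a stand-in for inclusion in the decision tree), and gives rank 1 to the least important variable so that higher ranks mean higher importance. The function name, variable names, and scores are hypothetical.

```python
def summarize_importance(runs: list) -> dict:
    """Return {variable: (pct_inclusion, mean_rank)} across model runs.

    Each run maps variable name -> importance score from one model fit.
    pct_inclusion: share of runs in which the variable was the top factor.
    mean_rank: average rank, where the least important variable gets 1 and the
    most important gets len(run), so higher numbers mean higher ranking.
    """
    variables = sorted({v for run in runs for v in run})
    top_counts = {v: 0 for v in variables}
    rank_sums = {v: 0.0 for v in variables}
    for run in runs:
        ordered = sorted(run, key=run.get)  # ascending importance
        for rank, var in enumerate(ordered, start=1):
            rank_sums[var] += rank
        top_counts[ordered[-1]] += 1  # the top-scoring variable in this run
    n = len(runs)
    return {v: (top_counts[v] / n, rank_sums[v] / n) for v in variables}


# Two hypothetical model runs with made-up importance scores.
runs = [
    {"median_rent": 0.9, "pct_nonwhite": 0.5, "rent_gap": 0.4},
    {"median_rent": 0.6, "pct_nonwhite": 0.7, "rent_gap": 0.2},
]
print(summarize_importance(runs))
# {'median_rent': (0.5, 2.5), 'pct_nonwhite': (0.5, 2.5), 'rent_gap': (0.0, 1.0)}
```

Averaging over many runs smooths out run-to-run noise; ranking the averaged results and looking for a large drop (as the Jenks breaks do in the figures) separates the handful of dominant variables from the rest.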

  16.

    Hyper-parameter search grid.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Stephan Dietrich; Aline Meysonnat; Francisco Rosales; Victor Cebotari; Franziska Gassmann (2023). Hyper-parameter search grid. [Dataset]. http://doi.org/10.1371/journal.pone.0271373.t003
    Available download formats: xls
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Stephan Dietrich; Aline Meysonnat; Francisco Rosales; Victor Cebotari; Franziska Gassmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hyper-parameter search grid.

  17.

    Routine gathering attendee and population-level parameters.

    • plos.figshare.com
    xls
    Updated Oct 4, 2024
    Megan A. Hansen; Alvin X. Han; Joshua M. Chevalier; Ethan Klock; Hiromi Pandithakoralage; Alexandra de Nooy; Tom Ockhuisen; Sarah J. Girdwood; Nkgomeleng A. Lekodeba; Shaukat Khan; Helen E. Jenkins; Cheryl C. Johnson; Jilian A. Sacks; Colin A. Russell; Brooke E. Nichols (2024). Routine gathering attendee and population-level parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0311198.t001
    Available download formats: xls
    Dataset updated
    Oct 4, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Megan A. Hansen; Alvin X. Han; Joshua M. Chevalier; Ethan Klock; Hiromi Pandithakoralage; Alexandra de Nooy; Tom Ockhuisen; Sarah J. Girdwood; Nkgomeleng A. Lekodeba; Shaukat Khan; Helen E. Jenkins; Cheryl C. Johnson; Jilian A. Sacks; Colin A. Russell; Brooke E. Nichols
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Routine gathering attendee and population-level parameters.


Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD

Current Population Survey (CPS)

Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description

analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
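the rectangular layout described above (household- and family-level information attached to each person record) can be sketched in python. this is a toy illustration, not nber's or the repository's code: the id keys `h_id` and `f_id` and the fields `h_tenure`, `f_income`, `p_age` are hypothetical stand-ins, not real cps-asec variable names.

```python
def rectangularize(households, families, persons):
    """Attach household- and family-level fields to every person record.

    Assumed (hypothetical) keys: households are identified by 'h_id', families by
    ('h_id', 'f_id'), and each person record carries both ids. Returns one flat
    dict per person, so the output has exactly one row per person.
    """
    hh_by_id = {h["h_id"]: h for h in households}
    fam_by_id = {(f["h_id"], f["f_id"]): f for f in families}
    rows = []
    for p in persons:
        row = {}
        row.update(hh_by_id[p["h_id"]])              # household fields
        row.update(fam_by_id[(p["h_id"], p["f_id"])])  # family fields
        row.update(p)  # person fields win on any name clash
        rows.append(row)
    return rows


# One hypothetical household with one family and two people.
households = [{"h_id": 1, "h_tenure": "rent"}]
families = [{"h_id": 1, "f_id": 1, "f_income": 52000}]
persons = [{"h_id": 1, "f_id": 1, "p_age": 34}, {"h_id": 1, "f_id": 1, "p_age": 6}]
for row in rectangularize(households, families, persons):
    print(row)
```

the household's tenure and the family's income get repeated on both person rows, which is exactly why person-level weights (not household counts) are the right thing to sum on a rectangular file.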
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
• download the fixed-width file containing household, family, and person records
• import by separating this file into three tables, then merge 'em together at the person-level
• download the fixed-width file containing the person-level replicate weights
• merge the rectangular person-level file with the replicate weights, then store it in a sql database
• create a new variable - one - in the data table

2012 asec - analysis examples.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• perform a boatload of analysis examples

replicate census estimates - 2011.R
• connect to the sql database created by the 'download all microdata' program
• create the complex sample survey object, using the replicate weights
• match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
• the census bureau's current population survey page
• the bureau of labor statistics' current population survey page
• the current population survey's wikipedia article

notes:
interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.
as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
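here's a rough python sketch of how a statistic and its standard error come out of replicate weights, the calculation the 'replicate census estimates' script matches against the sas output. it assumes the successive-difference form var = (4 / R) * sum((X_r - X_0)^2), which is how we read the census replicate weight usage instructions for the asec (the real files carry 160 replicate weights; double-check the factor against the document for your file year). the function names and the toy data are hypothetical.

```python
from math import sqrt


def weighted_total(values, weights):
    """Weighted sum of a person-level variable (e.g. a 0/1 indicator)."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_se(values, full_weights, replicate_weights, factor=4.0):
    """Estimate and standard error of a weighted total using replicate weights.

    Assumes var = (factor / R) * sum((X_r - X_0) ** 2), where X_0 uses the full
    sample weight and X_r uses replicate weight r (factor = 4 assumed for asec).
    `replicate_weights` is a list of R weight vectors aligned with `values`.
    """
    x0 = weighted_total(values, full_weights)
    reps = [weighted_total(values, rw) for rw in replicate_weights]
    variance = factor / len(reps) * sum((xr - x0) ** 2 for xr in reps)
    return x0, sqrt(variance)


# Toy example: 3 people, a 0/1 indicator, and only 4 made-up replicate weights.
values = [1, 0, 1]
full = [100, 200, 300]
reps = [[120, 200, 300], [100, 250, 300], [100, 200, 300], [100, 100, 300]]
print(replicate_se(values, full, reps))  # (400, 20.0)
```

in r, the survey package's replicate-weight designs do this bookkeeping for you; the sketch just shows that the standard error is nothing more than the spread of the replicate estimates around the full-sample estimate.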
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
