Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Importance (weight) of variables influencing grizzly bear abundance in northwestern Montana, USA, in 2004. Only candidate variables for abundance, not detection, are shown. Weights for variables that were in the model ≥50% of iterations are in bold. Data include only cells with both types of sampling. HT = Hair Trap, BR = Bear Rub. See Graves et al. (In Review) for more details on specific variables. We did not include further details to maintain focus on the influence of different detection methods. Footnote 1: Experts assigned a value 1–10 to ownership categories based on efforts to protect bears including 1) attractant storage management, 2) enforcement of food storage regulations, and 3) road density and use management. Glacier National Park = 10, US Forest Service = 7, other public land = 3, and private = 1.
Abstract copyright UK Data Service and data collection copyright owner.
Conventional survey tools such as weighting do not address non-ignorable nonresponse that occurs when nonresponse depends on the variable being measured. This paper describes non-ignorable nonresponse weighting and imputation models using randomized response instruments, which are variables that affect response but not the outcome of interest (Sun et al. 2018). The paper uses a doubly robust estimator that is valid if one, but not necessarily both, of the weighting and imputation models is correct. When applied to a national 2019 survey, these tools produce estimates that suggest there was non-trivial non-ignorable nonresponse related to turnout, and, for subgroups, Trump approval and policy questions. For example, the conventional MAR-based weighted estimates of Trump support in the Midwest were 10 percentage points lower than the MNAR-based estimates. Data to replicate estimation described in "Countering Non-Ignorable Nonresponse in Survey Models with Randomized Response Instruments and Doubly Robust Estimation"
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de438965
Abstract (en): The American Time Use Survey (ATUS) collects information on how people living in the United States spend their time. Data collected in this study measured the amount of time that people spent doing various activities in 2005, such as paid work, child care, religious activities, volunteering, and socializing. Respondents were randomly selected from households that had completed their final month of the Current Population Survey (CPS), and were interviewed two to five months after their household's last CPS interview. Respondents were interviewed only once and reported their activities for the 24-hour period from 4 a.m. on the day before the interview until 4 a.m. on the day of the interview. Respondents indicated the total number of minutes spent on each activity, including where they were and whom they were with. Except for secondary child care, data on activities done simultaneously with primary activities were not collected. Part 1, Respondent and Activity Summary File, contains demographic information about respondents and a summary of the total amount of time they spent doing each activity that day. Part 2, Roster File, contains information about household members and nonhousehold children under the age of 18. Part 3, Activity File, includes additional information on activities in which respondents participated, including the location of each activity and the total time spent on secondary child care. Part 4, Who File, includes data on who was present during each activity. Part 5, ATUS-CPS 2005 File, contains data on respondents and members of their household collected two to five months prior to the ATUS interviews during their participation in the Current Population Survey (CPS). Parts 6-10 contain supplemental data files that can be used for further analysis of the data. Part 6, Case History File, contains information about the interview process, such as identifiers and interview outcome codes. 
Part 7, Call History File, gives information about each call attempt, including the call date and outcome. Part 8, Trips File, provides information about the number, duration, and purpose of overnight trips away from home for two or more nights in a row. Part 9, Replicate Weights File I, contains base weights, replicated base weights, and replicate final weights for each case that was selected to be interviewed for ATUS, while Part 10, Replicate Weights File II, contains replicate weights that were generated using the 2006 weighting method. Demographic variables include sex, age, race, ethnicity, education level, income, employment status, occupation, citizenship status, country of origin, relationship to household members, and the ages and number of children in the household. The data contain weight variables which should be used in analyzing the data. Unweighted data are not representative of the population due to differences between population groups in both sampling and nonresponse. ATUS weight variables include the ATUS final weight (TUFINLWGT), which indicates the number of person-days the respondent represents, the ATUS base weight (TUBWGT), and an ATUS final weight based on 2006 weighting methodology (TU06FWGT). ATUS weights were selected from the Current Population Survey (CPS), and CPS weights (after the first-stage adjustment) are the basis for the ATUS weights. These base weights were adjusted to account for the fact that less populous states were not oversampled in ATUS, as they were in the CPS. Further adjustments were made to account for the probability of selecting each household within the ATUS sampling strata and the probability of selecting each person from each sample household. Part 9 contains replicate weights for the variable TUFINLWGT, as well as base weights, while Part 10 contains replicate weights for the variable TU06FWGT. ATUS replicate weights were based on the replicate weights developed for the CPS. 
ATUS began with the CPS replicate weight after the first-stage ratio adjustment, and each replicate was processed through all of the stages of the ATUS weighting procedure. The CPS replicate weights were based on a modified balanced half-sample method of replication, developed in the 1980s by Robert Fay. For more information about the replicate weights, see the publication, Technical Paper 63RV: Current Population Survey -- Design and Methodology, available via the Bureau of Labor Statistics Web site. More information on the weighting variables used in this study can be found in t...
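As a rough illustration of how a final weight and a set of replicate weights might be used together, the sketch below computes a weighted mean and a replicate-based variance. It is not the official ATUS procedure: the Fay coefficient of 0.5, the function names, and all numbers are assumptions made for illustration, and the survey documentation should be consulted for the actual variance formula and number of replicates.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def replicate_variance(full_estimate, replicate_estimates, fay_k=0.5):
    """Variance under Fay-style balanced repeated replication.

    Var = 1 / (R * (1 - k)^2) * sum((theta_r - theta)^2)
    fay_k = 0.5 is an assumption here, not taken from the abstract.
    """
    r = len(replicate_estimates)
    scale = 1.0 / (r * (1.0 - fay_k) ** 2)
    return scale * sum((est - full_estimate) ** 2 for est in replicate_estimates)


# Toy data (invented): minutes on one activity, a final weight per
# respondent (playing the role of TUFINLWGT), and two replicate weight sets.
minutes = [30.0, 0.0, 120.0, 60.0]
final_weights = [1000.0, 2000.0, 1500.0, 500.0]
replicate_weights = [
    [1500.0, 1000.0, 2250.0, 250.0],
    [500.0, 3000.0, 750.0, 750.0],
]

full = weighted_mean(minutes, final_weights)
reps = [weighted_mean(minutes, rw) for rw in replicate_weights]
variance = replicate_variance(full, reps)
standard_error = variance ** 0.5
```

The estimate is recomputed once per replicate weight set, and the spread of those replicate estimates around the full-sample estimate gives the variance.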
In the November 2016 U.S. presidential election, many state level public opinion polls, particularly in the Upper Midwest, incorrectly predicted the winning candidate. One leading explanation for this polling miss is that the precipitous decline in traditional polling response rates led to greater reliance on statistical methods to adjust for the corresponding bias---and that these methods failed to adjust for important interactions between key variables like educational attainment, race, and geographic region. Finding calibration weights that account for important interactions remains challenging with traditional survey methods: raking typically balances the margins alone, while post-stratification, which exactly balances all interactions, is only feasible for a small number of variables. In this paper, we propose multilevel calibration weighting, which enforces tight balance constraints for marginal balance and looser constraints for higher-order interactions. This incorporates some of the benefits of post-stratification while retaining the guarantees of raking. We then correct for the bias due to the relaxed constraints via a flexible outcome model; we call this approach Double Regression with Post-stratification (DRP). We use these tools to re-assess a large-scale survey of voter intention in the 2016 U.S. presidential election, finding meaningful gains from the proposed methods. The approach is available in the multical R package. Contains replication materials for "Multilevel calibration weighting for survey data", including raw data, scripts to clean the raw data, scripts to replicate the analysis, and scripts to replicate the simulation study.
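To make the contrast with raking concrete, here is a minimal sketch of plain raking (iterative proportional fitting), the baseline that balances margins only. It is not the multilevel calibration method of the paper and does not use the multical package; the function name, the variables, and the target totals are all invented for illustration.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Plain raking: rescale weights until each margin matches its target.

    rows: list of dicts mapping variable name -> category
    targets: {variable: {category: target weighted total}}
    Returns one weight per row.
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total in each category of this variable
            totals = {}
            for row, w in zip(rows, weights):
                totals[row[var]] = totals.get(row[var], 0.0) + w
            # Rescale so this variable's margin matches its target exactly
            for i, row in enumerate(rows):
                factor = target[row[var]] / totals[row[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights


# Toy sample (invented): five respondents, two categorical variables
rows = [
    {"educ": "college", "region": "midwest"},
    {"educ": "college", "region": "other"},
    {"educ": "college", "region": "other"},
    {"educ": "no_college", "region": "midwest"},
    {"educ": "no_college", "region": "other"},
]
targets = {
    "educ": {"college": 40.0, "no_college": 60.0},
    "region": {"midwest": 30.0, "other": 70.0},
}
weights = rake(rows, targets)


def margin_total(var, category):
    """Weighted total of respondents in one category of one variable."""
    return sum(w for row, w in zip(rows, weights) if row[var] == category)
```

After raking, both margins match their targets, but nothing constrains the education-by-region cells themselves, which is the gap the abstract describes.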
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IVFNs of linguistic variables for importance weights of criteria.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Weights to different model variables by experts (E1–E5).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset represents the anonymised data collected as part of the Viral Communication (Understand-ELSED) project. It includes the two measurements, Phase I (30 October 2020 and 14 December 2020) and Phase II (2 March 2021 and 22 March 2021).
As the dataset involves two measurements (i.e., Phase I and Phase II), it is split into two sections, each of which can be identified by looking at the variable names. Variables corresponding to the Phase I survey will have the prefix “PHASE1_”, while variables from the Phase II survey will have the prefix “PHASE2_”. Exceptions to this are the socio-demographic variables from the main section of the Phase I survey.
Each questionnaire was split into a main and an opt-in section, the cut-off points of which are located after the variables PHASE1_OI_AQ and PHASE2_OI_AQ, respectively. Furthermore, two sets of two weighting variables were calculated. The first set includes weights for analyses only involving Phase I variables, while the second set contains weights for analyses involving any Phase II variables. The corresponding variable labels specify how to use any of the weighting variables, which are located in positions 2 through 5.
In the Phase II survey, we included two experimental setups. For the vaccination origin experiment, we included a grouping variable, PHASE2_HM_VACC_GROUP. The same was done for the risk assessment experiment, with PHASE2_RA_INF_GROUP being the grouping variable.
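The prefix convention described above lends itself to a simple column split. The sketch below groups variable names by phase; the function name and the columns "ID" and "AGE" are hypothetical stand-ins for the unprefixed variables, while the PHASE1_/PHASE2_ names are taken from the description.

```python
def split_by_phase(column_names):
    """Group variable names by their PHASE1_ / PHASE2_ prefix."""
    groups = {"phase1": [], "phase2": [], "other": []}
    for name in column_names:
        if name.startswith("PHASE1_"):
            groups["phase1"].append(name)
        elif name.startswith("PHASE2_"):
            groups["phase2"].append(name)
        else:
            # Unprefixed variables, e.g. socio-demographics or weights
            groups["other"].append(name)
    return groups


# "ID" and "AGE" are hypothetical names used only for illustration
columns = [
    "ID",
    "AGE",
    "PHASE1_OI_AQ",
    "PHASE2_OI_AQ",
    "PHASE2_HM_VACC_GROUP",
    "PHASE2_RA_INF_GROUP",
]
groups = split_by_phase(columns)
```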
Abstract copyright UK Data Service and data collection copyright owner.
The Health Survey Northern Ireland (HSNI) was commissioned by the Department of Health in Northern Ireland and the Central Survey Unit (CSU) of the Northern Ireland Statistics and Research Agency (NISRA) carried out the survey on their behalf. This survey series has been running on a continuous basis since April 2010 with separate modules for different policy areas included in different financial years. It covers a range of health topics that are important to the lives of people in Northern Ireland. The HSNI replaces the previous Northern Ireland Health and Social Wellbeing Survey (available under SNs 4589, 4590 and 5710).
Adult BMI, height and weight measurements, accompanying demographic and derived variables, geography, and a BMI weighting variable, are available in separate datasets for each survey year.
Further information is available from the Northern Ireland Statistics and Research Agency and the Department of Health (Northern Ireland) survey webpages.
Data gathered in the HSNI 2010-2011. Variables include measured height and weight, calculated BMI including groupings, age, sex and geography.
Abstract copyright UK Data Service and data collection copyright owner.
The Active Lives Children and Young People Survey, which was established in September 2017, provides a world-leading approach to gathering data on how children engage with sport and physical activity. This school-based survey is the first and largest established physical activity survey with children and young people in England. It gives anyone working with children aged 5-16 key insight to help understand children's attitudes and behaviours around sport and physical activity. The results will shape and influence local decision-making as well as inform government policy on the PE and Sport Premium, Childhood Obesity Plan and other cross-departmental programmes. More general information about the study can be found on the Sport England Active Lives Survey webpage and the Active Lives Online website, including reports and data tables. The Active Lives Children and Young People Survey, 2018-2019 was conducted during school academic year 2018 / 2019. It ran from autumn term 2018 to summer term 2019 and excludes school holidays. The survey identifies how participation varies across different activities and sports, by regions of England, between school types and terms, and between different demographic groups in the population. The survey measures levels of activity (active, fairly active and less active), attitudes towards sport and physical activity, swimming capability, the proportion of children and young people that volunteer in sport, sports spectating, and wellbeing measures such as happiness and life satisfaction. The questionnaire was designed to enable analysis of the findings by a broad range of variables, such as gender, family affluence and school year.
The following datasets are available:
For further information about the variables available for analysis, and the relevant school years asked survey questions, please see the supporting documentation. Please read the documentation before using the datasets.
Latest edition information
For the second edition (January 2024), the Teacher dataset now includes a weighting variable (‘wt_teacher’). Previously, weighting was not available for these data.
Topics covered in the Active Lives Children and Young People Survey include:
Survey weighting allows researchers to account for bias in survey samples, due to unit nonresponse or convenience sampling, using measured demographic covariates. Unfortunately, in practice, it is impossible to know whether the estimated survey weights are sufficient to alleviate concerns about bias due to unobserved confounders or incorrect functional forms used in weighting. In the following paper, we propose two sensitivity analyses for the exclusion of important covariates: (1) a sensitivity analysis for partially observed confounders (i.e., variables measured across the survey sample, but not the target population), and (2) a sensitivity analysis for fully unobserved confounders (i.e., variables not measured in either the survey or the target population). We provide graphical and numerical summaries of the potential bias that arises from such confounders, and introduce a benchmarking approach that allows researchers to quantitatively reason about the sensitivity of their results. We demonstrate our proposed sensitivity analyses using state-level 2020 U.S. Presidential Election polls.
The table All ESI ACS Matched Weights is part of the dataset Weighting Techniques for Large Private Claims Data, available at https://redivis.com/datasets/6f7e-cxanam2b8. It contains 540187074 rows across 6 variables.
The dataset in question comprises 741 individual records, each meticulously documented with the following attributes:
Furthermore, it is noteworthy that this dataset exhibits a high degree of data integrity, with no missing values across any of the aforementioned columns. Such completeness enhances its utility for advanced data analytics and visualization, enabling rigorous exploration of relationships between age, height, weight, BMI, and associated weight classifications.
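For readers exploring the relationship between height, weight, BMI, and weight class, the sketch below shows the standard BMI formula (weight in kilograms divided by height in metres squared). The category cut-offs are the commonly used WHO adult thresholds and are an assumption here; the dataset's own classification labels and units are not given in the description and may differ.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Classify a BMI value using WHO adult cut-offs (an assumption here)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"


# Invented example records: (weight in kg, height in m)
people = [(70.0, 1.75), (95.0, 1.75), (50.0, 1.80)]
classified = [bmi_category(bmi(w, h)) for w, h in people]
```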
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show. 
this new github repository contains three scripts:
2005-2012 asec - download all microdata.R
  download the fixed-width file containing household, family, and person records
  import by separating this file into three tables, then merge 'em together at the person-level
  download the fixed-width file containing the person-level replicate weights
  merge the rectangular person-level file with the replicate weights, then store it in a sql database
  create a new variable - one - in the data table
2012 asec - analysis examples.R
  connect to the sql database created by the 'download all microdata' program
  create the complex sample survey object, using the replicate weights
  perform a boatload of analysis examples
replicate census estimates - 2011.R
  connect to the sql database created by the 'download all microdata' program
  create the complex sample survey object, using the replicate weights
  match the sas output shown in the png file below
2011 asec replicate weight sas output.png
  statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
click here to view these three scripts
for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
  the census bureau's current population survey page
  the bureau of labor statistics' current population survey page
  the current population survey's wikipedia article
notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. 
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
https://www.icpsr.umich.edu/web/ICPSR/studies/25501/terms
The National Health and Nutrition Examination Surveys (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES combines personal interviews and physical examinations, which focus on different population groups or health topics. These surveys have been conducted by the National Center for Health Statistics (NCHS) on a periodic basis from 1971 to 1994. In 1999 the NHANES became a continuous program with a changing focus on a variety of health and nutrition measurements which were designed to meet current and emerging concerns. The surveys examine a nationally representative sample of approximately 5,000 persons each year. These persons are located in counties across the United States, 15 of which are visited each year. The 1999-2000 NHANES contains data for 9,965 individuals (and a MEC examined sample size of 9,282) of all ages. Many questions that were asked in NHANES II, 1976-1980, Hispanic HANES 1982-1984, and NHANES III, 1988-1994, were combined with new questions in the NHANES 1999-2000. The 1999-2000 NHANES collected data on the prevalence of selected chronic conditions and diseases in the population and estimates for previously undiagnosed conditions, as well as those known to and reported by respondents. Risk factors, those aspects of a person's lifestyle, constitution, heredity, or environment that may increase the chances of developing a certain disease or condition, were examined. Data on smoking, alcohol consumption, sexual practices, drug use, physical fitness and activity, weight, and dietary intake were collected. Information on certain aspects of reproductive health, such as use of oral contraceptives and breastfeeding practices, was also collected. The interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests. 
Demographic data file variables are grouped into three broad categories: (1) Status Variables: Provide core information on the survey participant. Examples of the core variables include interview status, examination status, and sequence number. (Sequence number is a unique ID assigned to each sample person and is required to match the information on this demographic file to the rest of the NHANES 1999-2000 data). (2) Recoded Demographic Variables: The variables include age (age in months for persons through age 19 years, 11 months; age in years for 1-84 year olds, and a top-coded age group of 85+ years), gender, a race/ethnicity variable, an education variable (high school, and more than high school education), country of birth (United States, Mexico, or other foreign born), and pregnancy status variable. Some of the groupings were made due to limited sample sizes for the two-year dataset. (3) Interview and Examination Sample Weight Variables: Sample weights are available for analyzing NHANES 1999-2000 data. For a complete listing of survey contents for all years of the NHANES see the document -- Survey Content -- NHANES 1999-2010.
https://spdx.org/licenses/CC0-1.0.html
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This zip file contains a folder with two layers, where the first corresponds to a choice of geographical resolutions, and the second discriminates among the climate variables. Each dataset, saved in csv, is organized in wide format, where the first column refers to the month (or the day), and the remaining columns, which are identified by the GADM code of the geographical units, contain the values of the weighted climate variable.
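Since each file is in wide format (one time column, then one column per geographical unit), a reshape to long format is often the first processing step. The sketch below does this with the standard library only; the header name "month", the GADM-style codes, and the values in the sample are invented for illustration and do not come from the actual files.

```python
import csv
import io


def wide_to_long(csv_text):
    """Turn a wide table (time column + one column per unit) into long records."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    unit_codes = header[1:]  # every column after the first is a geographical unit
    records = []
    for row in reader:
        period = row[0]
        for code, value in zip(unit_codes, row[1:]):
            records.append(
                {"period": period, "gadm_code": code, "value": float(value)}
            )
    return records


# Invented sample mimicking the described layout
sample = "month,ITA.1_1,ITA.2_1\n2020-01,3.5,4.1\n2020-02,5.0,6.2\n"
long_records = wide_to_long(sample)
```

Each output record carries one period, one unit code, and one value, which is usually easier to merge with other unit-level data than the wide layout.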
This digital data release consists of seven national data files of area- and depth-weighted averages of select soil attributes for every available county in the conterminous United States and the District of Columbia as of March 2014. The files are derived from Natural Resources Conservation Service’s (NRCS) Soil Survey Geographic database (SSURGO). The data files can be linked to the raster datasets of soil mapping unit identifiers (MUKEY) available through the NRCS’s Gridded Soil Survey Geographic (gSSURGO) database (http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/geo/?cid=nrcs142p2_053628). The associated files, named DRAINAGECLASS, HYDRATING, HYDGRP, HYDRICCONDITION, LAYER, TEXT, and WTDEP are area- and depth-weighted average values for selected soil characteristics from the SSURGO database for the conterminous United States and the District of Columbia. The SSURGO tables were acquired from the NRCS on March 5, 2014. The soil characteristics in the DRAINAGE table are drainage class (DRNCLASS), which identifies the natural drainage conditions of the soil and refers to the frequency and duration of wet periods. The soil characteristics in the HYDRATING table are hydric rating (HYDRATE), a yes/no field that indicates whether or not a map unit component is classified as a "hydric soil". The soil characteristics in the HYDGRP table are the percentages for each hydrologic group per MUKEY. The soil characteristics in the HYDRICCONDITION table are hydric condition (HYDCON), which describes the natural condition of the soil component. 
The soil characteristics in the LAYER table are available water capacity (AVG_AWC), bulk density (AVG_BD), saturated hydraulic conductivity (AVG_KSAT), vertical saturated hydraulic conductivity (AVG_KV), soil erodibility factor (AVG_KFACT), porosity (AVG_POR), field capacity (AVG_FC), the soil fraction passing a number 4 sieve (AVG_NO4), the soil fraction passing a number 10 sieve (AVG_NO10), the soil fraction passing a number 200 sieve (AVG_NO200), and organic matter (AVG_OM). The soil characteristics in the TEXT table are percent sand, silt, and clay (AVG_SAND, AVG_SILT, and AVG_CLAY). The soil characteristics in the WTDEP table are the annual minimum water table depth (WTDEP_MIN), available water storage in the 0-25 cm soil horizon (AWS025), the minimum water table depth for the months April, May and June (WTDEPAMJ), the available water storage in the first 25 centimeters of the soil horizon (AWS25), the dominant drainage class (DRCLSD), the wettest drainage class (DRCLSWET), and the hydric classification (HYDCLASS), which is an indication of the proportion of the map unit, expressed as a class, that is "hydric", based on the hydric classification of a given MUKEY. (See Entity_Description for more detail). The tables were created with a set of arc macro language (aml) and awk (awk was created at Bell Labs in the 1970s and its name is derived from the first letters of the last names of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan) scripts. Send an email to mewieczo@usgs.gov to obtain copies of the computer code (See Process_Description.) The methods used are outlined in NRCS's "SSURGO Data Packaging and Use" (NRCS, 2011). The tables can be related or joined to the gSSURGO rasters of MUKEYs by the item 'MUKEY.' Joining or relating the tables to a MUKEY grid allows the creation of grids of area- and depth-weighted soil characteristics. A 90-meter raster of MUKEYs is provided which can be used to produce rasters of soil attributes. 
More detailed resolution rasters are available through NRCS via the link above.
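The idea of an area- and depth-weighted average can be sketched in two steps: average a soil property over the horizons of each component, weighting by horizon thickness, then average the components of a map unit, weighting by the share of area each one covers. The code below is an illustrative sketch only and is not the aml/awk code used to build the tables; the function names, the data layout, and the numbers are assumptions.

```python
def depth_weighted_average(horizons):
    """Average a property over horizons, weighting by horizon thickness.

    horizons: list of (top_cm, bottom_cm, value)
    """
    total_thickness = sum(bottom - top for top, bottom, _ in horizons)
    weighted_sum = sum((bottom - top) * value for top, bottom, value in horizons)
    return weighted_sum / total_thickness


def area_weighted_average(components):
    """Average component values, weighting by percent of map unit area.

    components: list of (percent_of_map_unit, value)
    """
    total_percent = sum(percent for percent, _ in components)
    return sum(percent * value for percent, value in components) / total_percent


# Invented map unit with two components and a generic soil property
component_a = depth_weighted_average([(0, 20, 2.0), (20, 100, 1.0)])
component_b = depth_weighted_average([(0, 50, 3.0), (50, 100, 1.0)])
map_unit_value = area_weighted_average([(60, component_a), (40, component_b)])
```

Dividing by the summed percentages, not by 100, keeps the result sensible when the listed components do not cover the whole map unit.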
The table Large Employer CPS Matched Weights is part of the dataset Weighting Techniques for Large Private Claims Data, available at https://redivis.com/datasets/6f7e-cxanam2b8. It contains 572912944 rows across 8 variables.
The table CPS weights - full ESI is part of the dataset MarketScan weights, available at https://redivis.com/datasets/7vaj-f29v5z61r. It contains 14912 rows across 8 variables.