A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent-level data in each row, with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response options, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
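For readers working in R rather than spreadsheet pivot tables, the same "sum of wt" logic can be sketched as below. The file name and the question column q1 are placeholders; only the "wt" field name comes from the dataset documentation.

# weighted tabulation equivalent to a pivot table using the sum of "wt"
library(dplyr)

bloomington <- read.csv("bloomington_survey.csv")   # hypothetical file name

bloomington %>%
  group_by(q1) %>%                        # 1=Excellent, 2=Good, 3=Fair, 4=Poor
  summarise(weighted_n = sum(wt)) %>%     # sum of weights, not a count of rows
  mutate(weighted_pct = 100 * weighted_n / sum(weighted_n))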
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Calculation strategy for survey and population weighting of the data.
Survey weighting allows researchers to account for bias in survey samples, due to unit nonresponse or convenience sampling, using measured demographic covariates. Unfortunately, in practice, it is impossible to know whether the estimated survey weights are sufficient to alleviate concerns about bias due to unobserved confounders or incorrect functional forms used in weighting. In the following paper, we propose two sensitivity analyses for the exclusion of important covariates: (1) a sensitivity analysis for partially observed confounders (i.e., variables measured across the survey sample, but not the target population), and (2) a sensitivity analysis for fully unobserved confounders (i.e., variables not measured in either the survey or the target population). We provide graphical and numerical summaries of the potential bias that arises from such confounders, and introduce a benchmarking approach that allows researchers to quantitatively reason about the sensitivity of their results. We demonstrate our proposed sensitivity analyses using state-level 2020 U.S. Presidential Election polls.
Background
The Annual Population Survey (APS) Household datasets are produced annually and are available from 2004 (Secure Access) and 2006 (End User Licence). They allow production of family and household labour market statistics at local areas and for small sub-groups of the population across the UK. The data comprise key variables from the Labour Force Survey (LFS) (held at the UK Data Archive under GN 33246) and the APS (person) datasets (held at the Data Archive under GN 33357). The former is a quarterly survey of households living at private addresses in the UK. The latter is created by combining individuals in waves one and five from four consecutive LFS quarters with the English, Welsh and Scottish Local Labour Force Surveys (LLFS). The APS Household datasets therefore contain results from four different sources.
The APS Household datasets include all the variables on the LFS and APS person datasets except for the income variables. They also include key family and household level derived variables. These variables allow for an analysis of the combined economic activity status of the family or household. In addition they also include more detailed geographical, industry, occupation, health and age variables.
For information on the main (person) APS datasets, for which EUL and Secure Access versions are available, please see GNs 33357 and 33427, respectively.
New reweighting policy
Following the new reweighting policy, ONS reviewed the latest population estimates made available during 2019 and decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. It will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and the 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published in 2021.
Secure Access APS Household data
Secure Access datasets for the APS Household survey include additional variables not included in the EUL versions (GN 33455). Extra variables that may be found in the Secure Access version but not in the EUL version relate to:
analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
download the fixed-width file containing household, family, and person records; import by separating this file into three tables, then merge 'em together at the person-level; download the fixed-width file containing the person-level replicate weights; merge the rectangular person-level file with the replicate weights, then store it in a sql database; create a new variable - one - in the data table

2012 asec - analysis examples.R
connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; perform a boatload of analysis examples

replicate census estimates - 2011.R
connect to the sql database created by the 'download all microdata' program; create the complex sample survey object, using the replicate weights; match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
the census bureau's current population survey page
the bureau of labor statistics' current population survey page
the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
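the "create the complex sample survey object, using the replicate weights" step can be sketched with the survey package roughly as follows. the variable names (marsupwt for the main person weight, pwwgt1-pwwgt160 for the replicate weights) and the fay adjustment are assumptions about the asec layout rather than text copied from the repository scripts - take the exact settings from the census replicate-weight usage instructions.

# minimal sketch, assuming a person-level data frame `asec` that already
# carries the main weight and the replicate-weight columns
library(survey)

asec_design <-
  svrepdesign(
    weights    = ~marsupwt,        # main person weight (assumed name)
    repweights = "pwwgt[1-9]",     # regular expression matching replicate-weight columns
    type       = "Fay",
    rho        = (4 / 9)^0.5,      # illustrative fay coefficient; confirm against census documentation
    data       = asec,
    combined.weights = TRUE
  )

# weighted estimate with a replicate-weight standard error
svymean(~htotval, asec_design, na.rm = TRUE)   # htotval: total household income (assumed name)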
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The City of Bloomington contracted with National Research Center, Inc. to conduct the 2019 Bloomington Community Survey. This was the second time a scientific citywide survey had been completed covering resident opinions on service delivery satisfaction by the City of Bloomington and quality of life issues. The first was in 2017. The survey captured the responses of 610 households from a representative sample of 3,000 residents of Bloomington who were randomly selected to complete the survey. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, these data need to be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, as well as R markdown files showing example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and secular trends in cancer mortality.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the national health and nutrition examination survey (nhanes) with r

nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r. the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population.

most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script.

this new github repository contains five scripts:

2009-2010 interview only - download and analyze.R
download, import, save the demographics and health insurance files onto your local computer; load both files, limit them to the variables needed for the analysis, merge them together; perform a few example variable recodes; create the complex sample survey object, using the interview weights; run a series of pretty generic analyses on the health insurance questions

2009-2010 interview plus laboratory - download and analyze.R
download, import, save the demographics and cholesterol files onto your local computer; load both files, limit them to the variables needed for the analysis, merge them together; perform a few example variable recodes; create the complex sample survey object, using the mobile examination component (mec) weights; perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief

replicate 2005-2008 pooled cdc oral examination figure.R
download, import, save, pool, recode, create a survey object, run some basic analyses; replicate figure 3 from this cdc oral health databrief - the whole barplot

replicate cdc publications.R
download, import, save, pool, merge, and recode the demographics file plus cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files; match the cdc's example sas and sudaan syntax file's output for descriptive means; match the cdc's example sas and sudaan syntax file's output for descriptive proportions; match the cdc's example sas and sudaan syntax file's output for descriptive percentiles

replicate human exposure to chemicals report.R (user-contributed)
download, import, save, pool, merge, and recode the demographics file plus urinary bisphenol a (bpa) laboratory files; log-transform some of the columns to calculate the geometric means and quantiles; match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report

click here to view these five scripts

for more detail about the national health and nutrition examination survey (nhanes), visit:
the cdc's nhanes homepage
the national cancer institute's page of nhanes web tutorials

notes: nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead.

confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
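a minimal sketch of the "start with the demo file, merge, then build the survey object" workflow described above. the file and variable names (DEMO_F.XPT, SEQN, SDMVPSU, SDMVSTRA, WTMEC2YR, RIDAGEYR) follow the standard nhanes naming conventions, and the cholesterol file and outcome variable are illustrative stand-ins rather than names taken from the repository scripts.

# merge an nhanes demographics file with a laboratory file and
# build the complex sample survey object using the mec weights
library(foreign)   # read.xport
library(survey)

demo <- read.xport("DEMO_F.XPT")    # 2009-2010 demographics file
lab  <- read.xport("TCHOL_F.XPT")   # 2009-2010 total cholesterol file (illustrative)

nhanes <- merge(demo, lab, by = "SEQN", all = FALSE)   # keep examined respondents only

nhanes_design <-
  svydesign(
    id      = ~SDMVPSU,    # masked primary sampling unit
    strata  = ~SDMVSTRA,   # masked stratum
    weights = ~WTMEC2YR,   # mec examination weight
    nest    = TRUE,
    data    = nhanes
  )

# weighted mean total cholesterol among adults
svymean(~LBXTC, subset(nhanes_design, RIDAGEYR >= 20), na.rm = TRUE)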
CPS - Large Employers table based on a weighting method found in a KFF article on preventive services
CPS - All ESI expands the same method to include ESI from any size employer
This dataset pulls together different aggregations of IPUMS survey data to be used for weighting individual entries in the MarketScan Commercial Database.
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.
Longitudinal data
The LFS retains each sample household for five consecutive quarters, with a fifth of the sample replaced each quarter. The main survey was designed to produce cross-sectional data, but the data on each individual have now been linked together to provide longitudinal information. The longitudinal data comprise two types of linked datasets, created using the weighting method to adjust for non-response bias. The two-quarter datasets link data from two consecutive waves, while the five-quarter datasets link across a whole year (for example January 2010 to March 2011 inclusive) and contain data from all five waves. A full series of longitudinal data has been produced, going back to winter 1992. Linking together records to create a longitudinal dimension can, for example, provide information on gross flows over time between different labour force categories (employed, unemployed and economically inactive). This will provide detail about people who have moved between the categories. Also, longitudinal information is useful in monitoring the effects of government policies and can be used to follow the subsequent activities and circumstances of people affected by specific policy initiatives, and to compare them with other groups in the population. There are however methodological problems which could distort the data resulting from this longitudinal linking. The ONS continues to research these issues and advises that the presentation of results should be carefully considered, and warnings should be included with outputs where necessary.
New reweighting policy
Following the new reweighting policy, ONS reviewed the latest population estimates made available during 2019 and decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. It will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and the 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published towards the end of 2020 or in early 2021.
LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each user guide volume alongside the appropriate questionnaire for the year concerned. However, volumes are updated periodically by ONS, so users are advised to check the latest documents on the ONS Labour Force Survey - User Guidance pages before commencing analysis. This is especially important for users of older QLFS studies, where information and guidance in the user guide documents may have changed over time.
Additional data derived from the QLFS
The Archive also holds further QLFS series: End User Licence (EUL) quarterly data; Secure Access datasets; household datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.
Variables DISEA and LNGLST
Dataset A08 (Labour market status of disabled people), which ONS suspended due to an apparent discontinuity between April to June 2017 and July to September 2017, is now available. As a result of this apparent discontinuity, and because investigations remain inconclusive at this stage, comparisons between April to June 2017 and subsequent time periods should be made with caution. Users should also note that the estimates are not seasonally adjusted, so some of the change between quarters could be due to seasonality. Further recommendations on historical comparisons of the estimates will be given in November 2018, when ONS are due to publish estimates for July to September 2018.
An article explaining the quality assurance investigations that have been conducted so far is available on the ONS Methodology webpage. For any queries about Dataset A08 please email Labour.Market@ons.gov.uk.
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023, Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022
2022 Weighting
The population totals used for the latest LFS estimates use projected growth rates from Real Time Information (RTI) data for UK, EU and non-EU populations based on 2021 patterns. The total population used for the LFS therefore does not take into account any changes in migration, birth rates, death rates, and so on since June 2021, and hence estimates of levels may under- or over-estimate the true values and should be used with caution. Estimates of rates will, however, be robust.
Latest edition information
For the third edition (February 2025), the data file was resupplied with the 2024 weighting variable included (LGWT24).
Responses from the 2021 open participation (non-probability) survey. In the dataset, you will find the respondent-level data in each row, with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response options, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The open participation survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the age, height, and weight data extracted from the NHANES 2017-2018 survey dataset. The original data were BMX_J.xpt (see https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Examination&CycleBeginYear=2017) and DEMO_J.xpt (see https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&CycleBeginYear=2017). I used Linux Mint 20 to get the CSV files from the above XPT files. First, I installed the R foreign package with the following command: $ sudo apt install r-cran-foreign Then, I developed two R scripts to extract the CSV data. The scripts are attached to this dataset. For analysis of the CSV file, I used the following commands within the R environment.
# height (h) and weight (w) taken from the CSV, restricted to data$age >= 20;
# wt and ht are derived from w and h and used in the regression below
model <- lm(wt ~ ht)
summary(model)

Call:
lm(formula = wt ~ ht)

Residuals:
     Min       1Q   Median       3Q      Max
-0.29406 -0.07182 -0.00558  0.06514  0.47048

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.46404    0.01423   102.90
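A sketch of the extraction step described above, for readers who want to reproduce the CSV from the two XPT files. The variable names (SEQN, RIDAGEYR in DEMO_J; BMXHT, BMXWT in BMX_J) follow standard NHANES conventions; the output column names and file name are placeholders, and the plain linear fit at the end is illustrative rather than the exact model reported above.

# extract age, height, and weight from BMX_J.xpt and DEMO_J.xpt
library(foreign)

demo <- read.xport("DEMO_J.xpt")   # demographics: respondent id and age
bmx  <- read.xport("BMX_J.xpt")    # body measures: height and weight

merged <- merge(demo[, c("SEQN", "RIDAGEYR")],
                bmx[,  c("SEQN", "BMXHT", "BMXWT")],
                by = "SEQN")

data <- data.frame(age    = merged$RIDAGEYR,
                   height = merged$BMXHT,   # cm
                   weight = merged$BMXWT)   # kg

write.csv(data, "nhanes_2017_2018_age_height_weight.csv", row.names = FALSE)

# regression restricted to adults (age >= 20), as in the analysis above
adults <- subset(data, age >= 20 & !is.na(height) & !is.na(weight))
model  <- lm(weight ~ height, data = adults)
summary(model)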
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research: physical health, economic and financial security, and social and mental health.
Do not report raw or unweighted response counts from this survey; instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is at least twice as large as the margin of error, there is a clear difference; if it is at least as large as the margin of error, there is a slight but apparent difference; if it is smaller than the margin of error, no real difference can be determined.
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
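An iterative raking step of this kind can be sketched with the R survey package as below. The respondent data frame, variable names, and population margins are all illustrative placeholders, not NORC's actual specification; the only survey item named from the source is soc5c.

# minimal sketch of iterative raking: adjust base weights so weighted
# sample margins match population margins
library(survey)

# respondents: data frame with base_weight, age_group, gender (illustrative names)
design <- svydesign(ids = ~1, weights = ~base_weight, data = respondents)

# population margins, e.g. from the 2020 Current Population Survey (counts illustrative)
pop_age    <- data.frame(age_group = c("18-34", "35-64", "65+"),
                         Freq      = c(75e6, 125e6, 55e6))
pop_gender <- data.frame(gender = c("female", "male"),
                         Freq   = c(130e6, 125e6))

raked <- rake(design,
              sample.margins     = list(~age_group, ~gender),
              population.margins = list(pop_age, pop_gender))

# weighted national estimate for a survey item, e.g. soc5c (loneliness)
svymean(~factor(soc5c), raked, na.rm = TRUE)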
Data for the regional estimates are collected using a multi-mode address-based sampling (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains estimates of the base rates of 550 food safety-relevant food handling practices in European households. The data are representative for the population of private households in the ten European countries in which the SafeConsume Household Survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK).
Sampling design
In each of the ten EU and EEA countries where the survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK), the population under study was defined as the private households in the country. Sampling was based on a stratified random design, with the NUTS2 statistical regions of Europe and the education level of the target respondent as stratum variables. The target sample size was 1000 households per country, with selection probability within each country proportional to stratum size.
Fieldwork
The fieldwork was conducted between December 2018 and April 2019 in ten EU and EEA countries (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, United Kingdom). The target respondent in each household was the person with main or shared responsibility for food shopping in the household. The fieldwork was sub-contracted to a professional research provider (Dynata, formerly Research Now SSI). Complete responses were obtained from a total of 9,996 households.
Weights
In addition to the SafeConsume Household Survey data, population data from Eurostat (2019) were used to calculate weights. The weights were calculated with NUTS2 region as the stratification variable: each observation was assigned an influence proportional to the number of households in the population stratum that one household in the corresponding sample stratum represented. The weights were used in the estimation of all base rates included in the data set.
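A minimal sketch of stratum-proportional weights of this kind: each sampled household represents N_stratum / n_stratum households in the population, and that ratio becomes its weight. The data frame and column names below are illustrative, not the survey's own.

library(dplyr)

# households: survey data with a nuts2 column identifying the respondent's region
# pop: Eurostat household counts per NUTS2 region, columns nuts2 and n_households

weights <- households %>%
  count(nuts2, name = "n_sample") %>%        # sampled households per stratum
  left_join(pop, by = "nuts2") %>%           # population households per stratum
  mutate(weight = n_households / n_sample)   # households represented by one sampled household

households <- left_join(households, weights[, c("nuts2", "weight")], by = "nuts2")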
Transformations
All survey variables were normalised to the [0,1] range before the analysis. Responses to food frequency questions were transformed into the proportion of all meals consumed during a year in which the meal contained the respective food item. Responses to questions with 11-point Juster probability scales as the response format were transformed into numerical probabilities. Responses to questions with time (hours, days, weeks) or temperature (°C) as response formats were discretised using supervised binning; the thresholds best separating the bins were chosen on the basis of five-fold cross-validated decision trees, as sketched below. The binned versions of these variables, and all other input variables with multiple categorical response options (either with a check-all-that-apply or forced-choice response format), were transformed into sets of binary features, with a value of 1 assigned if the respective response option had been checked and 0 otherwise.
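A rough sketch of the supervised binning and one-hot encoding steps, using a cross-validated decision tree to pick thresholds for a numeric response. The data frame, predictor, and outcome names are illustrative assumptions, not variables from the SafeConsume dataset, and the tree settings are generic defaults rather than the authors' exact configuration.

# supervised binning of a numeric survey response via a decision tree
library(rpart)

# survey_data: data frame with storage_time (hours, numeric) and outcome (binary risk indicator)
tree <- rpart(factor(outcome) ~ storage_time, data = survey_data,
              method = "class",
              control = rpart.control(xval = 5, cp = 0.01))   # five-fold cross-validation

# split points chosen by the tree become the bin edges
splits <- sort(unique(tree$splits[, "index"]))
survey_data$storage_bin <- cut(survey_data$storage_time,
                               breaks = c(-Inf, splits, Inf))

# binary (one-hot) features for the binned variable, as for other categorical items
onehot <- model.matrix(~ storage_bin - 1, data = survey_data)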
Treatment of missing values
In many cases, a missing value on a feature logically implies that the respective data point should have a value of zero. If, for example, a participant in the SafeConsume Household Survey had indicated that a particular food was not consumed in their household, the participant was not presented with any other questions related to that food, which automatically results in missing values on all features representing the responses to the skipped questions. However, zero consumption would also imply a zero probability that the respective food is consumed undercooked. In such cases, missing values were replaced with a value of 0.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the national health interview survey (nhis) with r

the national health interview survey (nhis) is a household survey about health status and utilization. each annual data set can be used to examine the disease burden and access to care that individuals and families are currently experiencing across the country. check out the wikipedia article (ohh hayy i wrote that) for more detail about its current and potential uses. if you're cooking up a health-related analysis that doesn't need medical expenditures or monthly health insurance coverage, look at nhis before the medical expenditure panel survey (its sample is twice as big). the centers for disease control and prevention (cdc) has been keeping nhis real since 1957, and the scripts below automate the download, importation, and analysis of every file back to 1963. what happened in 1997, you ask? scientists cloned dolly the sheep, clinton started his second term, and the national health interview survey underwent its most recent major questionnaire re-design.

here's how all the moving parts work:

a person-level file (personsx) that merges onto other files using unique household (hhx), family (fmx), and person (fpx) identifiers. [note to data historians: prior to 2004, person number was (px) and unique within each household.] this file includes the complex sample survey variables needed to construct a taylor-series linearization design, and should be used if your analysis doesn't require variables from the sample adult or sample child files. this survey setup generalizes to the noninstitutional, non-active duty military population.

a family-level file that merges onto other files using unique household (hhx) and family (fmx) identifiers.

a household-level file that merges onto other files using the unique household (hhx) identifier.

a sample adult file that includes questions asked of only one adult within each household (selected at random) - a subset of the main person-level file. hhx, fmx, and fpx identifiers will merge with each of the files above, but since not every adult gets asked these questions, this file contains its own set of weights: wtfa_sa instead of wtfa. you can merge on whatever other variables you need from the three files above, but if your analysis requires any variables from the sample adult questionnaire, you can't use records in the person-level file that aren't also in the sample adult file (a big sample size cut). this survey setup generalizes to the noninstitutional, non-active duty military adult population.

a sample child file that includes questions asked of only one child within each household (if available, and also selected at random) - another subset of the main person-level file. same deal as the sample adult description, except use wtfa_sc instead of wtfa. oh yeah, and this one generalizes to the child population.

five imputed income files. if you want income and/or poverty variables incorporated into any part of your analysis, you'll need these puppies. the replication example below uses these, but if that's impenetrable, post in the comments describing where you get stuck.

some injury stuff and other miscellanea that varies by year. if anyone uses this, please share your experience.

if you use anything more than the personsx file alone, you'll need to merge some tables together. make sure you understand the difference between setting the parameter all = TRUE versus all = FALSE -- not everyone in the personsx file has a record in the samadult and samchild files.

this new github repository contains four scripts:

1963-2011 - download all microdata.R
loop through every year and download every file hosted on the cdc's nhis ftp site; import each file into r with SAScii; save each file as an r data file (.rda); download all the documentation into the year-specific directory

2011 personsx - analyze.R
load the r data file (.rda) created by the download script (above); set up a taylor-series linearization survey design outlined on page 6 of this survey document; perform a smattering of analysis examples

2011 personsx plus samadult with multiple imputation - analyze.R
load the personsx and samadult r data files (.rda) created by the download script (above); merge the personsx and samadult files, highlighting how to conduct analyses that need both; create tandem survey designs for both personsx-only and merged personsx-samadult files; perform just a touch of analysis examples; load and loop through the five imputed income files, tack them onto the personsx-samadult file; conduct a poverty recode or two; analyze the multiply-imputed survey design object, just like mom used to analyze

replicate cdc tecdoc - 2000 multiple imputation.R
download and import the nhis 2000 personsx and imputed income files, using SAScii and this imputed income sas importation script (no longer hosted on the cdc's nhis ftp site); loop through each of the five imputed income files, merging each to the personsx file and performing the same set of...
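a minimal sketch of the personsx / samadult merge discussed above. the data frame names are placeholders; hhx, fmx, fpx, wtfa, and wtfa_sa are the identifiers and weights named in the text.

# all = FALSE keeps only people who appear in both files, which is what you
# want when the analysis needs sample-adult variables (and their weight, wtfa_sa)
merged_sa <- merge(personsx, samadult,
                   by  = c("hhx", "fmx", "fpx"),
                   all = FALSE)

# all.x = TRUE would keep every personsx record instead, leaving NAs on the
# sample-adult columns for people who were not the selected adult
merged_all <- merge(personsx, samadult,
                    by    = c("hhx", "fmx", "fpx"),
                    all.x = TRUE)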
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
This workbook provides data and data dictionaries for the SFMTA 2017 Travel Decision Survey.
On behalf of San Francisco Municipal Transportation Agency (SFMTA), Corey, Canapary & Galanis (CC&G) undertook a Mode Share Survey within the City and County of San Francisco as well as the eight surrounding Bay Area counties of Alameda, Contra Costa, San Mateo, Marin, Santa Clara, Napa, Sonoma and Solano.
The primary goals of this study were to: • Assess percent mode share for travel in San Francisco for evaluation of the SFMTA Strategic Objective 2.3: Mode Share target of 50% non-private auto travel by FY2018 with a 95% confidence level and MOE +/- 5% or less. • Evaluate the above statement based on the following parameters: number of trips to, from, and within San Francisco by Bay Area residents. Trips by visitors to the Bay Area and for commercial purposes are not included. • Provide additional trip details, including trip purpose for each trip in the mode share question series. • Collect demographic data on the population of Bay Area residents who travel to, from, and within San Francisco. • Collect data on travel behavior and opinions that support other SFMTA strategy and project evaluation needs.
The survey was conducted as a telephone study among 804 Bay Area residents aged 18 and older. Interviewing was conducted in English, Spanish, Mandarin, Cantonese, and Tagalog. Surveying was conducted via random digit dial (RDD) and cell phone sample. All survey datasets incorporate respondent weighting based on age and home location; utilize the “weight” field when appropriate in your analysis.
The survey period for this survey is as follows: 2017: February - April 2017
The margin of error is related to sample size (n). For the total sample, the margin of error is 3.4% for a confidence level of 95%. When looking at subsets of the data, such as just the SF population, just the female population, or just the population of people who bicycle, the sample size decreases and the margin of error increases. Below is a guide to the margin of error for different sample sizes. Be cautious in drawing conclusions based on small sample sizes.
At the 95% confidence level: • n = 804 (Total Sample). Margin of error = +/- 3.4% • n = 400. Margin of error = +/- 4.85% • n = 100. Margin of error = +/- 9.80%
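The margins of error quoted above are roughly consistent with the standard conservative formula for a 50% proportion under simple random sampling, sketched below for reference; weighting and design effects would widen these intervals somewhat.

# conservative 95% margin of error for a proportion of 50%
moe <- function(n, z = 1.96) 100 * z * sqrt(0.25 / n)

round(moe(c(804, 400, 100)), 2)
# approximately 3.46, 4.90, 9.80 percentage points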
By Meghan Hoyer [source]
The Associated Press is proud to present the COVID Impact Survey, a statistical survey providing data on how the coronavirus pandemic has affected people in the United States. Conducted by NORC at the University of Chicago with sponsorship from the Data Foundation and Federal Reserve Bank of Minneapolis, this probability-based survey offers valuable insight into three core areas related to physical health, economic and financial security, and social and mental health.
Through this vital survey data, we can gain a better understanding of how individuals are dealing with symptoms related to COVID-19, their financial situation during this period, changes in employment or government assistance policies, food security (in both nationwide and regional scope), communication with friends and family members, anxiety levels, and whether people are volunteering more under pandemic restrictions, giving an overall comprehensive snapshot of the factors shaping public perception of COVID-19's effect on US citizens.
Using these insights it is possible to track metrics over time - observing the issues Americans face every day as well as longer-term effects such as mental distress or self-sacrificing volunteer activity that appears due to underlying stress factors. It is imperative to weight the analysis properly when using this data and never to report raw numbers; instead, apply the provided queries or statistical software such as R or SPSS, which makes it possible to find results nationally as well as within 10 states and eight metropolitan areas across America, while using the margin of error to detect statistically significant differences between the researched segments.
Let’s open our minds today – digging beneath surface level information so data tells us stories about humanity & our social behavior patterns during these uncertain times!
This dataset contains survey data related to the impact of COVID-19 on US adult residents. The survey covers physical health, mental health, economic security, and social dynamics that have been affected by the pandemic. It is important to remember that this is survey data and must be properly weighted when analyzing it. Raw or aggregated numbers should not be used to generate insights. In order to weight the data appropriately, we recommend using statistical software such as R or SPSS or our provided queries (linked in this guide).
To generate a table relating to a specific topic covered in the survey, use the survey questionnaire and code book to match a question (the variable label) with its corresponding variable name. For instance “How often have you felt lonely in the past 7 days?” is variable “soc5c”. After entering a variable name into one of our provided queries, a sentence summarizing national results can be written out such as “People in some states are less likely to report loneliness than others… nationally 60% of people said they hadn't felt lonely”
When making comparisons of numerical statistics between different regions, it is important to consider the margin of error associated with the national and regional figures provided within this document; it will help determine whether differences between groups are statistically significant. If a difference is at least twice as large as the margin of error, there is a clear difference; if it is at least as large as the margin of error, there is a slight but apparent difference; if it is less than or equal to the margin of error, no real difference can be determined.
Survey results are generally posted under embargo on Tuesday evenings, with data release taking place at 1 pm ET on Thursdays afterward, under a title including the month and year (e.g. 01_April_30_covid_impact_survey). Data will come in comma-delimited and statistical formats, with the details needed to interpret the sample collection outlined within this guide.
When citing survey results, they should always be attributed with the qualification: the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation, sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation.
Lastly, more resources regarding AP's data journalism and distribution capabilities can be found via the link here, or contact kromano@ap.org.
- Comparing mental health outcomes of the pandemic in different states and metropolitan areas, such as rates of anxiety or loneliness...
This workbook provides data and data dictionaries for the SFMTA 2014 Travel Decision Survey. The 2014 Key Findings, Summary Report, and Methodology, including the survey instrument, can be found online at https://www.sfmta.com/about-sfmta/reports/travel-decision-survey-2014.

On behalf of San Francisco Municipal Transportation Agency (SFMTA), Corey, Canapary & Galanis (CC&G) undertook a Mode Share Survey within the City and County of San Francisco as well as the eight surrounding Bay Area counties of Alameda, Contra Costa, San Mateo, Marin, Santa Clara, Napa, Sonoma and Solano.

The primary goals of this study were to: • Assess percent mode share for travel in San Francisco for evaluation of the SFMTA Strategic Objective 2.3: Mode Share target of 50% non-private auto travel by FY2018 with a 95% confidence level and MOE +/- 5% or less. • Evaluate the above statement based on the following parameters: number of trips to, from, and within San Francisco by Bay Area residents. Trips by visitors to the Bay Area and for commercial purposes are not included. • Provide additional trip details, including trip purpose for each trip in the mode share question series. • Collect demographic data on the population of Bay Area residents who travel to, from, and within San Francisco. • Collect data on travel behavior and opinions that support other SFMTA strategy and project evaluation needs.

The survey was conducted as a telephone study among approximately 750 Bay Area residents aged 18 and older. Interviewing was conducted in English, Spanish, and Cantonese. Surveying was conducted via random digit dial (RDD) and cell phone sample. All three survey datasets incorporate respondent weighting based on age and home location; utilize the “weight” field when appropriate in your analysis.

The survey period for this survey is as follows: 2014: October – November 2014

A few questions in TDS 2014 were added after the survey began. In the report, responses from participants who were not asked those questions were excluded from the analysis. The questions that were added late are noted in the TDS 2014 methodology survey instrument.

The margin of error is related to sample size (n). For the total sample, the margin of error is 3.5% for a confidence level of 95%. When looking at subsets of the data, such as just the SF population, just the female population, or just the population of people who bicycle, the sample size decreases and the margin of error increases. Below is a guide to the margin of error for different sample sizes. Be cautious in drawing conclusions based on small sample sizes.

At the 95% confidence level: • n = 767 (Total Sample). Margin of error = +/- 3.5% • n = 384. Margin of error = +/- 4.95% • n = 100. Margin of error = +/- 9.80%
This workbook provides data and data dictionaries for the SFMTA 2012 Travel Decision Survey. The 2012 Summary Report, with methodology and survey instrument, is located at https://www.sfmta.com/about-sfmta/reports/travel-decision-survey-2012. Data, methodologies, and summary reports for other SFMTA travel decision surveys are available on sfmta.com.

On behalf of San Francisco Municipal Transportation Agency (SFMTA), Corey, Canapary & Galanis (CC&G) undertook a Mode Share Survey within the City and County of San Francisco as well as the eight surrounding Bay Area counties of Alameda, Contra Costa, San Mateo, Marin, Santa Clara, Napa, Sonoma and Solano.

The primary goals of this study were to: • Assess percent mode share for travel in San Francisco for evaluation of the SFMTA Strategic Objective 2.3: Mode Share target of 50% non-private auto travel by FY2018 with a 95% confidence level and MOE +/- 5% or less. • Evaluate the above statement based on the following parameters: number of trips to, from, and within San Francisco by Bay Area residents. Trips by visitors to the Bay Area and for commercial purposes are not included. • Provide additional trip details, including trip purpose for each trip in the mode share question series. • Collect demographic data on the population of Bay Area residents who travel to, from, and within San Francisco. • Collect data on travel behavior and opinions that support other SFMTA strategy and project evaluation needs.

The survey was conducted as a telephone study among approximately 750 Bay Area residents aged 18 and older. Interviewing was conducted in English, Spanish, and Cantonese. Surveying was conducted via random digit dial (RDD) and cell phone sample. This dataset incorporates respondent weighting based on age and home location; utilize the “weight” field when appropriate in your analysis.

The survey period for this survey is as follows: 2012: October 2012 – January 2013

The margin of error is related to sample size (n). For the total sample, the margin of error is 3.5% for a confidence level of 95%. When looking at subsets of the data, such as just the SF population, just the female population, or just the population of people who bicycle, the sample size decreases and the margin of error increases. Below is a guide to the margin of error for different sample sizes. Be cautious in drawing conclusions based on small sample sizes.

At the 95% confidence level: • n = 767 (Total Sample). Margin of error = +/- 3.5% • n = 384. Margin of error = +/- 4.95% • n = 100. Margin of error = +/- 9.80%
This workbook provides data and data dictionaries for the SFMTA 2015 Travel Decision Survey (SFMTA Travel Decision Survey Data for 2015).

On behalf of San Francisco Municipal Transportation Agency (SFMTA), Corey, Canapary & Galanis (CC&G) undertook a Mode Share Survey within the City and County of San Francisco as well as the eight surrounding Bay Area counties of Alameda, Contra Costa, San Mateo, Marin, Santa Clara, Napa, Sonoma and Solano.

The primary goals of this study were to: • Assess percent mode share for travel in San Francisco for evaluation of the SFMTA Strategic Objective 2.3: Mode Share target of 50% non-private auto travel by FY2018 with a 95% confidence level and MOE +/- 5% or less. • Evaluate the above statement based on the following parameters: number of trips to, from, and within San Francisco by Bay Area residents. Trips by visitors to the Bay Area and for commercial purposes are not included. • Provide additional trip details, including trip purpose for each trip in the mode share question series. • Collect demographic data on the population of Bay Area residents who travel to, from, and within San Francisco. • Collect data on travel behavior and opinions that support other SFMTA strategy and project evaluation needs.

The survey was conducted as a telephone study among 762 Bay Area residents aged 18 and older. Interviewing was conducted in English, Spanish, and Cantonese. Surveying was conducted via random digit dial (RDD) and cell phone sample. All three survey datasets incorporate respondent weighting based on age and home location; utilize the “weight” field when appropriate in your analysis.

The survey period for this survey is as follows: 2015: August – October 2015

The margin of error is related to sample size (n). For the total sample, the margin of error is 3.5% for a confidence level of 95%. When looking at subsets of the data, such as just the SF population, just the female population, or just the population of people who bicycle, the sample size decreases and the margin of error increases. Below is a guide to the margin of error for different sample sizes. Be cautious in drawing conclusions based on small sample sizes.

At the 95% confidence level: • n = 762 (Total Sample). Margin of error = +/- 3.5% • n = 382. Margin of error = +/- 4.95% • n = 100. Margin of error = +/- 9.80%
A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent-level data in each row, with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response options, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables and use the sum of the "wt" field rather than a count of responses.