This package contains two files designed to help read individual-level DHS data into Stata. The first file addresses the problem that versions of Stata before Version 7/SE will read in only up to 2,047 variables, and most of the individual files have more variables than that. The file will read in the .do, .dct, and .dat files and output new .do and .dct files with only a subset of the variables, specified by the user. The second file deals with earlier DHS surveys in which .do and .dct files do not exist and only .sps and .sas files are provided. The file will read in the .sas and .sps files and output a .dct and .do file. If necessary, the first file can then be run again to select a subset of variables.
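Although the package itself is written in Stata, the dictionary-subsetting idea is simple enough to sketch. The following R sketch keeps only the dictionary lines that define a requested variable; the file names and the assumption that each variable occupies a single _column(...) line of the .dct are hypothetical.

```r
# Illustrative sketch only: subset a Stata dictionary (.dct) to a short
# variable list, assuming one "_column(...) type name ..." line per variable;
# the file names ("full.dct", "subset.dct") are hypothetical.
keep <- c("v001", "v002", "v012", "v190")   # variables the user wants

dct <- readLines("full.dct")
is_var_line <- grepl("_column\\(", dct)     # lines that define a variable

# a variable line is kept only if it mentions one of the requested names
mentions_kept <- sapply(
  dct,
  function(line) any(vapply(keep, grepl, logical(1), x = line, fixed = TRUE))
)

# keep the dictionary's header/footer lines plus the requested variables
writeLines(dct[!is_var_line | mentions_kept], "subset.dct")
```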
This workshop takes you on a quick tour of Stata, SPSS, and SAS. It examines a data file using each package. Is one more user friendly than the others? Are there significant differences in the codebooks created? This workshop also looks at creating a frequency and cross-tabulation table in each. Which output screen is easiest to read and interpret? The goal of this workshop is to give you an overview of these products and provide you with the information you need to determine which package fits your requirements and those of your users.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active-duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
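want a taste of that parse.SAScii + RSQLite workflow before diving into the scripts? here's a minimal sketch. the nber script url and the local file name are assumptions - check nber's cps page for the real paths - and note that the actual download script also splits the household, family, and person record types before merging.

```r
# minimal sketch, not the repository's actual code: use nber's sas importation
# script to read a fixed-width asec file, then park it in a sqlite database.
# the url and "asec2012.dat" are assumed stand-ins for the real file locations,
# and the real script separates household/family/person records first.
library(SAScii)
library(RSQLite)

sas_script <- "http://www.nber.org/data/progs/cps/cpsmar2012.sas"  # assumed path

layout <- parse.SAScii(sas_script)   # varname / width / char for each column
head(layout)

# read the fixed-width file using the column positions defined in the sas code
asec <- read.SAScii("asec2012.dat", sas_script)

# store the table in a database so later analyses don't need it all in ram
db <- dbConnect(SQLite(), "asec.db")
dbWriteTable(db, "asec12", asec)
dbDisconnect(db)
```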
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pays the bills and therefore calls the shots.

if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions

notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
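want to see the five-implicate mechanics in miniature? here's a sketch using the survey and mitools packages. assume imp1 through imp5 are the five implicate data frames and rw holds the replicate weights - the variable names and the rscales constant are assumptions to check against the codebook, not gospel.

```r
# minimal sketch, assuming imp1..imp5 (five implicates, one record per peu)
# and rw (replicate weights) are already loaded; "wgt", "networth", and the
# rscales constant are assumptions - verify them against the scf codebook
library(survey)
library(mitools)

scf.design <- svrepdesign(
  weights = ~wgt,                          # main analysis weight
  repweights = rw[ , -1 ],                 # drop the id column, keep the weights
  data = imputationList(list(imp1, imp2, imp3, imp4, imp5)),
  scale = 1,
  rscales = rep(1 / 998, 999),             # scf-flavored variance scaling (assumed)
  mse = TRUE,
  type = "other",
  combined.weights = TRUE
)

# MIcombine pools the five estimates so the standard error reflects both
# sampling variance and between-implicate (imputation) variance
MIcombine(with(scf.design, svymean(~networth)))
```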
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The sample SAS and Stata code provided here is intended for use with certain datasets in the National Neighborhood Data Archive (NaNDA). NaNDA (https://www.openicpsr.org/openicpsr/nanda) contains some datasets that measure neighborhood context at the ZIP Code Tabulation Area (ZCTA) level. They are intended for use with survey or other individual-level data containing ZIP codes. Because ZIP codes do not exactly match ZIP Code Tabulation Areas, a crosswalk is required to use ZIP-code-level geocoded datasets with ZCTA-level datasets from NaNDA. A ZIP-code-to-ZCTA crosswalk was previously available on the UDS Mapper website, which is no longer active. An archived copy of the ZIP-code-to-ZCTA crosswalk file has been included here. Sample SAS and Stata code are provided for merging the UDS Mapper crosswalk with NaNDA datasets.
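For users working outside SAS and Stata, the two-step merge the sample programs perform can be sketched in R as follows. The file names and column names (zip, zcta) are placeholders; match them to the actual crosswalk and NaNDA files.

```r
# Illustrative R sketch of the crosswalk merge the SAS/Stata programs perform.
# File and column names ("zip", "zcta") are placeholders, not the archive's
# actual variable names; read everything as character to preserve leading zeros.
crosswalk <- read.csv("zip_to_zcta_crosswalk.csv", colClasses = "character")
mydata    <- read.csv("my_survey_with_zips.csv",   colClasses = "character")
nanda     <- read.csv("nanda_zcta_measures.csv",   colClasses = "character")

# Step 1: attach a ZCTA to each respondent's ZIP code
mydata <- merge(mydata, crosswalk[, c("zip", "zcta")], by = "zip", all.x = TRUE)

# Step 2: attach the ZCTA-level neighborhood measures
mydata <- merge(mydata, nanda, by = "zcta", all.x = TRUE)
```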
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Integrated Postsecondary Education Data System (IPEDS) Complete Data Files from 1980 to 2023. Includes data file, STATA data file, SPSS program, SAS program, STATA program, and dictionary. All years compressed into one .zip file due to storage limitations. Updated on 2/14/2025 to add Microsoft Access Database files.

From the IPEDS Complete Data File Help Page (https://nces.ed.gov/Ipeds/help/complete-data-files):

Choose the file to download by reading the description in the available titles. Then, click on the link in that row corresponding to the column header of the type of file/information desired to download.
- To download and view the survey files in basic CSV format, use the main download link in the Data File column.
- For files compatible with the Stata statistical software package, use the alternate download link in the Stata Data File column.
- To download files with the SPSS, SAS, or STATA (.do) file extension for use with statistical software packages, use the download link in the Programs column.
- To download the data Dictionary for the selected file, click on the corresponding link in the far right column of the screen. The data dictionary serves as a reference for using and interpreting the data within a particular survey file. This includes the names, definitions, and formatting conventions for each table, field, and data element within the file, important business rules, and information on any relationships to other IPEDS data.

For statistical read programs to work properly, both the data file and the corresponding read program file must be downloaded to the same subdirectory on the computer's hard drive. Download the data file first; then click on the corresponding link in the Programs column to download the desired read program file to the same subdirectory.

When viewing downloaded survey files, categorical variables are identified using codes instead of labels. Labels for these variables are available in both the data read program files and the data dictionary for each file; however, for files that automatically incorporate this information you will need to select the Custom Data Files option.
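As an illustration of that data-file-plus-read-program workflow in R (rather than the provided SPSS, SAS, or Stata programs), the sketch below downloads one survey file and labels one categorical variable. The URL pattern, the HD2023 file name, and the CONTROL coding are assumptions to verify against the data dictionary.

```r
# Illustrative sketch: download one IPEDS survey file and label one
# categorical variable. The URL pattern, file name (HD2023), and the CONTROL
# coding are assumptions - confirm them in the data dictionary.
download.file("https://nces.ed.gov/ipeds/datacenter/data/HD2023.zip",
              "HD2023.zip", mode = "wb")
unzip("HD2023.zip")
hd <- read.csv("hd2023.csv")

# categorical variables arrive as numeric codes; the labels live in the
# dictionary and read-program files
hd$control <- factor(hd$CONTROL, levels = 1:3,
                     labels = c("Public",
                                "Private not-for-profit",
                                "Private for-profit"))
table(hd$control)
```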
Discover the booming Statistical Analysis Software market! Our in-depth analysis reveals a $55.86B market (2025) projected to reach over $65B by 2033, driven by data analytics adoption and AI integration. Explore market trends, key players (like SAS, IBM, & MathWorks), and future growth projections.
Database of the nation's substance abuse and mental health research data providing public use data files, file documentation, and access to restricted-use data files to support a better understanding of this critical area of public health. The goal is to increase the use of the data to most accurately understand and assess substance abuse and mental health problems and the impact of related treatment systems. The data include the U.S. general and special populations, annual series, and designs that produce nationally representative estimates. Some of the data acquired and archived have never before been publicly distributed. Each collection includes survey instruments (when provided), a bibliography of related literature, and related Web site links. All data may be downloaded free of charge in SPSS, SAS, Stata, and ASCII formats, and most studies are available for use with the online data analysis system. This system allows users to conduct analyses ranging from cross-tabulation to regression without downloading data or relying on other software. Another feature, Quick Tables, provides the ability to select variables from drop-down menus to produce cross-tabulations and graphs that may be customized and cut and pasted into documents. Documentation files, such as codebooks and questionnaires, can be downloaded and viewed online.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Pseudo-database of the commercial union hedge fund database; SAS and Stata code files.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This data repository houses SAS and STATA files that correspond with studies evaluating Middle Eastern and North African (MENA) health using linked 2000-2017 NHIS and 2001-2018 MEPS data.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The SAS syntax replicates the analyses presented in the MAIHDA tutorial by Evans et al. (2024). The dataset and the R and Stata code for the tutorial are available at: https://www.sciencedirect.com/science/article/pii/S235282732400065X?via=ihub and https://osf.io/dtvc3/
The Uniform Appraisal Dataset (UAD) Appraisal-Level Public Use File (PUF) is the nation’s first publicly available appraisal-level dataset of appraisal records, giving the public new access to a selected set of data fields found in appraisal reports. The UAD Appraisal-Level PUF is based on a five percent nationally representative random sample of appraisals for single-family mortgages acquired by the Enterprises. The current release includes appraisals from 2013 through 2021. The UAD Appraisal-Level PUF is a resource for users capable of using statistical software to extract and analyze data. Users can download annual or combined files in CSV, R, SAS, and Stata formats. All files are zipped for ease of download.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
analyze the area resource file (arf) with r

the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

this new github repository contains two scripts:

2011-2012 arf - download.R
- download the zipped area resource file directly onto your local computer
- load the entire table into a temporary sql database
- save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

2011-2012 arf - analysis examples.R
- limit the arf to the variables necessary for your analysis
- sum up a few county-level statistics
- merge the arf onto other data sets, using both fips and ssa county codes
- create a sweet county-level map

click here to view these two scripts

for more detail about the area resource file (arf), visit:
- the arf home page
- the hrsa data warehouse

notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.

confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
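here's a minimal sketch of that download-and-convert flow using the SAScii and foreign packages. the zip url and the file names inside the archive are placeholders - hunt down the real ones on hrsa's site.

```r
# minimal sketch, assuming the zipped arf contains one fixed-width ascii file
# and one sas importation script; the url below is a placeholder, not the
# real hrsa download address
library(SAScii)
library(foreign)

tf <- tempfile(fileext = ".zip")
download.file("https://example.hrsa.gov/arf2012.zip", tf)   # placeholder url
files <- unzip(tf, exdir = tempdir())

ascii_file <- files[grepl("\\.asc$", files)]   # the fixed-width data
sas_script <- files[grepl("\\.sas$", files)]   # the importation script

arf <- read.SAScii(ascii_file, sas_script)

save(arf, file = "arf.rda")                    # r data file
write.csv(arf, "arf.csv", row.names = FALSE)   # comma-separated values
write.dta(arf, "arf.dta")                      # stata-readable file
```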
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
SAS and Stata code and datasets for the paper "Gradual information diffusion across commonly owned firms."
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de455266
Abstract (en): This round of Eurobarometer surveys queried respondents on standard Eurobarometer measures, such as whether they attempted to persuade others close to them to share their views on subjects they held strong opinions about, whether they discussed political matters, and what the goals of the European Union (EU) should be. Additional questions focused on the respondents' knowledge of and opinions on the EU, including how well-informed they felt about the EU, what sources of information about the EU they used, and whether their country had benefited from being an EU member. Another major focus of the surveys was elderly people and domestic violence. Respondents were asked whether retired people should be permitted to take paid employment and whether the government should introduce laws to try to stop age discrimination. Respondents were also queried as to whether they had extra family responsibilities involving looking after someone with a long-term illness or someone who was handicapped or elderly, and who respondents thought was in the best position to decide on the most appropriate services for elderly people needing long-term aid. The survey also explored violence against children and young people under age 18 as well as against women. Those queried were asked if they had heard of violence against women and children and what they believed constituted domestic violence against women and children. Given a situation in which a woman or child was a victim of violence, respondents were asked who might be the most likely perpetrator and what might be a general cause of violence against women and children. Respondents also commented on whether certain institutions and organizations should help victimized women and children, and ways that violence against women and children can be combatted. Demographic and other background information provided includes the respondent's age, gender, marital status, and left-right political self-placement, as well as household income, number of people residing in the home, occupation, religion, and region of residence.

Universe: Citizens of the EU aged 15 and over residing in the 15 EU member countries: Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, the Netherlands, Portugal, Spain, Sweden, and the United Kingdom.

Sampling: Multistage national probability samples.

Processing history:
2007-01-26 The data for this study have undergone further processing completed by the Zentralarchiv (ZA). This study has been updated to include the full ICPSR product suite including SAS, SPSS, and Stata setup files in addition to SAS transport (XPORT), SPSS portable, and Stata system files.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
2001-06-29 Data for all previously-embargoed variables are now available. Updated data for variables D8 and D11 have also been included.

Mode of data collection: face-to-face interview

Notes:
(1) The files included with this collection derive from the data producer, INRA (International Research Associates) (Europe), and have been further processed by the Zentralarchiv (ZA).
(2) Starting with Eurobarometer 34 and up to survey 61, NUTS 1 level data (REGION II) for the NETHERLANDS are not (re-)coded in accordance with the official EUROSTAT nomenclature of territorial unit statistics. The NUTS 2 level province ZEELAND should be coded as belonging to NUTS 1 region (landsdel) WEST instead of SOUTH Netherlands. (ZA editions will be corrected from EB 53 onwards, November 11, 2005)
(3) The SPSS, SAS, and Stata setup files for this collection contain characters with diacritical marks used in many European languages.
(4) D8/V454-V455: For 31 respondents the indicated age "WHEN STOPPED FULL-TIME EDUCATION" was too high for their actual age (D11/V457). These cases were recoded to '0' (QA) in V454 and V455. Fifteen missing cases which are coded '2' in D15A/V462 have been recoded to '98' in V454 and '10' in V455 (STILL STUDYING).
(5) D29 INCOME HH QUARTILES: Please note that the income quartiles are produced for comparison purposes and are retained as provided by the principal investigator. They are based on the country-specific categorized income question.
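For users applying note (4) to the raw data, the recode logic can be sketched in R as follows; the data frame name eb is a placeholder, and the variable numbers follow the ZA naming above.

```r
# Illustrative sketch of the note (4) recodes; "eb" is a placeholder data
# frame holding V454/V455 (age stopped education), V457 (age), V462 (D15A)
too_old <- !is.na(eb$V454) & !is.na(eb$V457) & eb$V454 > eb$V457
eb$V454[too_old] <- 0     # recoded to '0' (QA)
eb$V455[too_old] <- 0

still_studying <- is.na(eb$V454) & !is.na(eb$V462) & eb$V462 == 2
eb$V454[still_studying] <- 98   # STILL STUDYING
eb$V455[still_studying] <- 10
```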
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This directory contains analytic code used to build cohorts, dependent variables, and covariates, and run all statistical analyses for the study, "Changes in care associated with integrating Medicare and Medicaid for dual eligible individuals: Examination of a Fully Integrated Special Needs Plan."

The code files enclosed in this directory are:
- SAS_Cohorts_Outcomes 23-9-30.sas. This SAS code file builds study cohorts, dependent variables, and covariates. This code produced a person-by-month level database of outcomes and covariates for individuals in the integration and comparison cohorts.
- STATA_Models_23-6-5_weight_jama.do. This Stata program reads in the person-by-month level database (output from SAS) and conducts all statistical analyses used to produce the main and supplementary analyses reported in the manuscript.

We have provided this code and documentation to disclose our study methods. Our Data Use Agreements prohibit publishing of row-level data for this study; therefore, researchers would need to obtain Data Use Agreements with data providers to implement these analyses. We also note that some measures reference macros with proprietary code (e.g., Medispan® files) which require a separate user license to run. Interested readers should contact the study PI, Eric T. Roberts (eric.roberts@pennmedicine.upenn.edu), for further information.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Here, you will find resources to use the Bynum-Standard 1-Year Algorithm, including a README file that accompanies SAS and Stata scripts for the 1-Year Standard Method for identifying Alzheimer’s Disease and Related Dementias (ADRD) in Medicare Claims data. There are seven script files (plus a parameters file for SAS [parm.sas]) for both SAS and Stata. The files are numbered in the order in which they should be run; the five “1” files may be run in any order.

The full algorithm requires access to a single year of Medicare Claims data for (1) MedPAR, (2) Home Health Agency (HHA) Claims File, (3) Hospice Claims File, (4) Carrier Claims and Line Files, and (5) Hospital Outpatient File (HOF) Claims and Revenue Files. All Medicare Claims files are expected to be in SAS format (.sas7bdat).

For each data source, the script will output three files*:
- Diagnosis-level file: Lists individual ADRD diagnoses for each beneficiary for a given visit. This file allows researchers to identify which ICD-9-CM or ICD-10-CM codes are used in the claims data.
- Service Date-level file: Aggregated from the Diagnosis-level file, this file includes all beneficiaries with an ADRD diagnosis by Service Date (date of a claim with at least one ADRD diagnosis).
- Beneficiary-level file: Aggregated from the Service Date-level file, this file includes all beneficiaries with at least one* ADRD diagnosis at any point in the year within a specific file.

* The algorithm combines the Carrier and HOF files at the Service Date-level. The final combined Carrier and HOF Beneficiary-level file includes those with at least two (2) claims that are seven (7) or more days apart.

A final combined file is created by merging all Beneficiary-level files. This file is used to identify beneficiaries with ADRD and can be merged onto other files by the Beneficiary ID (BENE_ID).

With appreciation & acknowledgement to colleagues from a grant funded by the NIA for their involvement in the development & validation of the Bynum-Standard 1-Year Algorithm.
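The two-claims-seven-or-more-days-apart rule is the step most likely to trip up a re-implementation, so an illustrative R sketch of it follows; the input name carrier_hof_dates and its columns bene_id and svc_date are hypothetical stand-ins for the combined Carrier/HOF Service Date-level file.

```r
# Illustrative sketch of the "at least two claims seven or more days apart"
# rule; carrier_hof_dates, bene_id, and svc_date are hypothetical names for
# the combined Carrier/HOF Service Date-level file and its columns
library(dplyr)

qualifying_benes <- carrier_hof_dates %>%
  group_by(bene_id) %>%
  summarise(
    n_claims  = n(),
    span_days = as.numeric(max(svc_date) - min(svc_date))
  ) %>%
  # if the earliest and latest service dates are >= 7 days apart, then at
  # least two claims are seven or more days apart
  filter(n_claims >= 2, span_days >= 7)
```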
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de454978
Abstract (en): This special topic poll, conducted February 20-24, 1997, solicited responses from parents and their teenage children, aged 12-17, on the topic of illegal drug use among America's youth. One parent and one child from each household were asked a series of questions covering illegal drugs, violence in school, underage drinking, academic challenges, and parent-child communication. Respondents were asked to assess their understanding of the presence of drugs and drug users in their local schools, throughout the community, across the nation, among the teen's peer group, and within their own family. A series of topics covered the availability and effectiveness of school-sponsored anti-drug programs. Parents were asked how their possible past and present use and/or experimentation with marijuana and other illegal drugs, alcohol, and tobacco products influenced the manner in which they approached drug use with their own children. Teenage respondents were asked for their reaction to the use of drugs and alcohol by their friends, the seriousness of the contemporary drug problem, and whether they believed that their parents had used or experimented with illegal drugs. Other questions asked about teenage respondents' plans after high school and whether they attended a public or private school. Demographic variables for parental respondents included age, race, sex, education level, household income, political party affiliation, and type of residential area (e.g., urban or rural). Demographic variables for teenage respondents included age, race, sex, residential area, and grade level in school.

The data contain a weight variable (WEIGHT) that should be used in analyzing the data. This poll consists of "standard" national representative samples of the adult population with sample balancing of sex, race, age, and education. The weight variable contains two implied decimal places, and applies only to the parental respondents. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created an online analysis version with question text.

Universe: Persons aged 18 and over living in households with telephones in the contiguous 48 United States.

Sampling: Households were selected by random-digit dialing. Within households, the respondent selected was the adult living in the household who last had a birthday and who was home at the time of the interview.

Processing history:
2007-02-27 SAS, SPSS, and Stata setup files, and SAS and Stata supplemental files have been added to this data collection. Respondent names were removed from the data file and the CASEID variable was created for use with online analysis.
2006-11-10 SAS, SPSS, and Stata setup files have been added to this data collection.

Mode of data collection: telephone interview

Notes:
(1) The data available for download are not weighted and users will need to weight the data prior to analysis.
(2) Original reports using these data may be found via the ABC News Web site.
(3) According to the data collection instrument, code 3 in the variable P_EDUC also included respondents who answered that they had attended a technical college.
(4) The CASEID variable was created for use with online analysis.
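Because the weight variable carries two implied decimal places and the downloadable file is unweighted, an analysis in R might begin as sketched below; the data frame name poll is a placeholder.

```r
# Illustrative sketch: "poll" is a placeholder for the downloaded data.
# WEIGHT has two implied decimal places, so divide by 100 before use;
# per the abstract, it applies only to the parental respondents.
poll$wt <- poll$WEIGHT / 100

# a weighted tabulation (xtabs sums the weight within each category)
xtabs(wt ~ P_EDUC, data = poll)
```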
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
These BRFSS datasets were downloaded before they were taken offline on January 31st, 2025. Special thanks to James Bailey & Doug Livingston, who made earlier years of BRFSS data available!
Data for 2000-2023 are provided in SAS, Stata, and R formats. Data for 1987-1999 are provided in CSV format.
This repository has a DOI assigned if you need to cite it.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research. if you apply for an interviewer job with them, i hope you like werther's original.

figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy; you'll need them for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.

the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010), but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents, but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
- loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
- create a SQLite database (.db) on the local disk
- load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create two database-backed complex sample survey objects, using a taylor-series linearization design
- perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
- load a fixed-width file directly into ram using only the sas importation script, with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
- parse through the IF block at the bottom of the sas importation script, blank out a number of variables
- save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create a database-backed complex sample survey object, using a taylor-series linearization design
- exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
- michigan's hrs homepage
- rand's hrs homepage
- the hrs wikipedia page
- a running list of publications using hrs

notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself.

confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
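curious what one of those database-backed taylor-series designs looks like? here's a minimal sketch. the table name and the rand variable names (raehsamp, raestrat, r6wtresp for the 2002 wave) are assumptions - confirm them in the rand codebook before trusting any output.

```r
# minimal sketch, assuming the rand hrs file was loaded into hrs.db as a
# table named "rand_hrs"; the design variables below follow rand's naming
# convention for wave 6 (2002) but should be checked against the codebook
library(survey)
library(RSQLite)

hrs.design <- svydesign(
  id = ~raehsamp,          # sampling error computation unit
  strata = ~raestrat,      # sampling error stratum
  weights = ~r6wtresp,     # wave 6 (2002) respondent weight
  nest = TRUE,
  dbtype = "SQLite",
  dbname = "hrs.db",
  data = "rand_hrs"        # table inside the database
)

# a taylor-series linearized estimate using the 2002 wave weight
svymean(~r6cesd, hrs.design, na.rm = TRUE)
```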