analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite puts everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
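to make the import step above concrete, here's a minimal sketch of the parse.SAScii-plus-RSQLite pattern those scripts rely on. the file names are hypothetical placeholders, and the real download program first separates the household, family, and person record types before merging them, so treat this as the general shape rather than the actual script.

library(SAScii)    # parse and use sas INPUT statements from within r
library(DBI)       # database interface
library(RSQLite)   # sqlite driver, keeps the big table out of ram during analysis

# read the column positions and widths out of the nber sas importation script
# (file names below are hypothetical placeholders)
layout <- parse.SAScii( "cpsmar2012.sas" )
head( layout )     # variable names, widths, character flags

# import the fixed-width microdata file using that same sas script
asec <- read.SAScii( "asec2012_pubuse.dat" , "cpsmar2012.sas" )

# push the rectangular person-level table into a sqlite database on disk
db <- dbConnect( SQLite() , "asec.db" )
dbWriteTable( db , "asec12" , asec )
dbDisconnect( db )

# from there, the analysis scripts build a replicate-weighted design with the survey
# package, something like svrepdesign( weights = ~marsupwt , repweights = "pwwgt[1-9]" ,
# type = "Fay" , rho = 0.5 , data = asec ) -- confirm the weight column names and
# replication parameters against the census bureau's replicate weight documentation.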
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456864
Abstract (en): The purpose of this data collection is to provide an official public record of the business of the federal courts. The data originate from 94 district and 12 appellate court offices throughout the United States. Information was obtained at two points in the life of a case: filing and termination. The termination data contain information on both filings and terminations, while the pending data contain only filing information. For the appellate and civil data, the unit of analysis is a single case. The unit of analysis for the criminal data is a single defendant. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes. Universe: All federal court cases, 1970-2000.

2012-05-22 All parts are being moved to restricted access and will be available only using the restricted access procedures.
2005-04-29 The codebook files in Parts 57, 94, and 95 have undergone minor edits and been incorporated with their respective datasets. The SAS files in Parts 90, 91, 227, and 229-231 have undergone minor edits and been incorporated with their respective datasets. The SPSS files in Parts 92, 93, 226, and 228 have undergone minor edits and been incorporated with their respective datasets. Parts 15-28, 34-56, 61-66, 70-75, 82-89, 96-105, 107, 108, and 115-121 have had identifying information removed from the public use file, and restricted data files that still include that information have been created. These parts have had their SPSS, SAS, and PDF codebook files updated to reflect the change. The data, SPSS, and SAS files for Parts 34-37 have been updated from OSIRIS to LRECL format. The codebook files for Parts 109-113 have been updated. The case counts for Parts 61-66 and 71-75 have been corrected in the study description. The LRECL for Parts 82, 100-102, and 105 have been corrected in the study description.
2003-04-03 A codebook was created for Part 105, Civil Pending, 1997. Parts 232-233, SAS and SPSS setup files for Civil Data, 1996-1997, were removed from the collection since the civil data files for those years have corresponding SAS and SPSS setup files.
2002-04-25 Criminal data files for Parts 109-113 have all been replaced with updated files. The updated files contain Criminal Terminations and Criminal Pending data in one file for the years 1996-2000. Part 114, originally Criminal Pending 2000, has been removed from the study and the 2000 pending data are now included in Part 113.
2001-08-13 The following data files were revised to include plaintiff and defendant information: Appellate Terminations, 2000 (Part 107), Appellate Pending, 2000 (Part 108), Civil Terminations, 1996-2000 (Parts 103, 104, 115-117), and Civil Pending, 2000 (Part 118). The corresponding SAS and SPSS setup files and PDF codebooks have also been edited.
2001-04-12 Criminal Terminations (Parts 109-113) data for 1996-2000 and Criminal Pending (Part 114) data for 2000 have been added to the data collection, along with corresponding SAS and SPSS setup files and PDF codebooks.
2001-03-26 Appellate Terminations (Part 107) and Appellate Pending (Part 108) data for 2000 have been added to the data collection, along with corresponding SAS and SPSS setup files and PDF codebooks.
1997-07-16 The data for 18 of the Criminal Data files were matched to the wrong part numbers and names, and have now been corrected.

Funding institution(s): United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics.

Notes: (1) Several, but not all, of these record counts include a final blank record. Researchers may want to detect this occurrence and eliminate this record before analysis. (2) In July 1984, a major change in the recording and disposition of an appeal occurred, and several data fields dealing with disposition were restructured or replaced. The new structure more clearly delineates mutually exclusive dispositions. Researchers must exercise care in using these fields for comparisons. (3) In 1992, the Administrative Office of the United States Courts changed the reporting period for statistical data. Up to 1992, the reporting period...
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010), but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
- loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
- create a SQLite database (.db) on the local disk
- load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create two database-backed complex sample survey objects, using a taylor-series linearization design
- perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
- load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
- parse through the IF block at the bottom of the sas importation script, blank out a number of variables
- save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create a database-backed complex sample survey object, using a taylor-series linearization design
- exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
- michigan's hrs homepage
- rand's hrs homepage
- the hrs wikipedia page
- a running list of publications using hrs

notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
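as a flavor of what the 'longitudinal RAND - analysis examples' script sets up, here's a minimal sketch of a database-backed, taylor-series-linearized survey design. the database path, table name, and design variables below (raehsamp, raestrat, and a wave-specific respondent weight) are assumptions based on the rand hrs naming conventions -- verify them against the rand codebook before trusting any estimate.

library(survey)    # complex sample survey analysis
library(RSQLite)   # lets the survey package read the table straight from the .db file

# database-backed design: the data stay in sqlite and get pulled in only as needed
hrs.design <-
	svydesign(
		ids = ~raehsamp ,        # sampling error computation unit (assumed name)
		strata = ~raestrat ,     # sampling error stratum (assumed name)
		weights = ~r10wtresp ,   # wave 10 (2010) respondent-level weight (assumed name)
		nest = TRUE ,
		dbtype = "SQLite" ,
		dbname = "hrs.db" ,      # database created by the import script (assumed path)
		data = "rand_long"       # table name inside that database (assumed)
	)

# example: a weighted mean among respondents with a positive wave-10 weight
# svymean( ~some_variable , subset( hrs.design , r10wtresp > 0 ) , na.rm = TRUE )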
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov. The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset was produced from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015. Each row in the LMI dataset is an individual record for a household that responded to the survey, and each column is a variable of interest for analyzing the low- to moderate-income population. The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight. The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that supports the LMI Analysis Tool. The majority of those interested in LMI statistics and generating custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
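For users who prefer R to the packages named above, a weighted estimate from this household-level PUMS-style extract might look like the sketch below. The file name and column names (a household weight and an LMI group indicator) are hypothetical placeholders; map them to the dataset's actual data dictionary before use.

# hedged sketch: weighted share of households in each low- to moderate-income group,
# assuming a csv export of the dataset with hypothetical column names
# 'household_weight' and 'lmi_group'
library(survey)

lmi <- read.csv( "lmi_census_population_analysis.csv" , stringsAsFactors = FALSE )

# weights-only design; this extract carries household weights but no public
# psu/stratum identifiers, so treat the standard errors as approximate
lmi.design <- svydesign( ids = ~1 , weights = ~household_weight , data = lmi )

# weighted proportion of households falling into each lmi group
svymean( ~factor( lmi_group ) , lmi.design , na.rm = TRUE )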
Database of the nation's substance abuse and mental health research data providing public use data files, file documentation, and access to restricted-use data files to support a better understanding of this critical area of public health. The goal is to increase the use of the data to most accurately understand and assess substance abuse and mental health problems and the impact of related treatment systems. The data include the U.S. general and special populations, annual series, and designs that produce nationally representative estimates. Some of the data acquired and archived have never before been publicly distributed. Each collection includes survey instruments (when provided), a bibliography of related literature, and related Web site links. All data may be downloaded free of charge in SPSS, SAS, Stata, and ASCII formats and most studies are available for use with the online data analysis system. This system allows users to conduct analyses ranging from cross-tabulation to regression without downloading data or relying on other software. Another feature, Quick Tables, provides the ability to select variables from drop down menus to produce cross-tabulations and graphs that may be customized and cut and pasted into documents. Documentation files, such as codebooks and questionnaires, can be downloaded and viewed online.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Users are able to access discharge information on all emergency department visits. Data is focused on, but not limited to, emergency room diagnoses, procedures, demographics, and payment source.

Background: The State Emergency Department Databases (SEDD) capture discharge information on all emergency department visits that do not result in an admission. (Information on patients initially seen in the emergency room and then admitted to the hospital is included in the State Inpatient Databases (SID).) The SEDD contains emergency department information from 27 states, covering more than 100 clinical and non-clinical variables included in a hospital discharge abstract, such as diagnoses, procedures, patient demographics, expected payment source, and total charges.

User functionality: Users must pay to access the SEDD database. SEDD files from 1999-2009 are available through the HCUP Central Distributor. The SEDD data set can be run on desktop computers with a CD-ROM reader and comes in ASCII format. The data on the CD set require a statistical software package such as SAS or SPSS for analytic purposes. The data set comes with full documentation. SAS and SPSS users are provided programs for converting ASCII files.

Data notes: Data is available from 1999-2009. The website does not indicate when new data will be updated. Twenty-seven states currently participate in the SEDD, including Arizona, California, Connecticut, Florida, Georgia, Hawaii, Indiana, Iowa, Kansas, Maine, Maryland, Massachusetts, Minnesota, Missouri, Nebraska, New Hampshire, New Jersey, New York, North Carolina, Ohio, Rhode Island, South Carolina, South Dakota, Tennessee, Utah, Vermont, and Wisconsin.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de444855
Abstract (en): This data collection provides annual data on prisoners under a sentence of death and on those whose offense sentences were commuted or vacated. Information is available on basic sociodemographic characteristics such as age, sex, race and ethnicity, marital status at time of imprisonment, level of education, and state of incarceration. Criminal history data include prior felony convictions for criminal homicide and legal status at the time of the capital offense. Additional information is provided on those inmates removed from death row by yearend 1988 and those inmates who were executed. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: standardized missing values; checked for undocumented or out-of-range codes. Universe: Inmates in state prisons under the sentence of death.

2008-11-12 Minor changes have been made to the metadata.
2008-10-30 All parts have been moved to restricted access and are available only using the restricted access procedures.
2006-01-12 All files were removed from dataset 3 and flagged as study-level files, so that they will accompany all downloads.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
1997-05-30 SAS data definition statements are now available for this collection, and the SPSS data definition statements were updated.

Funding institution(s): United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics.

Notes: (1) Information collected prior to 1972 is in many cases incomplete and reflects vestiges in the reporting process. (2) The inmate identification numbers were assigned by the Bureau of the Census and have no purpose outside this dataset.
https://timssandpirls.bc.edu/Copyright/index.html
For the TIMSS 2015 fourth grade assessment, the database includes student mathematics and science achievement data as well as the student, parent, teacher, school, and curricular background data for the 47 participating countries and 6 benchmarking entities. For the TIMSS 2015 eighth grade assessment, the database includes student mathematics and science achievement data as well as the student, teacher, school, and curricular background data for the 39 participating countries and 6 benchmarking entities. The TIMSS 2015 International Database also includes data from the TIMSS Numeracy 2015 assessment, with the participation of 7 countries and 1 benchmarking entity. The student, parent, teacher, and school data files are in SAS and SPSS formats. The entire database and its supporting documents are described in the TIMSS 2015 User Guide (Foy, 2017) and its three supplements. The data can be analyzed using the downloadable IEA IDB Analyzer (version 4.0), an application developed by the IEA Data Processing and Research Center to facilitate the analysis of the TIMSS data. A restricted use version of the TIMSS 2015 International Database is available to users who require access to variables removed from the public use version (see Chapter 4 of the User Guide). Users who require access to the restricted use version of the International Database to conduct their analyses should contact the IEA through its Study Data Repository.
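For analysts working outside the IEA IDB Analyzer, the SPSS-format files can be read into R with the haven package. The file name below is only a hypothetical example of the TIMSS file-naming convention; note that proper TIMSS estimates also require the sampling weights, jackknife variance zones, and plausible values documented in the User Guide, which this import snippet does not handle.

# hedged sketch: import one spss-format timss 2015 student file with haven.
# the file name is a hypothetical example of the naming convention -- substitute
# whichever country file you actually downloaded from the international database.
library(haven)
library(dplyr)

students <- read_sav( "asgusam6.sav" )   # e.g. a grade-4 student achievement file

# variable and value labels arrive as labelled vectors; inspect, then convert
glimpse( students )
students <- mutate( students , across( where( is.labelled ) , as_factor ) )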
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

Version 8 release notes: Adds 2019 data.
Version 7 release notes: Changes release notes description, does not change data.
Version 6 release notes: Adds 2018 data.
Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
Version 4 release notes: Adds data for 2017. Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year); this is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time - different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column, which is the total population in that agency.
Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
Version 2 release notes: Fixes bug where Philadelphia Police Department had incorrect FIPS county code.

The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating whether the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.). The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, and reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
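Since the raw FBI files arrive as ASCII plus SPSS setup files and were read into R with asciiSetupReader, a minimal sketch of that step is shown below; the two file names are hypothetical placeholders for one year of data.

# hedged sketch of reading an fbi ascii + spss setup file pair with asciiSetupReader.
# file names are hypothetical placeholders for a single year of hate crime data.
library(asciiSetupReader)

hate_crime <- read_ascii_setup(
  data       = "hate_crime_2019.txt" ,  # fixed-width ascii data file
  setup_file = "hate_crime_2019.sps"    # spss setup file describing columns and labels
)

# quick sanity checks on the imported incident-level table
nrow( hate_crime )
head( names( hate_crime ) )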
A number of education data sets are available for use by policymakers, educators, the public, program directors and researchers through the Virginia Longitudinal Data System. For a complete list of all the table descriptions and data elements, refer to the data dictionary https://www.doe.virginia.gov/about-vdoe/search?q=data%20dictionary
These datasets are intended to be used in applications that have filtering and query building capabilities, such as spreadsheet applications (MS Excel or Numbers), analytical applications (SPSS or SAS), or development-type applications. The datasets are compiled using all the possible combinations of the demographics about students, so each row within the dataset contains a rate or count in addition to the demographics used to arrive at that rate or count.
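As a small illustration of the filtering and query building described above, here is a hedged R sketch; the file and column names are hypothetical stand-ins for whatever demographics and measures appear in the data dictionary.

# hedged sketch: filter one downloadable table to a single demographic combination.
# the file name and columns ('division_name', 'race', 'gender', 'graduation_rate')
# are hypothetical -- map them to the data dictionary for the table you download.
library(dplyr)

grad <- read.csv( "cohort_graduation_rates.csv" , stringsAsFactors = FALSE )

grad %>%
  filter( division_name == "Fairfax County" , race == "Hispanic" , gender == "Female" ) %>%
  select( division_name , race , gender , graduation_rate )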
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de443631
Abstract (en): This study is part of a time-series collection of national surveys fielded continuously since 1952. The election studies are designed to present data on Americans' social backgrounds, enduring political predispositions, social and political values, perceptions and evaluations of groups and candidates, opinions on questions of public policy, and participation in political life. In addition to core items, new content includes questions on values, political knowledge, and attitudes on racial policy, as well as more general attitudes conceptualized as antecedent to these opinions on racial issues. The Main Data File also contains vote validation data that were expanded to include information from the appropriate election office and were attached to the records of each of the respondents in the post-election survey. The expanded data consist of the respondent's post case ID, vote validation ID, and two variables to clarify the distinction between the office of registration and the office associated with the respondent's sample address. The second data file, the Bias Nonresponse Data File, contains respondent-level field administration variables. Of 3,833 lines of sample that were originally issued for the 1990 Study, 2,176 resulted in completed interviews, others were nonsample, and others were noninterviews for a variety of reasons. For each line of sample, the Bias Nonresponse Data File includes sampling data, result codes, control variables, and interviewer variables. Detailed geocode data are blanked but available under conditions of confidential access (contact the American National Election Studies at the Center for Political Studies, University of Michigan, for further details). This is a specialized file, of particular interest to those who are interested in survey nonresponse. Demographic variables include age, party affiliation, marital status, education, employment status, occupation, religious preference, and ethnicity. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes.

Response rates: The response rate for this study is 67.7 percent. The study was in the field until January 31, although 67 percent of the interviews were taken by November 25, 80 percent by December 7, and 93 percent by December 31. Universe: All United States households in the 50 states. Sampling: National multistage area probability sample.

2015-11-10 The study metadata was updated.
2009-01-09 Part 1, the Main Data File, incorporates errata that were posted separately under the Fourth ICPSR Edition. Part 2, the Bias Nonresponse Data File, has been added to the data collection, along with corresponding SAS, SPSS, and Stata setup files and documentation. The codebook has been updated by adding a technical memorandum on the sampling design of the study previously missing from the codebook. The nonresponse file contains respondent-level field administration variables for those interested in survey nonresponse. The collection now includes files in ASCII, SPSS portable, SAS transport (CPORT), and Stata system formats.
2000-02-21 The data for this study are now available in SAS transport and SPSS export formats in addition to the ASCII data file. Variables in the dataset have been renumbered to the following format: 2-digit (or 2-character) year prefix + 4 digits + [optional] 1-character suffix. Dataset ID and version variables have also been added. Additionally, the Voter Validation Office Administration Interview File (Expanded Version) has been merged with the main data file, and the codebook and SPSS setup files have been replaced. Also, SAS setup files have been added to the collection, and the data collection instrument is now provided as a PDF file. Two files are no longer being released with this collection: the Voter Validation Office Administration Interview File (Unexpanded Version) and the Results of First Contact With Respondent file.

Funding institution(s): National Science Foundation (SOC77-08885 and SES-8341310). Mode of data collection: face-to-face interview. There was significantly more content in this post-election survey than ...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
analyze the national health and nutrition examination survey (nhanes) with r

nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r. the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population. most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script.

this new github repository contains five scripts:

2009-2010 interview only - download and analyze.R
- download, import, save the demographics and health insurance files onto your local computer
- load both files, limit them to the variables needed for the analysis, merge them together
- perform a few example variable recodes
- create the complex sample survey object, using the interview weights
- run a series of pretty generic analyses on the health insurance questions

2009-2010 interview plus laboratory - download and analyze.R
- download, import, save the demographics and cholesterol files onto your local computer
- load both files, limit them to the variables needed for the analysis, merge them together
- perform a few example variable recodes
- create the complex sample survey object, using the mobile examination component (mec) weights
- perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief

replicate 2005-2008 pooled cdc oral examination figure.R
- download, import, save, pool, recode, create a survey object, run some basic analyses
- replicate figure 3 from this cdc oral health databrief - the whole barplot

replicate cdc publications.R
- download, import, save, pool, merge, and recode the demographics file plus cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files
- match the cdc's example sas and sudaan syntax file's output for descriptive means
- match the cdc's example sas and sudaan syntax file's output for descriptive proportions
- match the cdc's example sas and sudaan syntax file's output for descriptive percentiles

replicate human exposure to chemicals report.R (user-contributed)
- download, import, save, pool, merge, and recode the demographics file plus urinary bisphenol a (bpa) laboratory files
- log-transform some of the columns to calculate the geometric means and quantiles
- match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report

click here to view these five scripts

for more detail about the national health and nutrition examination survey (nhanes), visit:
- the cdc's nhanes homepage
- the national cancer institute's page of nhanes web tutorials

notes: nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead. confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
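to ground the 'interview plus laboratory' workflow above, here's a minimal sketch of importing the 2009-2010 demographics and total cholesterol files with read.xport, merging on the respondent id, and declaring the mec-weighted design. the design variable names (seqn, sdmvpsu, sdmvstra, wtmec2yr) follow the usual nhanes conventions, but confirm them -- and the file names -- against the documentation for your cycle.

# minimal sketch, assuming DEMO_F.XPT (demographics) and TCHOL_F.XPT (total cholesterol)
# have already been downloaded from the cdc's nhanes 2009-2010 page
library(foreign)   # read.xport for sas transport files
library(survey)    # complex sample survey analysis

demo  <- read.xport( "DEMO_F.XPT" )    # demographics, weights, design variables
tchol <- read.xport( "TCHOL_F.XPT" )   # laboratory cholesterol results

# merge the laboratory results onto the demographics file by respondent id
nhanes <- merge( demo , tchol , by = "SEQN" , all.x = TRUE )

# declare the complex sample design using the mobile examination component weights
nhanes.design <-
	svydesign(
		ids = ~SDMVPSU ,
		strata = ~SDMVSTRA ,
		weights = ~WTMEC2YR ,
		nest = TRUE ,
		data = nhanes
	)

# example: mean total cholesterol among respondents with a positive mec weight
# svymean( ~LBXTC , subset( nhanes.design , WTMEC2YR > 0 ) , na.rm = TRUE )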
The National Sample Survey of Registered Nurses (NSSRN) Download makes data from the survey readily available to users in a one-stop download. The Survey has been conducted approximately every four years since 1977. For each survey year, HRSA has prepared two Public Use File databases in flat ASCII file format without delimiters. The 2008 data are also offered in SAS and SPSS formats. Information likely to point to an individual in a sparsely-populated county has been withheld. General Public Use Files are State-based and provide information on nurses without identifying the County and Metropolitan Area in which they live or work. County Public Use Files provide most, but not all, the same information on the nurse from the General Public Use File, and also identify the County and Metropolitan Areas in which the nurses live or work. NSSRN data are to be used for research purposes only and may not be used in any manner to identify individual respondents.
https://timssandpirls.bc.edu/Copyright/index.html
The PIRLS 2016 International Database is available for individuals interested in the data collected and analyzed as part of PIRLS 2016. The aim is to support and promote the use of these data by researchers, analysts, and others interested in improving education. For the PIRLS 2016 assessment, the database includes student reading achievement data as well as the student, parent, teacher, school, and curricular background data for 50 countries and 11 benchmarking entities. The ePIRLS 2016 International Database includes data from the ePIRLS 2016 assessment, with the participation of 14 countries and 2 benchmarking entities. The student, parent, teacher, and school data files are in SAS and SPSS formats.
The entire database and its supporting documents are described in the PIRLS 2016 User Guide (Foy, 2018) and its three supplements. The data can be analyzed using the downloadable IEA IDB Analyzer (version 4.0), an application developed by IEA Hamburg to facilitate the analysis of the PIRLS data.
A public use version of the datasets is available for download using the links below. A restricted use version of the PIRLS 2016 International Database is available to users who require access to variables removed from the public use version (see Chapter 4 of the User Guide). Users who require access to the restricted use version of the International Database to conduct their analyses should contact the IEA (RandA@iea-hamburg.de).
The Washington Post spent a year determining how many children have been affected by school shootings, beyond just those killed or injured. To do that, reporters attempted to identify every act of gunfire at a primary or secondary school during school hours since the Columbine High massacre on April 20, 1999. Using Nexis, news articles, open-source databases, law enforcement reports, information from school websites, and calls to schools and police departments, The Post reviewed more than 1,000 alleged incidents, but counted only those that happened on campuses immediately before, during or just after classes. Shootings at after-hours events, accidental discharges that caused no injuries to anyone other than the person handling the gun, and suicides that occurred privately or posed no threat to other children were excluded. Gunfire at colleges and universities, which affects young adults rather than kids, also was not counted. After finding more than 200 incidents of gun violence that met The Post’s criteria, reporters organized them in a database for analysis. Because the federal government does not track school shootings, it’s possible that the database does not contain every incident that would qualify. To calculate how many children were exposed to gunfire in each school shooting, The Post relied on enrollment figures and demographic information from the U.S. Education Department, including the Common Core of Data and the Private School Universe Survey. The analysis used attendance figures from the year of the shooting for the vast majority of the schools.

Credits: Research and Reporting: John Woodrow Cox, Steven Rich and Allyson Chiu. Production and Presentation: John Muyskens and Monica Ulmanu.

Per the terms of the Creative Commons license, CISER notes that:
1. The license for this dataset is attached as the files license.htm and license.pdf. A brief version of the Creative Commons license is also included, but users should familiarize themselves with the full license before using.
2. The licensed material is located at https://github.com/washingtonpost/data-school-shootings
3. Several of the files have been modified from the format presented at the above url, including creating pdf versions of the documentation files and adding SAS, Stata, and SPSS versions through the use of StatTransfer 13.
4. These adapted versions of the original files are also released through the same Creative Commons license as the original with the same license elements.
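For analysts working in R rather than the SAS, Stata, or SPSS versions CISER added, the GitHub copy of the database can be read directly. The raw-file URL below reflects an assumption about the repository's file name, so check the repository listing before running.

# hedged sketch: read the washington post school-shootings table straight from github.
# the csv file name inside the repository is an assumption -- verify it in the repo.
csv_url <- "https://raw.githubusercontent.com/washingtonpost/data-school-shootings/master/school-shootings-data.csv"

shootings <- read.csv( csv_url , stringsAsFactors = FALSE )

nrow( shootings )           # number of qualifying incidents in the database
head( names( shootings ) )  # inspect the available columns before analysis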
https://timssandpirls.bc.edu/Copyright/index.html
The TIMSS Advanced 2015 International Database is available to all individuals interested in the data collected and analyzed as part of TIMSS Advanced 2015. The aim is to support and promote the use of these data by researchers, analysts, and others interested in improving education. A public use version of the database is available for download using the links below. For the TIMSS Advanced 2015 assessment, the database includes student achievement data for two subjects, advanced mathematics and physics, as well as the student, teacher, school, and curricular background data for the 9 participating countries. The student, teacher, and school data files are in SAS and SPSS formats. The entire database and its supporting documents are described in the TIMSS Advanced 2015 User Guide for the International Database (Foy, 2017) and its three supplements. The data can be analyzed using the downloadable IEA IDB Analyzer (version 4.0), an application developed by the IEA Data Processing and Research Center to facilitate the analysis of the TIMSS data. A restricted use version of the TIMSS Advanced 2015 International Database is available to users who require access to variables removed from the public use version (see Chapter 4 of the User Guide). Users who require access to the restricted use version of the International Database to conduct their analyses should contact the IEA through its Study Data Repository.