Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.
The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.
These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis.
The data include the following:
1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc).
2. A text file to import the analysis database into SAS
3. The SAS code to format the analysis database to be used for analytics
4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6
5. SAS code for deriving the multiple regression formula in Table 4.
6. SAS code for deriving the multiple regression formula in Table 5
7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8
9. The Excel files that accompanied the above SAS code to produce the tables
For questions, please email davidkcundiff@gmail.com. Thanks.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description of the experiment setting: location, influential climatic conditions, controlled conditions (e.g. temperature, light cycle) In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis. FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2016 Participant and Program Characteristics (PC2016) is the thirteenth data submission to be completed using the WIC PC reporting system. In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations. Processing methods and equipment used Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2016. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency. Study date(s) and duration The information for each participant was from the participants’ most current WIC certification as of April 2016. Due to management information constraints, Connecticut provided data for a month other than April 2016, specifically August 16 – September 15, 2016. Study spatial scale (size of replicates and spatial scale of study area) In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations. Level of true replication Unknown Sampling precision (within-replicate sampling or pseudoreplication) State Agency Data Submissions. PC2016 is a participant dataset consisting of 8,815,472 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data. PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 8.8 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public. National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The nationally representative sample is composed of 60,003 records. The distribution by category is 5,449 pregnant women, 4,661 breastfeeding women, 3,904 postpartum women, 13,999 infants, and 31,990 children. Level of subsampling (number and repeat or within-replicate sampling) The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed. In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample. Study design (before–after, control–impacts, time series, before–after-control–impacts) None – Non-experimental Description of any data manipulation, modeling, or statistical analysis undertaken Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers. Description of any gaps in the data or other limiting factors Due to management information constraints, Connecticut provided data for a month other than April 2016, specifically August 16 – September 15, 2016. Outcome measurement methods and equipment used None Resources in this dataset:Resource Title: WIC Participant and Program Characteristics 2016. File Name: wicpc_2016_public.csvResource Description: The 2016 Participant and Program Characteristics (PC2016) is the thirteenth data submission to be completed using the WIC PC reporting system. In April 2016, there were 90 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the American Virgin Islands, and 34 Indian tribal organizations.Resource Software Recommended: SAS, version 9.4,url: https://www.sas.com/en_us/software/sas9.html Resource Title: WIC Participant and Program Characteristics 2016 Codebook. File Name: WICPC2016_PUBLIC_CODEBOOK.xlsxResource Software Recommended: SAS, version 9.4,url: https://www.sas.com/en_us/software/sas9.html Resource Title: WIC Participant and Program Characteristics 2016 - Zip File with SAS, SPSS and STATA data. File Name: WIC_PC_2016_SAS_SPSS_STATA_Files.zipResource Description: WIC Participant and Program Characteristics 2016 - Zip File with SAS, SPSS and STATA data
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Emerging Pathogens Initiative (EPI) database contains emerging pathogens information from the local Veterans Affairs Medical Centers (VAMCs). The EPI software package allows the VA to track emerging pathogens on the national level without additional data entry at the local level. The results from aggregation of data can be shared with the appropriate public health authorities including non-VA and the private health care sector allowing national planning, formulation of intervention strategies, and resource allocations. EPI is designed to automatically collect data on emerging diseases for Veterans Affairs Central Office (VACO) to analyze. The data is sent to the Austin Information Technology Center (AITC) from all Veterans Health Information Systems and Technology Architecture (VistA) systems for initial processing and combination with related workload data. VACO data retrieval and analysis is then carried out. The AITC creates two file structures both in Statistical Analysis Software (SAS) file format, which are used as a source of data for the Veterans Affairs Headquarters (VAHQ) Infectious Diseases Program Office. These files are manipulated and used for analysis and reporting by the National Infectious Diseases Service. Emerging Pathogens (as characterized by VACO) act as triggers for data acquisition activities in the automated program. The system retrieves relevant, predetermined, patient-specific information in the form of a Health Level Seven (HL7) message that is transmitted to the central data repository at the AITC. Once at that location, the data is converted to a SAS dataset for analysis by the VACO National Infectious Diseases Service. Before data transmission an Emerging Pathogens Verification Report is produced for the local sites to review, verify, and make corrections as needed. After data transmission to the AITC it is added to the EPI database.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the consumer expenditure survey (ce) with r the consumer expenditure survey (ce) is the primo data source to understand how americans spend money. participating households keep a running diary about every little purchase over the year. those diaries are then summed up into precise expenditure categories. how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011? an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they're slowly releasing historical files back to 1996. hooray! for a t aste of what's possible with ce data, look at the quick tables listed on their main page - these tables contain approximately a bazillion different expenditure categories broken down by demographic groups. guess what? i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3). you can often get close to your statistic of interest from these web tables. but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old. another one of the thirteen web tables - the consumer unit composition table - shows a few different breakouts of households with kids, but none matching that exact population of interest. the bureau of labor statistics (bls) (the survey's designers) and the census bureau (the survey's administrators) have provided plenty of the major statistics and breakouts for you, but they're not psychic. if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states' population, then let a little r into your life. fun starts now. fair warning: only analyze t he consumer expenditure survey if you are nerd to the core. the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis. the scripts in this repository contain examples to prepare 'em all, just be advised that magnificent data like this will never be no-assembly-required. the folks at bls have posted an excellent summary of what's av ailable - read it before anything else. after that, read the getting started guide. don't skim. a few of the descriptions below refer to sas programs provided by the bureau of labor statistics. you'll find these in the C:\My Directory\CES\2011\docs directory after you run the download program. this new github repository contains three scripts: 2010-2011 - download all microdata.R lo op through every year and download every file hosted on the bls's ce ftp site import each of the comma-separated value files into r with read.csv depending on user-settings, save each table as an r data file (.rda) or stat a-readable file (.dta) 2011 fmly intrvw - analysis examples.R load the r data files (.rda) necessary to create the 'fmly' table shown in the ce macros program documentation.doc file construct that 'fmly' table, using five quarters of interviews (q1 2011 thru q1 2012) initiate a replicate-weighted survey design object perform some lovely li'l analysis examples replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using unimputed variables replicate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t -tests using unimputed variables create an rsqlite database (to minimize ram usage) containing the five imputed variable files, after identifying which variables were imputed based on pdf page 3 of the user's guide to income imputation initiate a replicate-weighted, database-backed, multiply-imputed survey design object perform a few additional analyses that highlight the modified syntax required for multiply-imputed survey designs replicate the %mean_variance() macro found in "ce macros.sas" and provide some examples of calculating descriptive statistics using imputed variables repl icate the %compare_groups() macro found in "ce macros.sas" and provide some examples of performing t-tests using imputed variables replicate the %proc_reg() and %proc_logistic() macros found in "ce macros.sas" and provide some examples of regressions and logistic regressions using both unimputed and imputed variables replicate integrated mean and se.R match each step in the bls-provided sas program "integr ated mean and se.sas" but with r instead of sas create an rsqlite database when the expenditure table gets too large for older computers to handle in ram export a table "2011 integrated mean and se.csv" that exactly matches the contents of the sas-produced "2011 integrated mean and se.lst" text file click here to view these three scripts for...
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.
The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.
These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis.
The data include the following:
1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc).
2. A text file to import the analysis database into SAS
3. The SAS code to format the analysis database to be used for analytics
4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6
5. SAS code for deriving the multiple regression formula in Table 4.
6. SAS code for deriving the multiple regression formula in Table 5
7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8
9. The Excel files that accompanied the above SAS code to produce the tables
For questions, please email davidkcundiff@gmail.com. Thanks.