License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017); IHME is affiliated with the University of Washington. We are volunteer collaborators with IHME and are not employed by IHME or the University of Washington.
The population-weighted GBD2017 data cover male and female cohorts aged 15-69 years and include noncommunicable disease (NCD), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes together with associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCD, BMI, CVD, and other health outcomes from their attributable risks.
These Global Burden of Disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis.
The data include the following:
1. The analysis database of population-weighted GBD2017 data, which includes over 40 health risk factors; noncommunicable disease deaths/100k/year for male and female cohorts aged 15-69 years from 195 countries (the primary outcome variable, covering over 100 types of noncommunicable diseases); and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc.)
2. A text file to import the analysis database into SAS
3. The SAS code to format the analysis database to be used for analytics
4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6
5. SAS code for deriving the multiple regression formula in Table 4
6. SAS code for deriving the multiple regression formula in Table 5
7. SAS code for deriving the multiple regression formula in Supplementary Table 7
8. SAS code for deriving the multiple regression formula in Supplementary Table 8
9. The Excel files that accompanied the above SAS code to produce the tables
For questions, please email davidkcundiff@gmail.com. Thanks.
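As a rough illustration of the kind of analysis this database supports, the following is a minimal SAS sketch, not the authors' actual code: it assumes a tab-delimited export named AnalysisDatabaseGBD.txt and hypothetical variable names (ncd_deaths_per_100k, bmi, smoking_prev, sodium_g, population).

proc import datafile="AnalysisDatabaseGBD.txt" out=gbd dbms=dlm replace;
  delimiter='09'x;   /* tab-delimited text file (assumed format) */
  getnames=yes;
run;

/* Univariate correlation of the primary outcome with one risk factor */
proc corr data=gbd pearson;
  var ncd_deaths_per_100k;   /* hypothetical outcome variable name */
  with bmi;                  /* hypothetical risk-factor name */
run;

/* Population-weighted multiple regression of the outcome on several risk factors */
proc reg data=gbd;
  model ncd_deaths_per_100k = bmi smoking_prev sodium_g;  /* hypothetical names */
  weight population;         /* hypothetical population-weight variable */
run;
quit;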
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
SAS code to reproduce the simulation study and the analysis of the urine osmolarity example. (ZIP)
The SAS code used to fit a Poisson regression to detailed and aggregate data.
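The entry gives no further detail; as a hedged sketch only (not the code referenced above), a Poisson regression for aggregated count data is commonly fit in SAS with PROC GENMOD and a log-exposure offset. The input data and variable names below are hypothetical.

data agg_counts;                      /* hypothetical aggregate data */
  input agegrp $ treat $ events exposure;
  log_exposure = log(exposure);       /* offset for aggregated person-time */
  datalines;
young A 12 340
young B  7 310
old   A 30 290
old   B 19 300
;
run;

proc genmod data=agg_counts;
  class agegrp treat;
  model events = agegrp treat / dist=poisson link=log offset=log_exposure;
run;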
This data publication contains the data and SAS code corresponding to the examples provided in the publication "A tutorial on the piecewise regression approach applied to bedload transport data" by Sandra Ryan and Laurie Porth in 2007 (see cross-reference section). The data include rates of bedload transport and discharge recorded from 1985-1993 and 1997 at Little Granite Creek near Jackson, Wyoming, as well as the bedload transport and discharge recorded during snowmelt runoff in 1998 and 1999 at Hayden Creek near Salida, Colorado. The SAS code demonstrates how to apply a piecewise linear regression model to these data, as well as bootstrapping techniques to obtain confidence limits for piecewise linear regression parameter estimates. These data were collected to measure rates of bedload transport in coarse-grained channels. Original metadata date was 05/31/2007. Metadata modified on 03/19/2013 to adjust the citation to include the addition of a DOI (digital object identifier) and other minor edits. Minor metadata updates on 12/20/2016.
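For orientation only, a piecewise (segmented) linear regression with an estimated breakpoint can be fit in SAS with PROC NLIN; this is a hedged sketch rather than the Ryan and Porth code, and the dataset and variable names (littlegranite, discharge, bedload) plus the starting values are assumptions.

/* y = b0 + b1*x                 for x <= x0
   y = b0 + b1*x + b2*(x - x0)   for x >  x0   (slope change at breakpoint x0) */
proc nlin data=littlegranite;
  parms b0=0 b1=1 b2=1 x0=5;     /* starting values are guesses */
  model bedload = b0 + b1*discharge + b2*max(discharge - x0, 0);
run;

Bootstrap confidence limits like those described above could then be obtained by resampling the observations (for example with PROC SURVEYSELECT) and refitting the model on each replicate.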
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that R2 and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. Better methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., the Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software; yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both the Lasso and its predecessor, ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.
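As a hedged illustration of the SAS side (not the paper's own code), the Lasso is available through PROC GLMSELECT and ridge regression through the RIDGE= option of PROC REG; the dataset and variable names below are hypothetical.

/* Lasso with 10-fold cross-validation to choose the penalty */
proc glmselect data=mydata plots=coefficients seed=12345;
  model y = x1-x20 / selection=lasso(choose=cv stop=none) cvmethod=random(10);
run;

/* Ridge regression over a grid of ridge parameters */
proc reg data=mydata outest=ridge_est outvif ridge=0 to 1 by 0.05;
  model y = x1-x20;
run;
quit;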
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Urine osmolarity (UOSM), the exposure of main interest, is included in all models. The initial set of adjustment variables for these models was selected by the disjunctive cause criterion. Hazard ratios (HR), confidence limits (CI), and p-values are given. Model stability was evaluated by bootstrap inclusion frequencies (based on bootstrap resamples). UOSM, creatinine clearance, and proteinuria were log2-transformed; corresponding hazard ratios are therefore per doubling of each variable. Abbreviations: ABE, augmented backward elimination; ACEI/ARBs, use of angiotensin-converting enzyme inhibitors and angiotensin II type 1 receptor blockers; BE, backward elimination; CI, confidence interval; HR, hazard ratio; UOSM, urine osmolarity (mosm/L). Urine osmolarity example: final models selected by backward elimination (BE) at a given significance threshold, by augmented backward elimination (ABE) at a given significance threshold and change-in-estimate threshold, and with no selection (the unselected model).
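For context, a hedged SAS sketch of the kind of Cox model described here follows; it is not the study's code. PROC PHREG is used with backward elimination, and log2 transforms give hazard ratios per doubling. The dataset name (ckd), the time/event variables, the covariate names, and the SLSTAY value are assumptions; augmented backward elimination (ABE) with a change-in-estimate criterion would require the authors' own code rather than a built-in option.

data ckd2;
  set ckd;                       /* hypothetical input data set */
  log2_uosm = log2(uosm);        /* hazard ratio per doubling of urine osmolarity */
  log2_crcl = log2(crcl);        /* creatinine clearance */
  log2_prot = log2(proteinuria);
run;

proc phreg data=ckd2;
  model time*event(0) = log2_uosm log2_crcl log2_prot age sex acei_arb
        / selection=backward slstay=0.157 include=1 risklimits;
  /* include=1 keeps the first listed effect (the UOSM exposure) in every model;
     slstay= is the significance threshold for backward elimination (assumed value) */
run;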
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Dataset for a quantitative study. Gauteng Covid-19 statistics from the second wave of the pandemic are captured. The data comprise the date, daily positive cases, cumulative confirmed deaths, and daily deaths. These data were made publicly available by the NICD. Also included is the SAS code file. The code calculates linear regression model estimates and distributed-lag estimates using the Koyck approach and the Almon approach.
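As a hedged sketch of the two distributed-lag approaches named above (not the accompanying code), the Koyck model can be estimated by regressing deaths on cases and lagged deaths, and the Almon polynomial distributed lag is available in PROC PDLREG (SAS/ETS). The dataset and variable names, the lag length, and the polynomial degree are assumptions.

data gauteng2;
  set gauteng;                          /* assumed dataset name */
  lag_deaths = lag(daily_deaths);       /* Koyck: the geometric lag enters via y(t-1) */
run;

/* Koyck transformation: deaths_t = a + b*cases_t + lambda*deaths_(t-1) + e_t */
proc reg data=gauteng2;
  model daily_deaths = daily_cases lag_deaths;
run;
quit;

/* Almon polynomial distributed lag: 14 lags of cases with a 2nd-degree polynomial */
proc pdlreg data=gauteng2;
  model daily_deaths = daily_cases(14, 2);
run;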
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data and SAS code for a one-way error component regression model (fixed effects).
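For orientation, a one-way fixed-effects error component model is typically fit in SAS with PROC PANEL (SAS/ETS) and the FIXONE option; this is a hedged sketch, and the dataset, identifier, and variable names are hypothetical.

proc panel data=paneldata;
  id firm year;                  /* cross-section and time identifiers (assumed names) */
  model y = x1 x2 / fixone;      /* one-way (cross-sectional) fixed effects */
run;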
License: https://pasteur.epa.gov/license/sciencehub-license.html
We compiled macroinvertebrate assemblage data collected from 1995 to 2014 from the St. Louis River Area of Concern (AOC) of western Lake Superior. Our objective was to define depth-adjusted cutoff values for benthos condition classes (poor, fair, reference) to provide a tool useful for assessing progress toward achieving removal targets for the degraded benthos beneficial use impairment in the AOC. The relationship between depth and benthos metrics was wedge-shaped. We therefore used quantile regression to model the limiting effect of depth on selected benthos metrics, including taxa richness, percent non-oligochaete individuals, combined percent Ephemeroptera, Trichoptera, and Odonata individuals, and density of ephemerid mayfly nymphs (Hexagenia). We created a scaled trimetric index from the first three metrics. Metric values at or above the 90th percentile quantile regression model prediction were defined as reference condition for that depth. We set the cutoff between poor and fair condition at the 50th percentile model prediction. We examined sampler type, exposure, geographic zone of the AOC, and substrate type for confounding effects. Based on these analyses, we combined data across sampler type and exposure classes and created separate models for each geographic zone. We used the resulting condition class cutoff values to assess the relative benthic condition for three habitat restoration project areas. The depth-limited pattern of ephemerid abundance we observed in the St. Louis River AOC also occurred elsewhere in the Great Lakes. We provide tabulated model predictions for application of our depth-adjusted condition class cutoff values to new sample data.
This dataset is associated with the following publication: Angradi, T., W. Bartsch, A. Trebitz, V. Brady, and J. Launspach. A depth-adjusted ambient distribution approach for setting numeric removal targets for a Great Lakes Area of Concern beneficial use impairment: Degraded benthos. JOURNAL OF GREAT LAKES RESEARCH. International Association for Great Lakes Research, Ann Arbor, MI, USA, 43(1): 108-120, (2017).
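For context, the depth-limited (wedge-shaped) relationship described above can be modeled in SAS with PROC QUANTREG by estimating the 50th and 90th percentile regressions of a benthos metric on depth; this is a hedged sketch, not the study's code, and the dataset and variable names are hypothetical.

proc quantreg data=benthos;
  model taxa_richness = depth / quantile=0.5 0.9;
  output out=cutoffs predicted=cutoff;   /* predicted 50th (poor/fair) and 90th (reference)
                                            percentile cutoffs at each sampled depth */
run;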
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The existence of interactive effects of a dichotomous treatment variable on the relationship between continuous predictor and response variables is an essential issue in the biological and medical sciences. Considerable attention has also been devoted to raising awareness of the often-untenable assumption of homogeneous error variance among treatment groups. Although procedures for detecting interactions between treatment and predictor variables are well documented in the literature, the corresponding problem of power and sample size calculation has received relatively little attention. To facilitate the planning of interaction designs, this article describes power and sample size procedures for the extended Welch test of the difference between two regression slopes under heterogeneity of variance. Two formulations are presented to explicate the implications of how the predictor variables are treated. The simplified method utilizes only partial information about the predictor variances and has the advantage of statistical and computational simplicity. However, extensive numerical investigations showed that it is less accurate than the fuller procedure that accommodates the complete distributional features of the predictors. According to the analytic justification and empirical performance, the proposed approach gives reliable solutions for power assessment and sample size determination when detecting interaction effects. A numerical example involving the kidney weight and body weight of crossbred diabetic and normal mice is used to illustrate the suggested procedures with flexible allocation schemes. Moreover, the organ and body weight data are incorporated in the accompanying SAS and R software programs to illustrate the ease and convenience of the proposed techniques for design planning in interaction research.
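A hedged sketch of the simplified approach (using only the predictor variances and a Satterthwaite approximation) follows; it is not the article's program, and all numerical inputs are assumptions chosen for illustration.

/* Approximate power for testing equality of two regression slopes when the
   two groups have unequal error variances (Welch-type test, simplified method) */
data welch_slope_power;
  alpha = 0.05;
  bdiff = 0.8;                          /* hypothesized slope difference (assumed) */
  n1 = 30;  sig1 = 2.0;  sx1 = 1.5;     /* group 1: size, error SD, predictor SD */
  n2 = 30;  sig2 = 3.5;  sx2 = 1.2;     /* group 2 */
  v1 = sig1**2 / ((n1 - 1) * sx1**2);   /* variance of the slope estimate, group 1 */
  v2 = sig2**2 / ((n2 - 1) * sx2**2);
  df = (v1 + v2)**2 / (v1**2/(n1 - 2) + v2**2/(n2 - 2));  /* Satterthwaite df */
  nc = bdiff / sqrt(v1 + v2);           /* noncentrality under the alternative */
  tcrit = tinv(1 - alpha/2, df);
  power = 1 - probt(tcrit, df, nc) + probt(-tcrit, df, nc);
  put "Approximate power: " power 6.3;
run;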
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Depending on the scale of the characteristic and its distribution, either the median (1st, 3rd quartile), the mean (standard deviation, SD), or the absolute number n (percentage) is given. Abbreviations: ACEI/ARBs, use of angiotensin-converting enzyme inhibitors and angiotensin II type 1 receptor blockers; SD, standard deviation; UOSM, urine osmolarity. Urine osmolarity example: demographic and clinical characteristics of all 245 patients at baseline.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Dataset and SAS code to replicate the associated results:
PHYTODIV_DATASET --> Excel file with all the databases to conduct the analysis
Sequestration.csv --> Caterpillar cardenolide sequestration data
Sequestration.sas --> SAS code to run the analysis of caterpillar growth and sequestration
MonteCarlo_seqgrwt.csv --> Caterpillar growth and cardenolide sequestration data for regression analysis
MonteCarlo_seqgrwt.sas --> SAS code to perform the growth-damage regression and the growth-sequestration regression on the observed data and compare them to 10,000 simulated regression analyses (see the sketch after this list)
CAFEassay.csv --> Data of fly survival and feeding rate when feeding on toxic diets
CAFE_assay.sas --> SAS code to analyze fly survival and feeding rate when feeding on toxic diets
Enzyme_assay.csv --> Data of sodium pump inhibition due to cardenolide toxins
Enzyme_assay.sas --> SAS code to analyze and compare sodium pump inhibition by single toxins vs. toxin mixtures
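For orientation only, one way to set up a resampling comparison like the one described for MonteCarlo_seqgrwt.sas is sketched below; it is not the authors' code (their simulation scheme may differ), and the dataset and variable names (seqgrwt, growth, sequestration) and the seed are assumptions.

/* 10,000 resampled replicates of the growth-sequestration regression */
proc surveyselect data=seqgrwt out=boot seed=98765
     method=urs samprate=1 outhits reps=10000;
run;

proc reg data=boot outest=bootest noprint;
  model growth = sequestration;       /* slope of interest */
  by replicate;
run;
quit;

/* Percentile limits of the resampled slope estimates, to compare with the observed slope */
proc univariate data=bootest noprint;
  var sequestration;                  /* OUTEST= stores the slope under the predictor's name */
  output out=slope_limits pctlpts=2.5 97.5 pctlpre=ci_;
run;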
Discover the booming market for Regression Data Analysis Software! This in-depth analysis reveals a $5 billion market in 2025, projected to reach $10 billion by 2033, driven by AI, ML, and increasing data volumes. Learn about key players, market trends, and growth opportunities.
Discover the booming market for regression analysis tools! This comprehensive analysis explores market size, growth trends (CAGR), key players (IBM SPSS, SAS, Python Scikit-learn), and regional insights (Europe, North America). Learn how data-driven decision-making fuels demand for these essential predictive analytics tools.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
analyze the survey of consumer finances (scf) with r
the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pay the bills and therefore call the shots. if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell. the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.
this new github repository contains three scripts:
1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading
2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely
replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences
click here to view these three scripts
for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions
notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
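The scripts above are in R, but since the rest of this collection centers on SAS, here is a hedged SAS sketch of the core idea of combining estimates across the five implicates (Rubin's rules), in the spirit of the meanit macro mentioned above; it is not that macro. The stacked dataset scf_stacked, the _Imputation_ index (1-5), the weight wgt, and the analysis variables are assumptions, and for brevity the sketch ignores the replicate-weight component of the variance.

proc reg data=scf_stacked outest=est covout noprint;
  model networth = age educ;    /* hypothetical analysis variables */
  weight wgt;                   /* main analysis weight (assumed name) */
  by _Imputation_;              /* one regression per implicate */
run;
quit;

/* Combine the five sets of estimates with Rubin's rules */
proc mianalyze data=est;
  modeleffects Intercept age educ;
run;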
Discover the booming Regression Data Analysis Software market! Our comprehensive report reveals a $5B market in 2025, projecting 12% CAGR growth through 2033. Explore key trends, segments (healthcare, BFSI, etc.), leading companies (Microsoft, SAS, IBM), and regional insights. Get your data-driven advantage today!
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Abbreviations: SAS = Self-rating Anxiety Scale; post-RT interval = post-radiotherapy interval; Adj. = adjusted; Std. = standard.
Discover the booming Regression Data Analysis Software market! Explore key trends, CAGR, market size projections, leading companies, and regional insights. Learn how this crucial technology is transforming healthcare, finance, and more. Get your competitive edge in the data-driven economy.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This formatted dataset originates from raw data files from the Institute for Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017). It contains population-weighted worldwide data on male and female cohorts aged 15-69 years, including body mass index (BMI), cardiovascular disease (CVD), and associated dietary, metabolic, and other risk factors. The purpose of creating this formatted database is to explore the univariate and multiple regression correlations of BMI, CVD, and other health outcomes with risk factors. Our research hypothesis is that we can successfully apply artificial intelligence to model BMI and CVD risk factors and health outcomes. We derived a BMI multiple regression risk factor formula that satisfied all nine Bradford Hill causality criteria for epidemiological research. We found that animal product and added fat intakes are negatively correlated with early CVD deaths worldwide overall but positively correlated at high intakes. We interpret this as showing that optimal cardiovascular outcomes come with moderate (neither low nor high) intakes of animal foods and added fats.
For questions, please email davidkcundiff@gmail.com. Thanks.
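The moderate-intake optimum described above implies a nonlinear (roughly U-shaped) relationship; purely as a hedged illustration, not the authors' formula, such a relationship can be captured in a SAS regression by adding a quadratic term. The dataset and variable names are hypothetical.

data gbd2;
  set gbd;                                     /* assumed dataset name */
  animal_fat_kcal_sq = animal_fat_kcal**2;     /* hypothetical intake variable, squared */
run;

proc reg data=gbd2;
  model cvd_deaths_per_100k = animal_fat_kcal animal_fat_kcal_sq;  /* hypothetical names */
  weight population;                           /* population weighting (assumed) */
run;
quit;

/* A negative linear and positive quadratic coefficient would be consistent with the
   moderate-intake optimum described above. */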
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Model-based standard errors, robust standard errors, and standard errors based on bootstrap resamples for models selected with backward elimination (BE) and augmented backward elimination (ABE). Abbreviations: ABE, augmented backward elimination; BE, backward elimination; SE, standard error; UOSM, urine osmolarity (mosm/L). Urine osmolarity example: incorporating model uncertainty into standard error (SE) estimates for urine osmolarity (UOSM).