100+ datasets found
  1. Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r

    the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.
    the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

    this new github repository contains five scripts:

    1992 - 2010 download HRS microdata.R
    • loop through every year and every file, download, then unzip everything in one big party

    import longitudinal RAND contributed files.R
    • create a SQLite database (.db) on the local disk
    • load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

    longitudinal RAND - analysis examples.R
    • connect to the sql database created by the 'import longitudinal RAND contributed files' program
    • create two database-backed complex sample survey objects, using a taylor-series linearization design
    • perform a mountain of analysis examples with wave weights from two different points in the panel

    import example HRS file.R
    • load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
    • parse through the IF block at the bottom of the sas importation script, blank out a number of variables
    • save the file as an R data file (.rda) for fast loading later

    replicate 2002 regression.R
    • connect to the sql database created by the 'import longitudinal RAND contributed files' program
    • create a database-backed complex sample survey object, using a taylor-series linearization design
    • exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.

    click here to view these five scripts

    for more detail about the health and retirement study (hrs), visit:
    • michigan's hrs homepage
    • rand's hrs homepage
    • the hrs wikipedia page
    • a running list of publications using hrs

    notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
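
    The description above mentions loading large files into a SQLite database in chunks to avoid exhausting RAM. The repository's scripts are written in R and are not reproduced here; what follows is only a rough, language-neutral illustration of that one idea, written in Python with the standard library. The function name, table name, and column names are hypothetical, and a plain comma-delimited input with a header row is assumed rather than the actual RAND file formats.

```python
import csv
import sqlite3
from itertools import islice


def load_csv_in_chunks(csv_path, db_path, table, chunk_size=10_000):
    """Stream a delimited file into a SQLite table, chunk_size rows at a time.

    Hypothetical helper: only one chunk of rows is held in memory at once.
    """
    rows_written = 0
    conn = sqlite3.connect(db_path)
    try:
        with open(csv_path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)  # first row is assumed to hold column names
            columns = ", ".join(f'"{name}"' for name in header)
            placeholders = ", ".join("?" for _ in header)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
            while True:
                # Pull at most chunk_size rows from the file before writing.
                chunk = list(islice(reader, chunk_size))
                if not chunk:
                    break
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', chunk
                )
                conn.commit()
                rows_written += len(chunk)
    finally:
        conn.close()
    return rows_written
```

    Committing after each chunk keeps memory use bounded by chunk_size; a smaller chunk trades speed for a lower peak footprint.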

  2. DataSheet1_ALASCA: An R package for longitudinal and cross-sectional...

    • frontiersin.figshare.com
    pdf
    Updated Jun 10, 2023
    Cite
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård (2023). DataSheet1_ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods.pdf [Dataset]. http://doi.org/10.3389/fmolb.2022.962431.s001
    Available download formats: pdf
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The increasing availability of multivariate data within biomedical research calls for appropriate statistical methods that can describe and model complex relationships between variables. The extended ANOVA simultaneous component analysis (ASCA+) framework combines general linear models and principal component analysis (PCA) to decompose and visualize the separate effects of experimental factors. It has recently been demonstrated how linear mixed models can be included in the framework to analyze data from longitudinal experimental designs with repeated measurements (RM-ASCA+). The ALASCA package for R makes the ASCA+ framework accessible for general use and includes multiple methods for validation and visualization. The package is especially useful for longitudinal data and the ability to easily adjust for covariates is an important strength. This paper demonstrates how the ALASCA package can be applied to gain insights into multivariate data from interventional as well as observational designs. Publicly available data sets from four studies are used to demonstrate the methods available (proteomics, metabolomics, and transcriptomics).

  3. Initial data analysis checklist for data screening in longitudinal studies.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 29, 2024
    Cite
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Initial data analysis checklist for data screening in longitudinal studies. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t001
    Available download formats: xls
    Dataset updated
    May 29, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Initial data analysis checklist for data screening in longitudinal studies.

  4. Accompanying simulated data for "Go multivariate: a Monte Carlo study of a...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Mar 25, 2022
    Cite
    Sebastian Mildiner Moraga; Emmeke Aarts (2022). Accompanying simulated data for "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity" [Dataset]. http://doi.org/10.5281/zenodo.6384007
    Available download formats: zip
    Dataset updated
    Mar 25, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sebastian Mildiner Moraga; Emmeke Aarts
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability on a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally alleviates the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.

    This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript are not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.

  5. Likert longitudinal data.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Apr 15, 2025
    Cite
    Horvath, Zsuzsa; Koury, Sarah E.; Vasquez, Camille S.; Kim, Jia; Wankiiri-Hale, Christine R.; Ceravolo, Kristina M.; Pavlowski, Emily M.; Leite, Taiana C.; Shah, Nilesh H.; Weinberg, Seth M. (2025). Likert longitudinal data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002094093
    Dataset updated
    Apr 15, 2025
    Authors
    Horvath, Zsuzsa; Koury, Sarah E.; Vasquez, Camille S.; Kim, Jia; Wankiiri-Hale, Christine R.; Ceravolo, Kristina M.; Pavlowski, Emily M.; Leite, Taiana C.; Shah, Nilesh H.; Weinberg, Seth M.
    Description

    The goal of this study was to gain student-centered insights to better understand the challenges of transitioning from undergraduate to dental education. To this end, questionnaires were designed and distributed to incoming dental students, as well as second-, third-, and fourth-year students in the same year for a cross-sectional assessment in 2015/2016. The same questionnaires were also distributed to those same incoming students when they were in their second, third, and fourth years for a longitudinal assessment (2015–2019). There were both open-ended and Likert scale-type questions about expectations (incoming students) and experiences (years 2–4) in dental school compared to undergraduate education. Accordingly, data analysis involved a combination of qualitative and quantitative statistical approaches. Cross-sectional and longitudinal analyses showed that incoming students expected an increased workload in dental school, but also more attention, support, and access to faculty than they received as undergraduates (i.e., they expected a stronger academic support system). All students also reported experiencing more stress and greater difficulty managing their time than expected when compared to their undergraduate experiences. Thus, our study highlights areas of discrepancy between dental students’ initial expectations and their lived experience. Importantly, dental schools can take measures to address these discrepancies, foster a better learning environment, and improve students’ overall experience to help pave a smooth path for students to become successful and well-prepared oral health care providers.

  6. Number of interviews per participant.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 29, 2024
    Cite
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Number of interviews per participant. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t002
    Available download formats: xls
    Dataset updated
    May 29, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Initial data analysis (IDA) is the part of the data pipeline that takes place between the end of data retrieval and the beginning of data analysis that addresses the research question. Systematic IDA and clear reporting of the IDA findings is an important step towards reproducible research. A general framework of IDA for observational studies includes data cleaning, data screening, and possible updates of pre-planned statistical analyses. Longitudinal studies, where participants are observed repeatedly over time, pose additional challenges, as they have special features that should be taken into account in the IDA steps before addressing the research question. We propose a systematic approach in longitudinal studies to examine data properties prior to conducting planned statistical analyses. In this paper we focus on the data screening element of IDA, assuming that the research aims are accompanied by an analysis plan, meta-data are well documented, and data cleaning has already been performed. IDA data screening comprises five types of explorations, covering the analysis of participation profiles over time, evaluation of missing data, presentation of univariate and multivariate descriptions, and the depiction of longitudinal aspects. Executing the IDA plan will result in an IDA report to inform data analysts about data properties and possible implications for the analysis plan—another element of the IDA framework. Our framework is illustrated focusing on hand grip strength outcome data from a data collection across several waves in a complex survey. We provide reproducible R code on a public repository, presenting a detailed data screening plan for the investigation of the average rate of age-associated decline of grip strength. With our checklist and reproducible R code we provide data analysts a framework to work with longitudinal data in an informed way, enhancing the reproducibility and validity of their work.
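
    One of the screening steps named above is the analysis of participation profiles over time, which is also what this table's title refers to. The authors' own code is in R and lives in their public repository; the block below is not that code, only a minimal Python sketch of the counting step under the assumption that the data arrive as (participant, wave) pairs. All names and the toy records are hypothetical.

```python
from collections import Counter


def interviews_per_participant(records):
    """Count distinct interview waves per participant.

    records: iterable of (participant_id, wave) pairs, one per completed
    interview. Duplicate pairs are counted once.
    Returns (counts, distribution), where counts maps each participant to
    a number of interviews and distribution maps each number of interviews
    to how many participants have it.
    """
    waves_by_person = {}
    for participant_id, wave in records:
        waves_by_person.setdefault(participant_id, set()).add(wave)
    counts = {pid: len(waves) for pid, waves in waves_by_person.items()}
    distribution = Counter(counts.values())
    return counts, distribution


# Toy data: participant "a" skips wave 3, "c" has a duplicated wave-2 record.
toy_records = [("a", 1), ("a", 2), ("a", 4), ("b", 1), ("c", 1), ("c", 2), ("c", 2)]
counts, distribution = interviews_per_participant(toy_records)
```

    Keeping the set of waves per participant, rather than only a running count, also makes it straightforward to inspect gaps such as the skipped wave in the toy data.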

  7. Data and code for: Generation and applications of simulated datasets to...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Mar 10, 2023
    Cite
    Matthew Silk; Olivier Gimenez (2023). Data and code for: Generation and applications of simulated datasets to integrate social network and demographic analyses [Dataset]. http://doi.org/10.5061/dryad.m0cfxpp7s
    Available download formats: zip
    Dataset updated
    Mar 10, 2023
    Dataset provided by
    Centre d'Écologie Fonctionnelle et Évolutive
    Authors
    Matthew Silk; Olivier Gimenez
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.

    Methods

    The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R. Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.

  8. R-squared in longitudinal performance evaluation of the models with 20% of...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 15, 2025
    Cite
    Nasir,; Gholami, Shahrzad; Marfin, Anthony; Dodhia, Rahul; Weeks, William B.; Bhat, Niranjan; Alderson, Mark; Ferres, Juan Lavista; Taliesin, Brian; Leader, Troy (2025). R-squared in longitudinal performance evaluation of the models with 20% of Pers-007 as test data for different training datasets. Subjects under the test set were excluded from each training set. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002083413
    Dataset updated
    May 15, 2025
    Authors
    Nasir,; Gholami, Shahrzad; Marfin, Anthony; Dodhia, Rahul; Weeks, William B.; Bhat, Niranjan; Alderson, Mark; Ferres, Juan Lavista; Taliesin, Brian; Leader, Troy
    Description

    R-squared in longitudinal performance evaluation of the models with 20% of Pers-007 as test data for different training datasets. Subjects under the test set were excluded from each training set.

  9. Data from: Challenges of mismatching timescales in longitudinal studies of...

    • datasetcatalog.nlm.nih.gov
    Updated Nov 21, 2022
    Cite
    Strauss, Eli; Farine, Damien R; Ogino, Mina (2022). Data from: Challenges of mismatching timescales in longitudinal studies of collective behaviour [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000235015
    Dataset updated
    Nov 21, 2022
    Authors
    Strauss, Eli; Farine, Damien R; Ogino, Mina
    Description

    Datasets and R scripts for the analysis in Ogino M, Strauss E, Farine D. 2022. Challenges of mismatching timescales in longitudinal studies of collective behaviour (doi.org/10.1098/rstb.2022.0064).
    • 20201001_20201031, 20201101_20201130, 10101201_20211231, 20210101_20210131, 20210201_20210228, 20210301_20210331: files containing daily averaged GPS pairwise distance between males for each month
    • Seasons_Monthly.csv: dataset containing the metadata for each sampling period
    • Census.Rdata: dataset (list) with dataframes for individual metadata (data$Birds) and census observation data (data$ind_obs, data$grp_obs). The grp_obs dataframe contains the metadata for the ind_obs dataset.
    • Script1_GroupDetection.r: code to detect group memberships over time, using different approaches
    • Script2_GPSpairwisedistance.R: code to produce the boxplot showing averaged GPS pairwise distances within detected communities and GLM to quantify how methodological processes applied in different approaches drive differences in cohesiveness of detected groups. This code requires the outputs of Script1.
    • Script3_GroupSize.R: code to produce the boxplots showing the distribution of detected social unit sizes and Jaccard similarity between social units in consecutive sampling periods, and GLM to quantify how methodological processes drive differences in group size and Jaccard similarity. This code requires the outputs of Script1 and Script2.

  10. Data_Sheet_3_SplinectomeR Enables Group Comparisons in Longitudinal...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Robin R. Shields-Cutler; Gabe A. Al-Ghalith; Moran Yassour; Dan Knights (2023). Data_Sheet_3_SplinectomeR Enables Group Comparisons in Longitudinal Microbiome Studies.PDF [Dataset]. http://doi.org/10.3389/fmicb.2018.00785.s003
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Robin R. Shields-Cutler; Gabe A. Al-Ghalith; Moran Yassour; Dan Knights
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Longitudinal, prospective studies often rely on multi-omics approaches, wherein various specimens are analyzed for genomic, metabolomic, and/or transcriptomic profiles. In practice, longitudinal studies in humans and other animals routinely suffer from subject dropout, irregular sampling, and biological variation that may not be normally distributed. As a result, testing hypotheses about observations over time can be statistically challenging without performing transformations and dramatic simplifications to the dataset, causing a loss of longitudinal power in the process. Here, we introduce splinectomeR, an R package that uses smoothing splines to summarize data for straightforward hypothesis testing in longitudinal studies. The package is open-source, and can be used interactively within R or run from the command line as a standalone tool. We present a novel in-depth analysis of a published large-scale microbiome study as an example of its utility in straightforward testing of key hypotheses. We expect that splinectomeR will be a useful tool for hypothesis testing in longitudinal microbiome studies.

  11. DataSheet1_Kalpra: A kernel approach for longitudinal pathway regression...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Dec 6, 2022
    Cite
    Heilbronner, Urs; Falkai, Peter; Bickeböller, Heike; Papiol, Sergi; Kohshour, Mojtaba Oraki; Heilbronner, Maria; Heidenreich, Markus; Wendel, Bernadette; Budde, Monika; Schulze, Thomas G. (2022). DataSheet1_Kalpra: A kernel approach for longitudinal pathway regression analysis integrating network information with an application to the longitudinal PsyCourse Study.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000216583
    Dataset updated
    Dec 6, 2022
    Authors
    Heilbronner, Urs; Falkai, Peter; Bickeböller, Heike; Papiol, Sergi; Kohshour, Mojtaba Oraki; Heilbronner, Maria; Heidenreich, Markus; Wendel, Bernadette; Budde, Monika; Schulze, Thomas G.
    Description

    A popular approach to reduce the high dimensionality resulting from genome-wide association studies is to analyze a whole pathway in a single test for association with a phenotype. Kernel machine regression (KMR) is a highly flexible pathway analysis approach. Initially, KMR was developed to analyze a simple phenotype with just one measurement per individual. Recently, however, the investigation into the influence of genomic factors in the development of disease-related phenotypes across time (trajectories) has gained in importance. Thus, novel statistical approaches for KMR analyzing longitudinal data, i.e. several measurements at specific time points per individual, are required. For longitudinal pathway analysis, we extend KMR to long-KMR using the estimation equivalence of KMR and linear mixed models. We include additional random effects to correct for the dependence structure. Moreover, within long-KMR we created a topology-based pathway analysis by combining this approach with a kernel including network information of the pathway. Most importantly, long-KMR not only allows for the investigation of the main genetic effect adjusting for time dependencies within an individual, but it also allows testing for the association of the pathway with the longitudinal course of the phenotype in the form of testing the genetic time-interaction effect. The approach is implemented as an R package, kalpra. Our simulation study demonstrates that the power of long-KMR exceeded that of another KMR method previously developed to analyze longitudinal data, while maintaining (slightly conservatively) the type I error. The network kernel improved the performance of long-KMR compared to the linear kernel. Considering different pathway densities, the power of the network kernel decreased with increasing pathway density. We applied long-KMR to cognitive data on executive function (Trail Making Test, part B) from the PsyCourse Study and 17 candidate pathways selected from Reactome. We identified seven nominally significant pathways.

  12. Replication Data for: The Opportunity Atlas: Mapping the Childhood Roots of...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 12, 2023
    Cite
    Chetty, Raj; Friedman, John; Hendren, Nathaniel; Jones, Maggie R.; Porter, Sonya R. (2023). Replication Data for: The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility [Dataset]. http://doi.org/10.7910/DVN/NKCQM1
    Dataset updated
    Nov 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chetty, Raj; Friedman, John; Hendren, Nathaniel; Jones, Maggie R.; Porter, Sonya R.
    Description

    This dataset contains replication files for "The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility" by Raj Chetty, John Friedman, Nathaniel Hendren, Maggie R. Jones, and Sonya R. Porter. For more information, see https://opportunityinsights.org/paper/the-opportunity-atlas/. A summary of the related publication follows. We construct a publicly available atlas of children’s outcomes in adulthood by Census tract using anonymized longitudinal data covering nearly the entire U.S. population. For each tract, we estimate children’s earnings distributions, incarceration rates, and other outcomes in adulthood by parental income, race, and gender. These estimates allow us to trace the roots of outcomes such as poverty and incarceration back to the neighborhoods in which children grew up. We find that children’s outcomes vary sharply across nearby tracts: for children of parents at the 25th percentile of the income distribution, the standard deviation of mean household income at age 35 is $5,000 across tracts within counties. We illustrate how these tract-level data can provide insight into how neighborhoods shape the development of human capital and support local economic policy using two applications. First, we show that the estimates permit precise targeting of policies to improve economic opportunity by uncovering specific neighborhoods where certain subgroups of children grow up to have poor outcomes. Neighborhoods matter at a very granular level: conditional on characteristics such as poverty rates in a child’s own Census tract, characteristics of tracts that are one mile away have little predictive power for a child’s outcomes. Our historical estimates are informative predictors of outcomes even for children growing up today because neighborhood conditions are relatively stable over time. 
    Second, we show that the observational estimates are highly predictive of neighborhoods’ causal effects, based on a comparison to data from the Moving to Opportunity experiment and a quasi-experimental research design analyzing movers’ outcomes. We then identify high-opportunity neighborhoods that are affordable to low-income families, providing an input into the design of affordable housing policies. Our measures of children’s long-term outcomes are only weakly correlated with traditional proxies for local economic success such as rates of job growth, showing that the conditions that create greater upward mobility are not necessarily the same as those that lead to productive labor markets.

    Click here to view the Opportunity Atlas

    Any opinions and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. The statistical summaries reported in this paper have been cleared by the Census Bureau’s Disclosure Review Board release authorization number CBDRB-FY18-319.

  13. Cross-lagged panel network analysis

    • scidb.cn
    Updated Dec 10, 2024
    Liu (2024). Cross-lagged panel network analysis [Dataset]. http://doi.org/10.57760/sciencedb.psych.00410
    Dataset updated
    Dec 10, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Liu
    License

    https://api.github.com/licenses/unlicense

    Description

    The R code for cross-lagged panel network analysis. The code was used to explore anxiety and depression among college students before and after the lifting of the COVID-19 pandemic lockdown. A longitudinal survey with two time points was conducted among 705 college students: from 12 December to 30 December 2022 (lockdown period, T1) and from 8 February to 13 March 2023 (lockdown lift period, T2). Contemporaneous network and cross-lagged panel network (CLPN) analyses were conducted to examine the issue from both cross-sectional and longitudinal perspectives.

  14. Microdata: Australian Census Longitudinal Dataset, 2006-2011

    • researchdata.edu.au
    Updated May 2, 2016
    Australian Bureau of Statistics (2016). Microdata: Australian Census Longitudinal Dataset, 2006-2011 [Dataset]. https://researchdata.edu.au/microdata-australian-census-2006-2011/2999917
    Dataset updated
    May 2, 2016
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Bureau of Statistics
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The Australian Census Longitudinal Dataset (ACLD) brings together a 5% sample from the 2006 Census with records from the 2011 Census to create a research tool for exploring how Australian society is changing over time. In taking a longitudinal view of Australians, the ACLD may uncover new insights into the dynamics and transitions that drive social and economic change over time, conveying how these vary for diverse population groups and geographies. It is envisaged that the 2016 and successive Censuses will be added in the future, as well as administrative data sets. The ACLD is released in ABS TableBuilder and as a microdata product in the ABS Data Laboratory.

    The Census of Population and Housing is conducted every five years and aims to measure accurately the number of people and dwellings in Australia on Census Night.

    Microdata products are the most detailed information available from a Census or survey and are generally the responses to individual questions on the questionnaire. They also include derived data from answers to two or more questions and are released with the approval of the Australian Statistician.

    The following microdata products are available for this longitudinal dataset:

    • ACLD in TableBuilder - an online tool for creating tables and graphs.

    • ACLD in ABS Data Laboratory (ABSDL) - for in-depth analysis using a range of statistical software packages.

  15. Data from: Automatic Definition of Robust Microbiome Sub-states in...

    • zenodo.org
    txt, zip
    Updated Jan 24, 2020
    Beatriz García-Jiménez; Mark D. Wilkinson (2020). Data from: Automatic Definition of Robust Microbiome Sub-states in Longitudinal Data [Dataset]. http://doi.org/10.5281/zenodo.167376
    Available download formats: zip, txt
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Beatriz García-Jiménez; Mark D. Wilkinson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Output files of the application of our R software (available at https://github.com/wilkinsonlab/robust-clustering-metagenomics) to different microbiome datasets already published.

    Prefixes:

    Suffixes:

    • _All: all taxa

    • _Dominant: only 1% most abundant taxa

    • _NonDominant: remaining taxa after removing above dominant taxa

    • _GenusAll: taxa aggregated at genus level

    • _GenusDominant: taxa aggregated at genus level and then filtered to keep only the 1% most abundant taxa

    • _GenusNonDominant: taxa aggregated at genus level and then to remove 1% most abundant taxa

    Each folder contains 3 output files related to the same input dataset:
    - data.normAndDist_definitiveClustering_XXX.RData: R data file with a) a phyloseq object (including OTU table, meta-data and cluster assigned to each sample); and b) a distance matrix object.
    - definitiveClusteringResults_XXX.txt: text file with assessment measures of the selected clustering.
    - sampleId-cluster_pairs_XXX.txt: text file. Two columns, comma separated file: sampleID,clusterID
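
    The two-column sampleID,clusterID layout described above can be read with a few lines of standard-library Python. This is only a sketch: it assumes the file has no header row, and the sample IDs and cluster labels in the example are invented for illustration and do not come from the archive. The function name `read_cluster_pairs` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_cluster_pairs(handle):
    """Group sample IDs by cluster ID from 'sampleID,clusterID' rows."""
    clusters = defaultdict(list)
    for row in csv.reader(handle):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        sample_id, cluster_id = (field.strip() for field in row)
        clusters[cluster_id].append(sample_id)
    return dict(clusters)


# Hypothetical file contents, invented for illustration only.
example = io.StringIO("S001,1\nS002,2\nS003,1\n")
pairs = read_cluster_pairs(example)
```

    For a real file, pass `open(path, newline="")` in place of the `io.StringIO` object. If the file turns out to start with a header line, skip it before calling the function.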

    Abstract of the associated paper:

    The analysis of microbiome dynamics would allow us to elucidate patterns within microbial community evolution; however, microbiome state-transition dynamics have been scarcely studied. This is in part because a necessary first-step in such analyses has not been well-defined: how to deterministically describe a microbiome's "state". Clustering in states have been widely studied, although no standard has been concluded yet. We propose a generic, domain-independent and automatic procedure to determine a reliable set of microbiome sub-states within a specific dataset, and with respect to the conditions of the study. The robustness of sub-state identification is established by the combination of diverse techniques for stable cluster verification. We reuse four distinct longitudinal microbiome datasets to demonstrate the broad applicability of our method, analysing results with different taxa subset allowing to adjust it depending on the application goal, and showing that the methodology provides a set of robust sub-states to examine in downstream studies about dynamics in microbiome.

  16. Data from: Respiratory symptoms after coalmine fire and pandemic: a...

    • researchdata.edu.au
    Updated Jul 7, 2023
    Tyler Lane (2023). Respiratory symptoms after coalmine fire and pandemic: a longitudinal analysis of the Hazelwood Health Study adult cohort [Dataset]. http://doi.org/10.26180/22596994.V8
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    Monash University
    Authors
    Tyler Lane
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository includes cleaning and analytical R code for a longitudinal study of smoke effects on respiratory symptoms and whether this is moderated by development of COVID-19 at a later date. Unfortunately, data are confidential and therefore cannot be shared.

  17. Understanding Society: Longitudinal Teaching Dataset, Waves 1-9, 2009-2018

    • harmonydata.ac.uk
    Updated Jan 9, 2009
    University of Essex, Institute for Social and Economic Research (2009). Understanding Society: Longitudinal Teaching Dataset, Waves 1-9, 2009-2018 [Dataset]. http://doi.org/10.5255/UKDA-SN-8715-1
    Dataset updated
    Jan 9, 2009
    Dataset authored and provided by
    University of Essex, Institute for Social and Economic Research
    Description

    Understanding Society, (UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates, the British Household Panel Survey (BHPS), which began in 1991. The Understanding Society: Longitudinal Teaching Dataset, Waves 1-9, 2009-2018 is a teaching resource using data from Understanding Society, the UK Household Longitudinal Study, which interviews individuals in the sampled households every year. There are two target audiences – 1) lecturers who would like to use the data file provided for longitudinal methods teaching purposes, and 2) data users who are new to using longitudinal data and can get a better understanding of using longitudinal data by using the supplied analysis guidance which utilizes the data file. The statistical software used to construct the dataset is Stata and the analysis guidance provided is accompanied by Stata syntax only. The datafile is also available to download in SPSS and tab-delimited text formats. The User Guide includes guidance on how to convert the datafile in Stata format to R. A second teaching resource using the Understanding Society survey is also available, see SN 8465, Understanding Society: Ethnicity and Health Teaching Dataset. For information on the main Understanding Society study, see SN 6614, Understanding Society and Harmonised BHPS.

    This study covers topics such as socio-demographic characteristics, education and labour market information, residential information, income, health and wellbeing, political behaviour and opinions, environmental attitudes and behaviours.

  18. Data from: Source code for R tutorials and dataset for empirical case study...

    • zenodo.org
    • data.niaid.nih.gov
    txt
    Updated Jun 4, 2022
    Martijn van de Pol; Lyanne Brouwer (2022). Source code for R tutorials and dataset for empirical case study on Malurus elegans (red-winged fairy wren) [Dataset]. http://doi.org/10.5061/dryad.7h44j0ztw
    Available download formats: txt
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martijn van de Pol; Lyanne Brouwer
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Biological processes exhibit complex temporal dependencies due to the sequential nature of allocation decisions in organisms' life-cycles, feedback loops, and two-way causality. Consequently, longitudinal data often contain cross-lags: the predictor variable depends on the response variable of the previous time-step. Although statisticians have warned that regression models that ignore such covariate endogeneity in time series are likely to be inappropriate, this has received relatively little attention in biology. Furthermore, the resulting degree of estimation bias remains largely unexplored.

    We use a graphical model and numerical simulations to understand why and how regression models that ignore cross-lags can be biased, and how this bias depends on the length and number of time series. Ecological and evolutionary examples are provided to illustrate that cross-lags may be more common than is typically appreciated and that they occur in functionally different ways.

    We show that routinely used regression models that ignore cross-lags are asymptotically unbiased. However, this offers little relief, as for most realistically feasible lengths of time series conventional methods are biased. Furthermore, collecting time series on multiple subjects–such as populations, groups or individuals—does not help to overcome this bias when the analysis focusses on within-subject patterns (often the pattern of interest). Simulations (R tutorial 1 & 2), a literature search and a real-world empirical example on fairy wrens (data archived here with analyses presented in R-tutorial 3) together suggest that approaches that ignore cross-lags are likely biased in the direction opposite to the sign of the cross-lag (e.g. towards detecting density-dependence of vital rates and against detecting life history trade-offs and benefits of group living). Next, we show that multivariate (e.g. structural equation) models can dynamically account for cross-lags, and simultaneously address additional bias induced by measurement error, but only if the analysis considers multiple time series.

    We provide guidance on how to identify a cross-lag and subsequently specify it in a multivariate model, which can be far from trivial. Our tutorials with data and R code of the worked examples provide step‐by‐step instructions on how to perform such analyses.

    Our study offers insights into situations in which cross-lags can bias analysis of ecological and evolutionary time series and suggests that adopting dynamical models can be important, as this directly affects our understanding of population regulation, the evolution of life histories and cooperation, and possibly many other topics. Determining how strong estimation bias due to ignoring covariate endogeneity has been in the ecological literature requires further study, also because it may interact with other sources of bias.

  19. Data and code for 'Telomere length measurement for longitudinal analysis:...

    • zenodo.org
    • data.europa.eu
    csv
    Updated Oct 22, 2020
    Daniel Nettle (2020). Data and code for 'Telomere length measurement for longitudinal analysis: the role of assay precision' [Dataset]. http://doi.org/10.5281/zenodo.3929510
    Available download formats: csv
    Dataset updated
    Oct 22, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Nettle
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code for 'Telomere length measurement for longitudinal analysis: the role of assay precision' by Nettle, Gadalla, Susser, Bateson and Aviv.

    The R script will run both the numerical simulations and the analysis of the empirical data. The exact results of the simulation will vary slightly from run to run. One frozen set of simulation data is also included in the archive.

  20. Social-Relationship-Quality-Depression-and-Inflammation-A-Cross-Cultural-Longitudinal-Study...

    • openicpsr.org
    delimited, zip
    Updated Sep 8, 2020
    Benjamin Kaveladze; Allison Diamond Altman; Meike Niederhausen; Jennifer M. Loftis; Alan R. Teo (2020). Social-Relationship-Quality-Depression-and-Inflammation-A-Cross-Cultural-Longitudinal-Study [Dataset]. http://doi.org/10.3886/E121261V1
    Available download formats: zip, delimited
    Dataset updated
    Sep 8, 2020
    Dataset provided by
    University of California-Berkeley
    VA Portland Health Care System; Oregon Health & Science University-Portland State University School of Public Health; Oregon Health & Science University
    VA Portland Health Care System; Oregon Health & Science University-Portland State University School of Public Health
    VA Portland Health Care System; Oregon Health & Science University
    University of California-Irvine
    Authors
    Benjamin Kaveladze; Allison Diamond Altman; Meike Niederhausen; Jennifer M. Loftis; Alan R. Teo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2004 - 2010
    Area covered
    United States and Tokyo, Japan
    Description

    Dataset and analysis code for "Social Relationship Quality, Depression and Inflammation: A Cross-Cultural Longitudinal Study in the United States and Japan". Email Benjamin Kaveladze at bkavelad@uci.edu with any questions.

Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY

Health and Retirement Study (HRS)

Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description

analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original.

figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle.

but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.

the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
• loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
• create a SQLite database (.db) on the local disk
• load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
• connect to the sql database created by the 'import longitudinal RAND contributed files' program
• create two database-backed complex sample survey objects, using a taylor-series linearization design
• perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
• load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
• parse through the IF block at the bottom of the sas importation script, blank out a number of variables
• save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
• connect to the sql database created by the 'import longitudinal RAND contributed files' program
• create a database-backed complex sample survey object, using a taylor-series linearization design
• exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
• michigan's hrs homepage
• rand's hrs homepage
• the hrs wikipedia page
• a running list of publications using hrs

notes:

exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself.

confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
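
the import script described above loads the rand files into a sqlite database in chunks so that ram never has to hold a whole file at once. the sketch below shows the same general idea in python rather than r. it is not the repository's code: the function name `load_in_chunks`, the table name `rand_demo`, and the three columns are all made up for illustration, and the input is a tiny in-memory csv instead of a real rand file.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_in_chunks(conn, table, handle, chunk_size=1000):
    """Copy CSV rows into a SQLite table, chunk_size rows at a time."""
    reader = csv.reader(handle)
    header = next(reader)  # first row supplies the column names
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    total = 0
    while True:
        # Only chunk_size rows are held in memory at any moment.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', chunk)
        conn.commit()
        total += len(chunk)
    return total


# Hypothetical data: the column names and values are invented, not HRS variables.
sample = io.StringIO("person_id,wave,weight\n1,1,2.5\n2,1,0\n3,1,4.0\n")
conn = sqlite3.connect(":memory:")
rows_loaded = load_in_chunks(conn, "rand_demo", sample, chunk_size=2)
row_count = conn.execute('SELECT COUNT(*) FROM "rand_demo"').fetchone()[0]
```

swap `":memory:"` for a file path to get a .db file on the local disk. a larger `chunk_size` means fewer commits but more rows in memory per batch.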
