Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Correlations (above diagonal), standard deviations (diagonal) and covariances (below diagonal) of grip strength across waves for males.
https://spdx.org/licenses/CC0-1.0.html
Social networks are tied to population dynamics; interactions are driven by population density and demographic structure, while social relationships can be key determinants of survival and reproductive success. However, difficulties integrating models used in demography and network analysis have limited research at this interface. We introduce the R package genNetDem for simulating integrated network-demographic datasets. It can be used to create longitudinal social networks and/or capture-recapture datasets with known properties. It incorporates the ability to generate populations and their social networks, generate grouping events using these networks, simulate social network effects on individual survival, and flexibly sample these longitudinal datasets of social associations. By generating co-capture data with known statistical relationships it provides functionality for methodological research. We demonstrate its use with case studies testing how imputation and sampling design influence the success of adding network traits to conventional Cormack-Jolly-Seber (CJS) models. We show that incorporating social network effects in CJS models generates qualitatively accurate results, but with downward-biased parameter estimates when network position influences survival. Biases are greater when fewer interactions are sampled or fewer individuals are observed in each interaction. While our results indicate the potential of incorporating social effects within demographic models, they show that imputing missing network measures alone is insufficient to accurately estimate social effects on survival, pointing to the importance of incorporating network imputation approaches. genNetDem provides a flexible tool to aid these methodological advancements and help researchers test other sampling considerations in social network studies.

Methods

The dataset and code stored here are for Case Studies 1 and 2 in the paper. Datasets were generated using simulations in R.
Here we provide 1) the R code used for the simulations; 2) the simulation outputs (as .RDS files); and 3) the R code to analyse simulation outputs and generate the tables and figures in the paper.
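To make the kind of capture-recapture data involved concrete, here is a minimal sketch (in Python, not using genNetDem itself; all parameter values are invented for illustration) of simulating capture histories under a constant-parameter CJS-style model:

```python
import numpy as np

# Minimal sketch of capture-history simulation: phi = survival between
# occasions, p = detection probability when alive. Unlike genNetDem,
# this does not tie survival to social network position.
rng = np.random.default_rng(1)
n_ind, n_occ = 100, 5
phi, p = 0.8, 0.6  # illustrative values

alive = np.ones(n_ind, dtype=bool)
hist = np.zeros((n_ind, n_occ), dtype=int)
hist[:, 0] = 1  # all individuals marked and released on occasion 1
for t in range(1, n_occ):
    alive &= rng.random(n_ind) < phi              # who survives to occasion t
    hist[:, t] = alive & (rng.random(n_ind) < p)  # detected only if alive

print(hist.shape)
```

A CJS model fitted to such histories estimates phi and p jointly; genNetDem's contribution is letting phi depend on known network traits so estimation bias can be studied.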
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement.
Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the NatCen Social Research ELSA web pages.
Health conditions research with ELSA - June 2021
The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, then please contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.
Wave 10 Health data
Users should note that in Wave 10, the health section of the ELSA questionnaire was revised and all respondents were asked anew about their health conditions, rather than following the prior approach of asking those who had taken part in past waves to confirm previously recorded conditions. For this reason, the health conditions feed-forward data will not be archived as it was in previous waves.
ELSA IFS Derived and Financial Derived data and documentation update, February 2025:
For the 44th edition (February 2025), all IFS derived and financial derived datasets and accompanying documentation have been updated. The IFS has improved some calculations of derived variables, so a redeposit was required.
Harmonized dataset:
Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).
ELSA COVID-19 study:
A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic and psychological impact of the lockdown on the population of England aged 50+, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.
https://www.icpsr.umich.edu/web/ICPSR/studies/37287/terms
The Longitudinal Study of American Youth (LSAY) is a project that was originally funded by the National Science Foundation in 1985 and was designed to examine the development of: (1) student attitudes toward and achievement in science, (2) student attitudes toward and achievement in mathematics, and (3) student interest in and plans for a career in science, mathematics, or engineering, during middle school, high school, and the first four years post-high school. The relative influence of parents, home, teachers, school, peers, media, and selected informal learning experiences on these developmental patterns was considered as well. The LSAY was designed to select and follow two cohorts of students in 1987. Cohort One was a national sample of approximately 3,000 tenth grade students in public high schools throughout the United States. Cohort Two consisted of a national sample of 3,116 seventh grade students in public schools that served as feeder schools to the same high schools in which the older cohort was enrolled. Data collection continues for Cohorts One and Two, 31 years after the study began. In the fall of 2015, data collection began on a third cohort: Cohort Three. Cohort Three consisted of 3,721 students in the seventh grade in public schools throughout the United States. The data in this release provide seventh grade comparison data across a 28-year timespan: Cohort Two (1987-1988) and Cohort Three (2015-2016). This study includes arts-related variables about student and parent participation in music, art, literary, dance, and theatrical pursuits. For more details, please see the Description of Variables.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
ESM file 1
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
R markdown analysis outline. The code used to read study data, aggregate OTUs for heatmap analysis, determine OTU proportional abundance, evaluate alpha and beta diversity, and perform generalized estimating equations (GEE) modeling of longitudinal data is demonstrated. (RMD 8.30 KB)
https://www.icpsr.umich.edu/web/ICPSR/studies/20520/terms
Children of Immigrants Longitudinal Study (CILS) was designed to study the adaptation process of the immigrant second generation which is defined broadly as United States-born children with at least one foreign-born parent or children born abroad but brought at an early age to the United States. The original survey was conducted with large samples of second-generation immigrant children attending the 8th and 9th grades in public and private schools in the metropolitan areas of Miami/Ft. Lauderdale in Florida and San Diego, California. Conducted in 1992, the first survey had the purpose of ascertaining baseline information on immigrant families, children's demographic characteristics, language use, self-identities, and academic attainment. The total sample size was 5,262. Respondents came from 77 different nationalities, although the sample reflects the most sizable immigrant nationalities in each area. Three years later, corresponding to the time in which respondents were about to graduate from high school, the first follow-up survey was conducted. Its purpose was to examine the evolution of key adaptation outcomes including language knowledge and preference, ethnic identity, self-esteem, and academic attainment over the adolescent years. The survey also sought to establish the proportion of second-generation youths who dropped out of school before graduation. This follow-up survey retrieved 4,288 respondents or 81.5 percent of the original sample. Together with this follow-up survey, a parental survey was conducted. The purpose of this interview was to establish directly characteristics of immigrant parents and families and their outlooks for the future including aspirations and plans for the children. Since many immigrant parents did not understand English, this questionnaire was translated and administered in six different foreign languages. In total, 2,442 parents or 46 percent of the original student sample were interviewed. 
During 2001-2003, or a decade after the original survey, a final follow-up was conducted. The sample now averaged 24 years of age and, hence, patterns of adaptation in early adulthood could be readily assessed. The original and follow-up surveys were conducted mostly in schools attended by respondents, greatly facilitating access to them. Most respondents had already left school by the time of the second follow-up so they had to be contacted individually in their place of work or residence. Respondents were located not only in the San Diego and Miami areas, but also in more than 30 different states, with some surveys returned from military bases overseas. Mailed questionnaires were the principal source of completed data in this third survey. In total, CILS-III retrieved complete or partial information on 3,613 respondents representing 68.9 percent of the original sample and 84.3 percent of the first follow-up. Relevant adaptation outcomes measured in this survey include educational attainment, employment and occupational status, income, civil status and ethnicity of spouses/partners, political attitudes and participation, ethnic and racial identities, delinquency and incarceration, attitudes and levels of identification with American society, and plans for the future.
https://www.icpsr.umich.edu/web/ICPSR/studies/39093/terms
The Home Mortgage Disclosure Act (HMDA) database (Consumer Financial Protection Bureau, 2022) has compiled mortgage lending data since 1981, but the collection and dissemination methods have changed over time (Federal Financial Institutions Examination Council, 2018), creating barriers to conducting longitudinal analyses. This HMDA Longitudinal Dataset (HLD) organizes and standardizes information across different eras of HMDA data collection between 1981 and 2021, enabling such analysis. This collection contains two types of datasets: 1) HMDA aggregated data by census tract for each decade and 2) HMDA aggregated data by census tract for individual years. Items for analysis include borrower income values, mortgages by loan type (e.g., conventional, Federal Housing Administration (FHA), Veterans Affairs (VA), refinances), and mortgages by borrower race and gender.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The multilevel hidden Markov model (MHMM) is a promising vehicle to investigate latent dynamics over time in social and behavioral processes. By including continuous individual random effects, the model accommodates variability between individuals, providing individual-specific trajectories and facilitating the study of individual differences. However, the performance of the MHMM has not been sufficiently explored. Currently, there are no practical guidelines on the sample size needed to obtain reliable estimates related to categorical data characteristics. We performed an extensive simulation to assess the effect of the number of dependent variables (1-4), the number of individuals (5-90), and the number of observations per individual (100-1600) on the estimation performance of group-level parameters and between-individual variability of a Bayesian MHMM with categorical data of various levels of complexity. We found that using multivariate data generally reduces the sample size needed and improves the stability of the results. Regarding the estimation of group-level parameters, the number of individuals and observations largely compensate for each other. Meanwhile, only the former drives the estimation of between-individual variability. We conclude with guidelines on the sample size necessary based on the complexity of the data and the study objectives of the practitioners.
This repository contains data generated for the manuscript: "Go multivariate: a Monte Carlo study of a multilevel hidden Markov model with categorical data of varying complexity". It comprises: (1) model outputs (maximum a posteriori estimates) for each repetition (n=100) of each scenario (n=324) of the main simulation, and (2) complete model outputs (including estimates for 4000 MCMC iterations) for two chains of each repetition (n=3) of each scenario (n=324). Please note that the empirical data used in the manuscript are not available as part of this repository. A subsample of the data used in the empirical example is openly available as an example data set in the R package mHMMbayes on CRAN. The full data set is available on request from the authors.
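To make the data-generating setup concrete, here is a minimal sketch (plain NumPy, not the authors' mHMMbayes code) of simulating categorical observations from a two-state hidden Markov chain, with an individual-specific self-transition probability standing in for the continuous random effects; all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n_ind, n_obs, n_cat = 5, 200, 3

# State-conditional emission probabilities over three categories.
emission = np.array([[0.7, 0.2, 0.1],   # state 0
                     [0.1, 0.2, 0.7]])  # state 1

data = np.zeros((n_ind, n_obs), dtype=int)
for i in range(n_ind):
    # Individual random effect: each person gets their own
    # self-transition probability, drawn around a group mean of 0.9.
    stay = float(np.clip(0.9 + rng.normal(0.0, 0.03), 0.5, 0.99))
    trans = np.array([[stay, 1.0 - stay], [1.0 - stay, stay]])
    state = 0
    for t in range(n_obs):
        data[i, t] = rng.choice(n_cat, p=emission[state])
        state = rng.choice(2, p=trans[state])

print(data.shape)
```

In the multivariate scenarios of the study, several such categorical sequences per individual are emitted from the same hidden state chain, which is what improves estimation stability.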
The dataset contains infant freezing observations at 15 months of age, as well as raw data of self-reported internalizing symptoms (ages 9, 12, 14, and 17), parent-reported internalizing symptoms (ages 5, 9, 12, 14, and 17), parent-reported externalizing symptoms, quality of parental behavior, social peer preference, other temperamental fearfulness assessments, and genetic data on 5-HTTLPR. These data were used for the analyses in the research described in the Developmental Science paper by Niermann et al. (2018). The study tested prospectively whether observed freezing in infancy predicted the development of internalizing symptoms from childhood through late adolescence. A full description of the procedure and the measures is given in the Methodology file. The R-syntax files contain a description of the data as well as all steps of the data analysis that were performed. The results of these analyses are described in the paper.
This is the R script used for the analysis in "Socio-spatial stratification of housing tenure trajectories in Sweden – A longitudinal cohort study". Note that we cannot share the micro-data used for this analysis because it belongs to Statistics Sweden (SCB). To arrange access to Swedish micro-data, go to: https://www.scb.se/en/services/ordering-data-and-statistics/ordering-microdata/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Additional file 4. R scripts and data for preparing the tomato data and carrying out the reported analyses. The data are provided in the file tomato.dat.csv and are also available in R via the growthPheno package. The script global.r provides settings, constants and functions that are used across all scripts and is executed at the beginning of most scripts. The script SET.r gives the code for obtaining the smoothed longitudinal data (Steps 1–4 of the SET process). Cart.dat.r extracts the per-cart traits (Step 5 of the SET process). Cart.anal.r analyses the per-cart data and Cart.predict.r obtains the predictions based on the selected models (Step 6 of the SET-based analysis). Cart.joint.r performs the extra joint analysis of per-cart traits. Longi.anal.r fits several models to all the tomato data for PSA and ln(PSA) in order to establish a variance model for each and then, for the selected variance model, varies the number of knots for the splines describing the curved trend for each combination of Zn and AMF (Stages 1–2). Longi.predict.r obtains the predictions for the different numbers of knots (Stage 3). Longi.trend.r investigates the effect of Zn and AMF on the time trend when 10 knots are used and performs diagnostic checking of the residuals (Stage 4); it also fits a reduced variance model that assumes equal variances for different DAPs and zero correlation between DAPs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data and code archive provides all the files that are necessary to replicate the empirical analyses that are presented in the paper "Climate impacts and adaptation in US dairy systems 1981-2018" authored by Maria Gisbert-Queral, Arne Henningsen, Bo Markussen, Meredith T. Niles, Ermias Kebreab, Angela J. Rigden, and Nathaniel D. Mueller and published in 'Nature Food' (2021, DOI: 10.1038/s43016-021-00372-z). The empirical analyses are entirely conducted with the "R" statistical software using the add-on packages "car", "data.table", "dplyr", "ggplot2", "grid", "gridExtra", "lmtest", "lubridate", "magrittr", "nlme", "OneR", "plyr", "pracma", "quadprog", "readxl", "sandwich", "tidyr", "usfertilizer", and "usmap". The R code was written by Maria Gisbert-Queral and Arne Henningsen with assistance from Bo Markussen. Some parts of the data preparation and the analyses require substantial amounts of memory (RAM) and computational power (CPU). Running the entire analysis (all R scripts consecutively) on a laptop computer with 32 GB physical memory (RAM), 16 GB swap memory, an 8-core Intel Xeon CPU E3-1505M @ 3.00 GHz, and a GNU/Linux/Ubuntu operating system takes around 11 hours. Running some parts in parallel can speed up the computations but bears the risk that the computations terminate when two or more memory-demanding computations are executed at the same time.
This data and code archive contains the following files and folders:
README Description: text file with this description
flowchart.pdf Description: a PDF file with a flow chart that illustrates how R scripts transform the raw data files to files that contain generated data sets and intermediate results and, finally, to the tables and figures that are presented in the paper.
runAll.sh Description: a (bash) shell script that runs all R scripts in this data and code archive sequentially and in a suitable order (on computers with a "bash" shell such as most computers with MacOS, GNU/Linux, or Unix operating systems)
Folder "DataRaw" Description: folder for raw data files. This folder contains the following files:
DataRaw/COWS.xlsx Description: MS-Excel file with the number of cows per county Source: USDA NASS Quickstats Observations: All available counties and years from 2002 to 2012
DataRaw/milk_state.xlsx Description: MS-Excel file with average monthly milk yields per cow Source: USDA NASS Quickstats Observations: All available states from 1981 to 2018
DataRaw/TMAX.csv Description: CSV file with daily maximum temperatures Source: PRISM Climate Group (spatially averaged) Observations: All counties from 1981 to 2018
DataRaw/VPD.csv Description: CSV file with daily maximum vapor pressure deficits Source: PRISM Climate Group (spatially averaged) Observations: All counties from 1981 to 2018
DataRaw/countynamesandID.csv Description: CSV file with county names, state FIPS codes, and county FIPS codes Source: US Census Bureau Observations: All counties
DataRaw/statecentroids.csv Description: CSV file with latitudes and longitudes of state centroids Source: Generated by Nathan Mueller from Matlab state shapefiles using the Matlab "centroid" function Observations: All states
Folder "DataGenerated" Description: folder for data sets that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these generated data files so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).
Folder "Results" Description: folder for intermediate results that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these intermediate results so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).
Folder "Figures" Description: folder for the figures that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these figures so that people who replicate our analysis can more easily compare the figures that they get with the figures that are presented in our paper. Additionally, this folder contains CSV files with the data that are required to reproduce the figures.
Folder "Tables" Description: folder for the tables that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these tables so that people who replicate our analysis can more easily compare the tables that they get with the tables that are presented in our paper.
Folder "logFiles" Description: the shell script runAll.sh writes the output of each R script that it runs into this folder. We provide these log files so that people who replicate our analysis can more easily compare the R output that they get with the R output that we got.
PrepareCowsData.R Description: R script that imports the raw data set COWS.xlsx and prepares it for the further analyses
PrepareWeatherData.R Description: R script that imports the raw data sets TMAX.csv, VPD.csv, and countynamesandID.csv, merges these three data sets, and prepares the data for the further analyses
PrepareMilkData.R Description: R script that imports the raw data set milk_state.xlsx and prepares it for the further analyses
CalcFrequenciesTHI_Temp.R Description: R script that calculates the frequencies of days with the different THI bins and the different temperature bins in each month for each state
CalcAvgTHI.R Description: R script that calculates the average THI in each state
PreparePanelTHI.R Description: R script that creates a state-month panel/longitudinal data set with exposure to the different THI bins
PreparePanelTemp.R Description: R script that creates a state-month panel/longitudinal data set with exposure to the different temperature bins
PreparePanelFinal.R Description: R script that creates the state-month panel/longitudinal data set with all variables (e.g., THI bins, temperature bins, milk yield) that are used in our statistical analyses
EstimateTrendsTHI.R Description: R script that estimates the trends of the frequencies of the different THI bins within our sampling period for each state in our data set
EstimateModels.R Description: R script that estimates all model specifications that are used for generating results that are presented in the paper or for comparing or testing different model specifications
CalcCoefStateYear.R Description: R script that calculates the effects of each THI bin on the milk yield for all combinations of states and years based on our 'final' model specification
SearchWeightMonths.R Description: R script that estimates our 'final' model specification with different values of the weight of the temporal component relative to the weight of the spatial component in the temporally and spatially correlated error term
TestModelSpec.R Description: R script that applies Wald tests and Likelihood-Ratio tests to compare different model specifications and creates Table S10
CreateFigure1a.R Description: R script that creates subfigure a of Figure 1
CreateFigure1b.R Description: R script that creates subfigure b of Figure 1
CreateFigure2a.R Description: R script that creates subfigure a of Figure 2
CreateFigure2b.R Description: R script that creates subfigure b of Figure 2
CreateFigure2c.R Description: R script that creates subfigure c of Figure 2
CreateFigure3.R Description: R script that creates the subfigures of Figure 3
CreateFigure4.R Description: R script that creates the subfigures of Figure 4
CreateFigure5_TableS6.R Description: R script that creates the subfigures of Figure 5 and Table S6
CreateFigureS1.R Description: R script that creates Figure S1
CreateFigureS2.R Description: R script that creates Figure S2
CreateTableS2_S3_S7.R Description: R script that creates Tables S2, S3, and S7
CreateTableS4_S5.R Description: R script that creates Tables S4 and S5
CreateTableS8.R Description: R script that creates Table S8
CreateTableS9.R Description: R script that creates Table S9
Introduction
The ZIP file contains all data and code to replicate the analyses reported in the following paper.
Reber, U., Fischer, M., Ingold, K., Kienast, F., Hersperger, A. M., Grütter, R., & Benz, R. (2022). Integrating biodiversity: A longitudinal and cross-sectoral analysis of Swiss politics. Policy Sciences. https://doi.org/10.1007/s11077-022-09456-4
If you use any of the material included in this repository, please refer to the paper. If you use (parts of) the text corpus, please also refer to the sources used for its compilation listed below. The content of the texts may not be changed.
Data folder
The data folder contains the following files.
The corpus and the dictionary were compiled by the authors specifically for this project. The labels/codes for policy sectors are based on the coding scheme of the Swiss Parliament.
Text corpus
The text corpus consists of 439,984 Swiss policy documents in German, French, and Italian from 1999 to 2018. The corpus was compiled from the following source between 2020-10-01 and 2021-01-31.
The corpus is stored as a single data frame for use with R, saved as a Parquet file (corpus.parquet). The data frame has the following structure.
The following list contains the coding scheme for the doc_type variable.
503: Court decisions // Federal Administrative Court
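For readers working outside R, a Parquet corpus like this can also be read with pandas. The sketch below stands in for the real corpus.parquet (which is only in the archive) with a tiny in-memory sample; the column names are assumptions for illustration, not documented fields of the archive:

```python
import pandas as pd

# In the real archive one would load the corpus directly, e.g.:
# corpus = pd.read_parquet("data/corpus.parquet")
# Here we mimic an assumed column layout with a tiny sample.
corpus = pd.DataFrame({
    "doc_id": ["d1", "d2", "d3"],
    "doc_type": [503, 101, 503],   # 503 = court decisions, Federal Administrative Court
    "lang": ["de", "fr", "it"],
    "year": [2005, 2012, 2018],
    "text": ["...", "...", "..."],
})

# Subset to Federal Administrative Court decisions (doc_type code 503).
court_decisions = corpus[corpus["doc_type"] == 503]
print(len(court_decisions))
```

`pd.read_parquet` requires the pyarrow or fastparquet engine; the R side would use arrow to read the same file.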
Code folder
The code folder contains all R code for the analyses. The files are numbered chronologically.
The code/functions folder contains custom functions used in the scripts, e.g. to support topic model interpretation.
Package versions and setup details are noted in the code files.
Contact
Please direct any questions to Ueli Reber (ueli.reber@eawag.ch).
This dataset contains replication files for "The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility" by Raj Chetty, John Friedman, Nathaniel Hendren, Maggie R. Jones, and Sonya R. Porter. For more information, see https://opportunityinsights.org/paper/the-opportunity-atlas/. A summary of the related publication follows. We construct a publicly available atlas of children’s outcomes in adulthood by Census tract using anonymized longitudinal data covering nearly the entire U.S. population. For each tract, we estimate children’s earnings distributions, incarceration rates, and other outcomes in adulthood by parental income, race, and gender. These estimates allow us to trace the roots of outcomes such as poverty and incarceration back to the neighborhoods in which children grew up. We find that children’s outcomes vary sharply across nearby tracts: for children of parents at the 25th percentile of the income distribution, the standard deviation of mean household income at age 35 is $5,000 across tracts within counties. We illustrate how these tract-level data can provide insight into how neighborhoods shape the development of human capital and support local economic policy using two applications. First, we show that the estimates permit precise targeting of policies to improve economic opportunity by uncovering specific neighborhoods where certain subgroups of children grow up to have poor outcomes. Neighborhoods matter at a very granular level: conditional on characteristics such as poverty rates in a child’s own Census tract, characteristics of tracts that are one mile away have little predictive power for a child’s outcomes. Our historical estimates are informative predictors of outcomes even for children growing up today because neighborhood conditions are relatively stable over time. 
Second, we show that the observational estimates are highly predictive of neighborhoods' causal effects, based on a comparison to data from the Moving to Opportunity experiment and a quasi-experimental research design analyzing movers' outcomes. We then identify high-opportunity neighborhoods that are affordable to low-income families, providing an input into the design of affordable housing policies. Our measures of children's long-term outcomes are only weakly correlated with traditional proxies for local economic success such as rates of job growth, showing that the conditions that create greater upward mobility are not necessarily the same as those that lead to productive labor markets. Any opinions and conclusions expressed herein are those of the authors and do not necessarily reflect the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. The statistical summaries reported in this paper have been cleared by the Census Bureau's Disclosure Review Board, release authorization number CBDRB-FY18-319.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Replication pack, FSE2018 submission #164
------------------------------------------
**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. Link to the code will be included in the Camera Ready version as well.

Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset. This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, `common.cache/survival_data.pypi_2008_2017-12_6.csv` in **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)
Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):

- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- a few hours to a few months of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from GitLab:

      git clone https://gitlab.com/user2589/ghd.git
      git checkout 0.1.0

  `cd` into the extracted folder. All commands below assume it as the current directory.
- copy `settings.py` into the extracted folder. Edit the file:
  * set `DATASET_PATH` to some newly created folder path
  * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS`
- install docker. For Ubuntu Linux, the command is `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`. Without this dependency, you might get an error on the next step, but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt`
- disable all APIs except GitHub (Bitbucket and GitLab support were not yet implemented when this study was in progress): edit `scraper/init.py`, comment out everything except GitHub support in `PROVIDERS`.
Step 2 - obtaining the dataset
------------------------------

The ultimate goal of this step is to get the output of the Python function `common.utils.survival_data()` and save it into a CSV file:

    # copy and paste into a Python console
    from common import utils
    survival_data = utils.survival_data('pypi', '2008', smoothing=6)
    survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speed up the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use the precomputed longitudinal feature values to build the final table. The whole process will take 15-30 minutes.

- create a folder `
Three cohorts of Pacific oyster (Crassostrea gigas) larvae at Whiskey Creek Shellfish Hatchery (WCH) in Netarts Bay, Oregon, were monitored for stable isotope incorporation and biochemical composition: one in May 2011 and two in August 2011. Along with measures of growth and calcification, we present measurements of stable carbon isotopes in water, algal food, and the shell and tissue, and of stable nitrogen isotopes in food and tissue across larval development and growth. These relatively rare measurements through larval ontogeny allow us to document isotopic shifts associated with the initiation and rate of feeding and with the catabolism of C-rich (lipid) and N-rich (protein) pools. Similar ontogenetic patterns in growth and bulk composition among the cohorts reinforce prior results, suggesting that the creation of the initial shell is energetically expensive, that the major carbon source is ambient dissolved inorganic carbon, and that the major energy source during this period is maternally derived egg lipids. The May cohort did not isotopically reflect its food source as rapidly as the August cohorts, indicating slower feeding and/or higher catabolism relative to anabolism. Our measurements also document differences in the bulk turnover of organic carbon and nitrogen pools within the larvae, showing far greater conservation of nitrogen than of carbon. These stable isotope and bulk biochemical measurements appear to be more sensitive indicators of sub-lethal environmental stress than the commonly used metrics of development and growth.

In order to allow full comparability with other ocean acidification datasets, the R package seacarb (Gattuso et al., 2016) was used to compute a complete and consistent set of carbonate system variables, as described by Nisumaa et al. (2010). In this dataset the original values were archived alongside the recalculated parameters (see related PI). The date of the carbonate chemistry calculation is 2017-03-07.
No special programs or software are required to open the data files. The R scripts that were used to analyze the data can be found on GitHub (https://github.com/sjbd1/5xfAD_mBio2022). PRJNA902000 is the BioProject accession number for the raw sequence reads on the SRA: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA902000/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and R script to reproduce the analyses of the manuscript:
Vinh-Hung V, Gorobets O, Adriaenssens N, Van Parijs H, Storme G, Verellen D, Nguyen NP, Magne N, De Ridder M.
Lung-heart outcomes and mortality through the 2020 COVID-19 pandemic in a prospective cohort of breast cancer radiotherapy patients.
Cancers 2022; 14(24):6241. https://doi.org/10.3390/cancers14246241
https://www.mdpi.com/2072-6694/14/24/6241
PubMed: PMID: 36551726
PMCID: PMC9777311
Info on the variables in file "aelq6_public.R"
reproduced in "aelq_2_3_readme.txt":
"aelq2_base2.txt" = baseline characteristics.
"aelq3.txt" = longitudinal measurements.
Variables in "aelq2_base2.txt":
# Age at randomization, years.
# RTdose: cf TomoBreast papers.
# 51 Gy = hypofractionated, simultaneous integrated boost
# 42 Gy = hypofractionated, no boost, mastectomy cases only
# 50 Gy = conventional, no boost, mastectomy cases only
# 66 Gy = conventional, sequential boost
# Weight kg, Height cm,
# Detection 1=found by screening (senology follow-up/control)
# 2=found by symptoms (pain, palpable)
# 9=unknown
# Smoker 0= Not smoker
# 1= Smoker
# 2=ex-smoker
# Mastectomy (and other binary coded) 1= yes
# chemosched 0=none
# 1= planned after RT (sequential)
# 2= prior to RT and is finished (sequential)
# 3= chemo is on-going or is planned to start with RT (concomitant)
# hormonetherapy 0=no
# 1=tamoxifen (nolvadex)
# 2=Femara (Letrozole)
# 3=zoladex
# 4=tamoxifen + zoladex
# Laterality 1=Right, 2=Left, 3=Bilateral
# LengthFU: length of follow-up, days from randomization
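As a convenience, the categorical codes documented above can be decoded with small lookup tables. This is an illustrative sketch that mirrors the codebook; the helper function `decode` is hypothetical and not part of the archived scripts:

```python
# Illustrative decoders mirroring the baseline codebook above.
DETECTION = {1: 'found by screening', 2: 'found by symptoms', 9: 'unknown'}
SMOKER = {0: 'not smoker', 1: 'smoker', 2: 'ex-smoker'}
CHEMOSCHED = {
    0: 'none',
    1: 'planned after RT (sequential)',
    2: 'prior to RT and finished (sequential)',
    3: 'on-going or planned to start with RT (concomitant)',
}
HORMONETHERAPY = {
    0: 'no',
    1: 'tamoxifen (nolvadex)',
    2: 'Femara (Letrozole)',
    3: 'zoladex',
    4: 'tamoxifen + zoladex',
}
LATERALITY = {1: 'Right', 2: 'Left', 3: 'Bilateral'}

def decode(mapping, code):
    """Return the label for a code, or flag an unexpected value."""
    return mapping.get(code, 'unknown code %s' % code)
```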
"aelq3.txt" = longitudinal measurements.
# "Nr" = Case ID
# "Time" in days from origin (origin =date of randomization),
# if negative =before randomization
# "KPS" "Weight"
# "Died" "LocalRec" "Metast" "NewPrim" = binary code, 0=no, 1=yes
# "fAEBreast" "fAEHeart" "fAELung" "fAEOther"
# fAE = freedom from breast, heart, lung, other adverse event score
# "LVEF2" = ejection fraction, %
# "MacIver" = estimated cardiac strain
# the following are pulmonary function tests, untransformed units
# "FVC", "FEV1", "PEF", "VC", "TLC", "RV", "FRC", "Raw", "sRaw", "DLCO",
# "VA", "PF"
# "fDY", "fFA", "fPA" = freedom from dyspnea, from fatigue, from pain
# range 0 to 100 (best)
# see papers:
# Van Parijs, H.; Vinh-Hung, V.; Fontaine, C.; Storme, G.; Verschraegen, C.;
# Nguyen, D.M.; Adriaenssens, N.; Nguyen, N.P.; Gorobets, O.; De Ridder, M.
# Cardiopulmonary-related patient-reported outcomes in a randomized clinical
# trial of radiation therapy for breast cancer. BMC Cancer 2021, 21, 1177,
# doi:10.1186/s12885-021-08916-z.
# preprint:
# Van Parijs, H.; Cecilia-Joseph, E.; Gorobets, O.; Storme, G.;
# Adriaenssens, N.; Heyndrickx, B.; Verschraegen, C.; Nguyen, N.P.;
# De Ridder, M.; Vinh-Hung, V. Lung-heart toxicity in a randomized
# clinical trial of hypofractionated image guided radiation therapy for
# breast cancer. Preprints 2022, 202212, 0214.
# https://doi.org/10.20944/preprints202212.0214.v1
#
# "Year" = year of the observation
# example: randomized 1/1/2011, measurement done 1/31/2011, time = 30 days,
# Year =2011
#
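The relation between "Time" and "Year" in the worked example above can be reproduced with ordinary date arithmetic; the function name is illustrative:

```python
# Reproduces the codebook's worked example: "Time" is days from
# randomization (negative if before), "Year" is the observation year.
from datetime import date

def time_and_year(randomization, measurement):
    """Days from randomization and the calendar year of the observation."""
    return (measurement - randomization).days, measurement.year

t, y = time_and_year(date(2011, 1, 1), date(2011, 1, 31))
# t == 30, y == 2011, matching the example above
```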
Evidence of social disengagement, network narrowing, and social selectivity with advancing age in several non-human animals challenges our understanding of the causes of social ageing. Natural animal populations are needed to test whether social ageing and selectivity occur under natural predation and extrinsic mortality pressures, and longitudinal studies are particularly valuable to disentangle the contribution of within-individual ageing from the demographic processes that shape social ageing at the population level. Data on wild Assamese macaques (Macaca assamensis) were collected between 2013 and 2020 at the Phu Khieo Wildlife Sanctuary, Thailand. We investigated the social behaviour of 61 adult females observed for 13,270 hours to test several mechanistic hypotheses of social ageing and evaluated the consistency between patterns from mixed-longitudinal and within-individual analyses. With advancing age, females reduced the size of their social network, which could not be explained...

The data presented were collected as part of an ongoing long-term field research project at the Phu Khieo Wildlife Sanctuary. Macaque behaviour was collected through standard animal focal follow protocols. Data are subsequently archived, and the behaviour relevant to the current study was extracted, then aggregated or directly indexed to the respective time window following the definitions detailed in the manuscript associated with this dataset.

# Social network shrinking is explained by active and passive effects but not increasing selectivity with age in wild macaques
Data are provided as separate .csv files and as single-object .RData workspaces, because the latter ensure robust formatting across operating systems and their regional settings. The .RData files are loaded at the start of the respective R scripts, which include the code to analyse the behaviour specified.
approach_behaviour_data.RData/csv: Female metadata and counts of approaches given and received. Variables:
focal_animal: identification of the focal female for whom the count of approaches is calculated.
season_year: time window considered for analysing the behaviour. Combination of the season (mating / non-mating) and year.
group: social group identity as a categorical variable (STU, MST, SST, OTH, MOT).
season: mating (Oct–Mar) and non-mating (i.e., period of birth, Apr–Sep)....
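Because `season_year` combines the season label with the year, it can be split back into its two components. A minimal sketch, assuming the label and year are joined by an underscore (e.g. `mating_2015`); the exact separator should be checked against the files:

```python
# Split a season_year value into (season, year), assuming an
# underscore-separated format such as "non-mating_2016".
def split_season_year(value):
    season, year = value.rsplit('_', 1)  # rsplit keeps the hyphen in "non-mating"
    return season, int(year)
```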