These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means:
File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code
Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects against identification of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
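The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched as follows. This is an illustration in Python with NumPy rather than the authors' R code; the function name, the array shape, and the randomly generated exposures are all assumptions made for the example.

```python
import numpy as np


def standardize_by_week(exposures):
    """Center each week (column) at its median and scale by its IQR."""
    exposures = np.asarray(exposures, dtype=float)
    median = np.median(exposures, axis=0)
    q1, q3 = np.percentile(exposures, [25, 75], axis=0)
    return (exposures - median) / (q3 - q1)


# Made-up raw exposures: 500 individuals (rows) by 37 weeks (columns).
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=2.0, sigma=0.3, size=(500, 37))
z_std = standardize_by_week(raw)
```

After this transformation every week has median 0 and IQR 1, which is why the original medians and IQRs cannot be recovered from the standardized matrix alone.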
This dataset contains the prioritization given by a panel of 15 experts to a set of 28 barrier categories for 8 different roles of the future energy system. A Delphi method was followed, and the scores provided in each of the three rounds are included. The dataset also contains the scripts used to assess the results and the output of this assessment. The contents of this file are:
data folder: this folder includes the scores given by the 15 experts in the 3 rounds. Every round is in an individual folder. There is one file per expert with the scores, from -5 (not relevant at all) to 5 (completely relevant), per barrier (rows) and actor (columns). There is also a file describing the experts in terms of their position in the company, the type of company, and the country.
fig folder: this folder includes the figures created to assess the information provided by the experts. For each round, the following figures are created (in each respective folder): Boxplot with the distribution of scores per barrier and role. Heatmap with the mean scores per barrier and role. Boxplots comparing the distributions provided by the experts of each group (depending on the keywords) per barrier and role. Heatmap with the mean score per barrier, weighted depending on the importance of the role in each use case, and the final prioritization. Finally, bar plots with the differences in mean scores between rounds and boxplots comparing the score distributions are also provided.
stat folder: this folder includes the files with the results of the different statistical assessments carried out. For each round, the following files are created (in each respective folder): The statistics used to assess the scores (intraclass correlation coefficient, inter-rater agreement, inter-rater agreement p-value, homogeneity of variances, average interquartile range, standard deviation of interquartile ranges, Friedman test p-value, average power post hoc) per barrier and per role. The results of the post hoc analysis of the Friedman test per barrier and per role. The average score per barrier and per role. The mean value of the scores provided by the experts grouped by the keywords per barrier and role. P-value of the comparison of these two values. The end prioritization of the barriers for the use case (averaging the scores or fuzzy merging of the critical sets). Finally, the differences between the means and standard deviations of the scores between two consecutive rounds are provided.
This dataset contains the prioritization given by a panel of 15 experts to a set of 28 barrier categories for 8 different roles of the future energy system. A Delphi method was followed, and the scores provided in each of the three rounds are included. The dataset also contains the scripts used to assess the results and the output of this assessment. The contents of this file are:
data folder: this folder includes the scores given by the 15 experts in the 3 rounds. Every round is in an individual folder. There is one file per expert with the scores, from -5 (not relevant at all) to 5 (completely relevant), per barrier (rows) and actor (columns). There is also a file describing the experts in terms of their position in the company, the type of company, and the country.
fig folder: this folder includes the figures created to assess the information provided by the experts. For each round, the following figures are created (in each respective folder): Boxplot with the distribution of scores per barrier and role. Heatmap with the mean scores per barrier and role. Boxplots comparing the distributions provided by the experts of each group (depending on the keywords) per barrier and role. Heatmap with the mean score per barrier and use case and with the prioritization per barrier and use case. Finally, bar plots with the differences in mean scores between rounds and boxplots comparing the score distributions are also provided.
stat folder: this folder includes the files with the results of the different statistical assessments carried out. For each round, the following files are created (in each respective folder): The statistics used to assess the scores (intraclass correlation coefficient, inter-rater agreement, inter-rater agreement p-value, homogeneity of variances, average interquartile range, standard deviation of interquartile ranges, Friedman test p-value, average power post hoc) per barrier and per role. The results of the post hoc analysis of the Friedman test per barrier and per role. The average score per barrier and per role. The mean value of the scores provided by the experts grouped by the keywords per barrier and role. P-value of the comparison of these two values. The end prioritization of the barriers for the use case (averaging the scores or merging the critical sets). Finally, the differences between the means and standard deviations of the scores between two consecutive rounds are provided.
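As a rough illustration of how per-round summaries such as the mean score and the average interquartile range could be computed from the expert files, here is a small Python sketch. It is not the dataset's own script: the 15 x 28 x 8 array layout and the random scores are assumptions made for the example.

```python
import numpy as np

# Invented scores for one round: 15 experts x 28 barriers x 8 roles,
# integers from -5 (not relevant at all) to 5 (completely relevant).
rng = np.random.default_rng(1)
scores = rng.integers(-5, 6, size=(15, 28, 8))

# Mean score per barrier and role (the quantity shown in a heatmap).
mean_scores = scores.mean(axis=0)

# Interquartile range of the expert scores for each barrier/role cell.
q1, q3 = np.percentile(scores, [25, 75], axis=0)
iqr = q3 - q1

# Average IQR per barrier (over roles) and per role (over barriers).
avg_iqr_per_barrier = iqr.mean(axis=1)
avg_iqr_per_role = iqr.mean(axis=0)
```

A smaller average IQR in a later round would indicate that the expert scores are converging, which is the usual reading of this statistic in a Delphi study.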
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the matrix. The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
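A minimal Python sketch of reading the origin-destination records into one sparse matrix per day is given below. The field names p1, p2 and day come from the description above; the name of the column holding the flow fraction is not listed there, so `fraction` is a placeholder, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; "fraction" is a hypothetical column name.
sample = """p1,p2,day,fraction
1,1,2020-01-18,0.90
1,2,2020-01-18,0.10
2,1,2020-01-18,0.05
2,2,2020-01-18,0.95
1,2,2020-01-19,0.08
"""


def load_daily_flows(handle, value_field="fraction"):
    """Return {day: {(origin, destination): flow}} from a CSV handle."""
    flows = defaultdict(dict)
    for row in csv.DictReader(handle):
        key = (int(row["p1"]), int(row["p2"]))
        flows[row["day"]][key] = float(row[value_field])
    return dict(flows)


flows = load_daily_flows(io.StringIO(sample))
```

With the real file, the same function could be called on an open file handle, passing the actual name of the value column as `value_field`.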
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration in week week, where week is in the format dd/mm-DD/MM and dd/mm and DD/MM are the first and the last day of the week, respectively;
week Q1: first quartile (Q1) of the distribution of the radius of gyration in week week;
week Q3: third quartile (Q3) of the distribution of the radius of gyration in week week.
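The weekly column labels can be turned into dates, and the IQR recovered from the Q1 and Q3 columns, along the following lines. This Python sketch assumes that the column headers are spelled exactly as "dd/mm-DD/MM", "dd/mm-DD/MM Q1" and "dd/mm-DD/MM Q3", and that all weeks fall in 2020 (as the file name suggests); the row values are invented.

```python
from datetime import date


def parse_week(label, year=2020):
    """Convert a 'dd/mm-DD/MM' label into (first_day, last_day)."""
    start, end = label.split("-")
    d1, m1 = (int(part) for part in start.split("/"))
    d2, m2 = (int(part) for part in end.split("/"))
    return date(year, m1, d1), date(year, m2, d2)


def weekly_iqr(row, week):
    """IQR of the radius of gyration for one province and one week."""
    return float(row[f"{week} Q3"]) - float(row[f"{week} Q1"])


# Invented row, keyed as csv.DictReader would return it.
row = {
    "COD_PROV": "1",
    "18/01-24/01": "5.2",
    "18/01-24/01 Q1": "2.0",
    "18/01-24/01 Q3": "11.5",
}
first_day, last_day = parse_week("18/01-24/01")
iqr = weekly_iqr(row, "18/01-24/01")
```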
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day in the format yyyy-mm-dd.
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.
Our target was to predict gender, age and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted from the audio files, and the datasets were then formed by adding the labels. Audio files were collected from "Mozilla Common Voice" and the "Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)".
Each dataset contains 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes the dispersion of the data relative to its mean; it is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle value in the sorted list of values.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. About 25 percent of the values are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the central point between the median and the highest value of the distribution.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e. between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure of how much the tails of a distribution differ from the tails of a normal distribution. It reflects the presence of outliers in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity computed from the normalized spectral power of the signal.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. It is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the center of mass of the spectrum is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
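To make the first few features concrete, the sketch below computes meanfreq, sd, median, Q25, Q75 and IQR by treating the normalized amplitude spectrum of a signal as a distribution over frequency. It is written in Python with NumPy and is only an approximation of the idea; the R procedure actually used to build the datasets is not specified here and may differ in windowing, frequency limits and other details.

```python
import numpy as np


def spectral_features(signal, sample_rate):
    """Summary statistics of the amplitude spectrum, in kHz."""
    amplitude = np.abs(np.fft.rfft(signal))
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate) / 1000.0
    weights = amplitude / amplitude.sum()  # spectrum as a distribution
    cumulative = np.cumsum(weights)

    def quantile(q):
        # First frequency at which the cumulative spectrum reaches q.
        return float(freqs_khz[np.searchsorted(cumulative, q)])

    meanfreq = float(np.sum(freqs_khz * weights))
    sd = float(np.sqrt(np.sum(weights * (freqs_khz - meanfreq) ** 2)))
    q25, q75 = quantile(0.25), quantile(0.75)
    return {
        "meanfreq": meanfreq,
        "sd": sd,
        "median": quantile(0.50),
        "Q25": q25,
        "Q75": q75,
        "IQR": q75 - q25,
    }


# One second of a pure 1 kHz tone sampled at 16 kHz (synthetic input).
t = np.arange(16000) / 16000.0
features = spectral_features(np.sin(2 * np.pi * 1000.0 * t), 16000)
```

For a pure tone, essentially all spectral weight sits in a single frequency bin, so the mean, median and quartiles coincide at about 1 kHz and the IQR is zero; real speech spreads the weight over many bins.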
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10h 00m 28.6s, +02° 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (S_nu ∝ nu^alpha, where S_nu is the flux density at frequency nu and alpha the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1).
Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time.
They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected in order to check their reliability, especially for the components near strong side-lobes. After a careful analysis, an S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g. large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e. they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources have also been double-checked using the 20-cm radio map. The authors found that all 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple.
The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al. 2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources was done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to also ensure the association of the multiple 90-cm radio sources for which the value of the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm.
This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
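Two calculations described above lend themselves to a short sketch: the two-point spectral index implied by S_nu ∝ nu^alpha, and a positional cross-match within a fixed search radius. The Python code below is illustrative only and is not the authors' pipeline; the function names, the nearest-neighbour matching rule and the example coordinates are assumptions.

```python
import math


def spectral_index(s1, nu1, s2, nu2):
    """alpha such that S is proportional to nu**alpha between two bands."""
    return math.log(s1 / s2) / math.log(nu1 / nu2)


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def cross_match(sources, counterparts, radius_arcsec):
    """Index of the nearest counterpart within the radius, else None."""
    matches = {}
    for i, (ra, dec) in enumerate(sources):
        seps = [angular_separation_arcsec(ra, dec, r, d) for r, d in counterparts]
        best = min(range(len(seps)), key=seps.__getitem__) if seps else None
        matches[i] = best if best is not None and seps[best] <= radius_arcsec else None
    return matches


# Invented positions (degrees) near the field centre.
sources_90cm = [(150.1190, 2.2058), (150.5000, 2.5000)]
sources_20cm = [(150.1190, 2.2058 + 2.0 / 3600), (150.9000, 2.9000)]
matches = cross_match(sources_90cm, sources_20cm, radius_arcsec=2.5)

# Flux density at 1400 MHz for a 10 mJy source at 324 MHz with alpha = -0.7.
s_20cm = 10.0 * (1400.0 / 324.0) ** -0.7
alpha = spectral_index(10.0, 324.0, s_20cm, 1400.0)
```

In this toy example the first source has a counterpart 2 arcseconds away and is matched, while the second has none within 2.5 arcseconds; a visual inspection step like the one described above would still be needed for extended, multi-component sources.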
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100 point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. 
Boxplots show interquartile ranges; whiskers extend to the 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., folders .Rproj.user and packrat, and files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to the packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs.
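The two models named above, log(NDVI) ~ solar-to-sensor angle and log(NDVI) ~ solar-to-sensor angle + sensor zenith angle, are ordinary linear regressions on log-transformed NDVI. The Python sketch below fits such a model by least squares and reports r-squared. It is not the study's R script, and the synthetic angles and coefficients are made up for the example.

```python
import numpy as np


def fit_log_ndvi(ndvi, *angle_predictors):
    """Least-squares fit of log(NDVI) on an intercept plus angle predictors."""
    y = np.log(np.asarray(ndvi, dtype=float))
    columns = [np.ones_like(y)] + [np.asarray(a, dtype=float) for a in angle_predictors]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    total = y - y.mean()
    r_squared = 1.0 - (residuals @ residuals) / (total @ total)
    return coef, float(r_squared)


# Synthetic, noise-free example: log(NDVI) = -1.5 + 0.004 * angle.
solar_to_sensor = np.linspace(20.0, 120.0, 50)
ndvi = np.exp(-1.5 + 0.004 * solar_to_sensor)
coef, r_squared = fit_log_ndvi(ndvi, solar_to_sensor)
```

For the back-scatter form, the sensor zenith angle would be passed as a second predictor, for example `fit_log_ndvi(ndvi, solar_to_sensor, sensor_zenith)`, where `sensor_zenith` is a hypothetical array of the same length.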
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
a: Median absolute percent bias of σ_u^2 was calculated for each simulation scenario, then summarized across scenarios.
b: This is the number of simulation scenarios used to calculate the information.
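One common way to compute a median absolute percent bias is sketched below in Python. The exact definition used for this table is not given here, so the formula (100 x |estimate - truth| / truth, followed by the median over replicates) is an assumption, as are the example numbers.

```python
import statistics


def median_abs_percent_bias(estimates, true_value):
    """Median over replicates of 100 * |estimate - truth| / |truth|."""
    return statistics.median(
        abs(100.0 * (estimate - true_value) / true_value) for estimate in estimates
    )


# Invented variance estimates from three replicates of one scenario.
scenario_bias = median_abs_percent_bias([0.9, 1.0, 1.2], true_value=1.0)

# Summarizing across (invented) scenarios, here again with the median.
across_scenarios = statistics.median([scenario_bias, 4.0, 25.0])
```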
https://spdx.org/licenses/CC0-1.0.html
Objectives: This study aimed to ascertain utility and vision-related quality of life in patients awaiting access to specialist eye care. A secondary aim was to evaluate the association of utility indices with demographic profile and waiting time. Methods: Consecutive patients who had been waiting for ophthalmology care answered the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25). The questionnaire was administered when patients arrived at the clinics for their first visit. We derived a utility index (VFQ-UI) from the patients’ responses, then calculated the correlation between this index and waiting time and compared utility across demographic subgroups stratified by age, sex, and care setting. Results: 536 individuals participated in the study (mean age 52.9±16.6 years; 370 [69%] women). The median utility index was 0.85 (interquartile range [IQR] 0.70–0.92; minimum 0.40, maximum 0.97). The mean VFQ-25 score was 70.88±14.59. Utility correlated weakly and nonsignificantly with waiting time (-0.05, P = 0.24). It did not vary across age groups (P = 0.85) or care settings (P = 0.77). Utility was significantly lower for women (0.84, IQR 0.70–0.92) than men (0.87, IQR 0.73–0.93, P = 0.03), but the magnitude of this difference was small (Cohen’s d = 0.13). Conclusion: Patients awaiting access to ophthalmology care had a median utility index of 0.85 on a scale of 0 to 1. This measurement was not previously reported in the literature. Utility measures can provide insight into patients’ perspectives, support health economic analyses, and inform health policies.
This dataset consists of near-global, analysis-ready, multi-resolution gridded vegetation structure metrics derived from NASA Global Ecosystem Dynamics Investigation (GEDI) Level 2 and 4A products associated with 25-m diameter lidar footprints. This dataset provides a comprehensive representation of near-global vegetation structure that is inclusive of the entire vertical profile, based solely on GEDI lidar, and validated with independent data. The GEDI sensor, mounted on the International Space Station (ISS), uses eight laser beams spaced by 60 m along-track and 600 m across-track on the Earth surface to measure ground elevation and vegetation structure between approximately 52 degrees North and South latitude. Between April 17th 2019 and March 16th 2023, GEDI acquired 11 and 7.7 billion quality waveforms suitable for measuring ground elevation and vegetation structure, respectively. This dataset provides GEDI shot metrics aggregated into raster grids at three spatial resolutions: 1 km, 6 km, and 12 km. In addition to many of the standard L2 and L4A shot metrics, several additional metrics have been derived which may be particularly useful for applications in carbon and water cycling processes in earth system models, as well as forest management, biodiversity modeling, and habitat assessment. Variables include canopy height, canopy cover, plant area index, foliage height diversity, and plant area volume density at 5 m strata. Eight statistics are included for each GEDI shot metric: mean, bootstrapped standard error of the mean, median, standard deviation, interquartile range, 95th percentile, Shannon's diversity index, and shot count. Quality shot filtering methodology that aligns with the GEDI L4B Gridded Aboveground Biomass Density, Version 2.1 was used. In comparison to the current GEDI L3 dataset, this dataset provides additional gridded metrics at multiple spatial resolutions and over several temporal periods (annual and the full mission duration). 
Files are provided in cloud optimized GeoTIFF format.
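The aggregation of footprint-level values into grid cells with per-cell statistics can be sketched in Python as below. This is not the production workflow for the dataset: the projected x/y coordinates in metres, the bootstrap settings, the use of the population standard deviation, and the omission of Shannon's diversity index (whose binning is not described here) are all choices made for the example.

```python
import numpy as np


def grid_statistics(x, y, values, cell_size, n_boot=200, seed=0):
    """Per-cell summary statistics for point values on a square grid."""
    rng = np.random.default_rng(seed)
    cols = np.floor(np.asarray(x, dtype=float) / cell_size).astype(int)
    rows = np.floor(np.asarray(y, dtype=float) / cell_size).astype(int)
    values = np.asarray(values, dtype=float)
    stats = {}
    for row, col in sorted(set(zip(rows.tolist(), cols.tolist()))):
        v = values[(rows == row) & (cols == col)]
        # Bootstrap: resample the shots in the cell and take each mean.
        boot_means = rng.choice(v, size=(n_boot, v.size), replace=True).mean(axis=1)
        q1, q3, p95 = np.percentile(v, [25, 75, 95])
        stats[(row, col)] = {
            "mean": float(v.mean()),
            "se_mean": float(boot_means.std()),
            "median": float(np.median(v)),
            "sd": float(v.std()),  # population SD (ddof=0), an assumption
            "iqr": float(q3 - q1),
            "p95": float(p95),
            "count": int(v.size),
        }
    return stats


# Three invented canopy heights (m) on a 1 km grid.
stats = grid_statistics(
    x=[100.0, 200.0, 1500.0],
    y=[100.0, 300.0, 100.0],
    values=[10.0, 20.0, 30.0],
    cell_size=1000.0,
)
```

Keeping the shot count alongside each statistic lets a user mask cells with too few footprints before analysis.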
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This record contains raw data related to the article “Incidence and predictors of hepatocellular carcinoma in patients with autoimmune hepatitis”.
Abstract
Background and aims: Autoimmune hepatitis (AIH) is a rare chronic liver disease of unknown aetiology; the risk of hepatocellular carcinoma (HCC) remains unclear and risk factors are not well-defined. We aimed to investigate the risk of HCC across a multicentre AIH cohort and to identify predictive factors.
Methods: We performed a retrospective, observational, multicentric study of patients included in the International Autoimmune Hepatitis Group Retrospective Registry. The assessed clinical outcomes were HCC development, liver transplantation, and death. Fine and Gray regression analysis stratified by centre was applied to determine the effects of individual covariates; the cumulative incidence of HCC was estimated using the competing risk method with death as a competing risk.
Results: A total of 1,428 patients diagnosed with AIH from 1980 to 2020 from 22 eligible centres across Europe and Canada were included, with a median follow-up of 11.1 years (interquartile range 5.2-15.9). Two hundred and ninety-three (20.5%) patients had cirrhosis at diagnosis. During follow-up, 24 patients developed HCC (1.7%), an incidence rate of 1.44 cases/1,000 patient-years; the cumulative incidence of HCC increased over time (0.6% at 5 years, 0.9% at 10 years, 2.7% at 20 years, and 6.6% at 30 years of follow-up). Patients who developed cirrhosis during follow-up had a significantly higher incidence of HCC. The cumulative incidence of HCC was 2.6%, 4.6%, 5.6% and 6.6% at 5, 10, 15, and 20 years after the development of cirrhosis, respectively. Obesity (hazard ratio [HR] 2.94, p = 0.04), cirrhosis (HR 3.17, p = 0.01), and AIH/PSC variant syndrome (HR 5.18, p = 0.007) at baseline were independent risk factors for HCC development.
Conclusions: HCC incidence in AIH is low even after cirrhosis development and is associated with risk factors including obesity, cirrhosis, and AIH/PSC variant syndrome.
Impact and implications: The risk of developing hepatocellular carcinoma (HCC) in individuals with autoimmune hepatitis (AIH) seems to be lower than for other aetiologies of chronic liver disease. Yet, solid data for this specific patient group remain elusive, given that most of the existing evidence comes from small, single-centre studies. In our study, we found that HCC incidence in patients with AIH is low even after the onset of cirrhosis. Additionally, factors such as advanced age, obesity, cirrhosis, alcohol consumption, and the presence of the AIH/PSC variant syndrome at the time of AIH diagnosis are linked to a higher risk of HCC. Based on these findings, there seems to be merit in adopting a specialized HCC monitoring programme for patients with AIH based on their individual risk factors.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tracks the daily sea ice extent for the Arctic Circle and Antarctica using the NSIDC's Sea Ice Index dataset, and pre-calculates several useful measures: the historical interquartile range across the year, the previous lowest year, and the previous year.
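A historical interquartile range "across the year" can be pre-calculated by grouping past observations by day of year, as in the minimal sketch below. The record layout `(year, day_of_year, extent)` and the quantile definition are assumptions made for illustration; the dataset's own calculation is not described here.

```python
from collections import defaultdict
from statistics import quantiles


def day_of_year_iqr(records):
    """Return {day_of_year: (q1, q3)} from (year, day_of_year, extent) tuples.

    Days with fewer than two observations are skipped because quartiles
    cannot be computed for them.
    """
    by_day = defaultdict(list)
    for _year, day, extent in records:
        by_day[day].append(extent)

    result = {}
    for day, values in by_day.items():
        if len(values) >= 2:
            q1, _median, q3 = quantiles(values, n=4, method="inclusive")
            result[day] = (q1, q3)
    return result


# Made-up extents (million km^2) for day 1 across five hypothetical years.
example = [(2000 + i, 1, extent) for i, extent in enumerate([10, 12, 14, 16, 18])]
bands = day_of_year_iqr(example)
```

The current year's extent can then be compared against the `(q1, q3)` band for the same day.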
Background: In advanced heart failure (HF), levosimendan increases peak oxygen uptake (VO2). We investigated whether peak VO2 increase is linked to cardiovascular, respiratory, or muscular performance changes. Methods and results: Twenty patients hospitalized for advanced HF underwent, before and shortly after levosimendan infusion, 2 different cardiopulmonary exercise tests: (a) a personalized ramp protocol with repeated arterial blood gas analysis and standard spirometry including alveolar-capillary gas diffusion measurements at rest and at peak exercise, and (b) a step incremental workload cardiopulmonary exercise testing with continuous near-infrared spectroscopy analysis and cardiac output assessment by bioelectrical impedance analysis. Levosimendan significantly decreased natriuretic peptides, improved peak VO2 (11.3 [interquartile range 10.1-12.8] to 12.6 [10.2-14.4] mL/kg/min, P < .01) and decreased minute ventilation to carbon dioxide production relationship slope (47.7 ± 10.7 to 43.4 ± 8.1, P < .01). In parallel, spirometry showed only a minor increase in forced expiratory volume, whereas the peak exercise dead space ventilation was unchanged. However, during exercise, a smaller edema formation was observed after levosimendan infusion, as inferable from the changes in diffusion components, that is, the membrane diffusion and capillary volume. The end-tidal pressure of CO2 during the isocapnic buffering period increased after levosimendan (from 28 ± 3 mm Hg to 31 ± 2 mm Hg, P < .01). During exercise, cardiac output increased in parallel with VO2. After levosimendan, the total and oxygenated tissue hemoglobin, but not deoxygenated hemoglobin, increased in all exercise phases. Conclusions: In advanced HF, levosimendan increases peak VO2, decreases the formation of exercise-induced lung edema, increases ventilation efficiency owing to a decrease of reflex hyperventilation, and increases cardiac output and muscular oxygen delivery and extraction.
https://spdx.org/licenses/CC0-1.0.html
Large mammalian herbivores (megafauna) have experienced extinctions and declines since prehistory. Introduced megafauna have partly counteracted these losses yet are thought to have unusually negative effects compared to native megafauna. Using a meta-analysis of 3,995 plot-scale plant abundance and diversity responses from 221 studies, we found no evidence that megafauna impacts were shaped by nativeness, ‘invasiveness’, ‘feralness’, coevolutionary history, or functional and phylogenetic novelty. Nor was there evidence that introduced megafauna facilitate introduced plants more than native megafauna. Instead, we found strong evidence that functional traits shaped megafauna impacts, with larger-bodied and bulk-feeding megafauna promoting plant diversity. Our work suggests that trait-based ecology provides better insight into interactions between megafauna and plants than concepts of nativeness. Methods Literature screening and digitization This meta-analysis was part of a larger effort to understand megafauna impacts on multiple facets of ecosystems (e.g. including soil nutrients, invertebrates, etc). This ensured that the dataset included plant responses that were also measured in studies focused on other response variables (e.g., spider diversity). We searched Web of Science with a string of search terms that included the common names and Latin genera of all terrestrial mammalian megafauna species (common names from HerbiTraits v1.2 (Lundgren et al. 
2021)) separated with an ‘OR’ operand, along with the following search terms: “disturb*, graz*, brows*, impact*, effect, affect, disrupt, facilitate, invasi*, ecosystem*, vegetat*, plant*, fauna*, reptil*, amphib*, bird*, rodent*, fish*, invertebrat*, insect*, soil*, carbon, climate, albedo, river*, riparian, desert*, forest*, tundra, decomposition, grassland*, savanna*, chaparral, scrub, shrub, diversity, heterogeneity, extinction, richness, environment, reptile*, ecolog*, hydrolog*, disturbance, density, biodiversity, response*, ecosystem, herbaceous, canopy, germination, cover, pollinator*, tree, nutrient*, understorey, erosion, grass*, vegetation, community, exclosure, competition, effect*, abundance, productivity”. To reduce unrelated results we also included a Web of Science category filter (“WC”) of “ECOLOGY OR ZOOLOGY OR ENVIRONMENTAL SCIENCES OR BIODIVERSITY CONSERVATION OR EVOLUTIONARY BIOLOGY OR GEOGRAPHY PHYSICAL OR REMOTE SENSING OR PLANT SCIENCES OR MULTIDISCIPLINARY SCIENCE OR FORESTRY OR ENTOMOLOGY OR MARINE & FRESHWATER BIOLOGY OR MYCOLOGY OR BIOLOGY OR OCEANOGRAPHY OR ORNITHOLOGY OR BEHAVIORAL SCIENCES OR FISHERIES”. The Web of Science review was concluded on the 18th of February 2021 and returned 60,537 studies. We removed duplicate studies using the fuzzy matching algorithm with the function ‘find_duplicates’ in the R package ‘revtools’ (version 0.4.1) (Westgate 2019). After removing duplicates, our final search returned 46,825 studies. Title screening reduced the number of studies to 2,369. We screened the full text of these studies to only include studies focused on wild megafauna (≥45 kg) and that compared areas with low versus high megafauna densities due to exclosures, policy-driven differences (hunting versus no-hunting in adjacent properties), and differences in introduction or eradication histories (adjacent islands with and without megafauna). 
Some studies compared areas with and without focal megafauna populations for unknown reasons (e.g., a site with and without horses with no indication of why horses might be absent (Robertson et al. 2019)), which were excluded due to low confidence in the ultimate drivers of observed differences. We excluded all before-after comparisons (e.g., a plot measured prior to exclosure construction and then afterwards) because of the high rates of change in many systems through time (via afforestation, shifts in climate, succession, etc.). Studies that excluded megafauna but also all vertebrates were excluded. Two additional studies reported data from extremely limiting resources (i.e., wetlands in deserts). These were excluded given that such scenarios should be analyzed separately, for which we did not have sufficient sample size. Studies that evaluated the effects of megafauna on transplants or agricultural crops (including plantations) were not digitized. Studies that included an appropriate comparison and reported a central tendency (mean or median), a measurement of error (standard deviation, standard error, variance, etc), and sample size were digitized (n=154). This literature list was supplemented by the literature contained in other relevant meta-analyses (Daskin and Pringle 2016, Eldridge et al. 2020) and those encountered in the bibliographies of the studies we digitized. 
Given the limited number of studies from oceanic islands and regarding widely distributed introduced species (feral pigs, goats) in our initial Web of Science search, we conducted focused Google Scholar searches on July 15th, 2022 with the following terms: “ungulate impacts island*”, “introduced goat impact island*”, “introduced deer impact*”, “feral camel impact*”, “wild OR feral boar OR hog OR swine impact*”, “feral cattle impact*”, “invasive ungulate hawaii OR guam OR new zealand OR pacific island OR new caledonia OR galapagos OR caribbean OR oceanic island” and a Web of Science search on the 22nd of December 2022 using the search string “herbivore* AND plant* AND response*”. This uncovered an additional 482 studies of which 66 studies were fit for inclusion, leading to a total of 221 studies in our final dataset. We digitized central tendencies (mean, median), error (standard deviation, standard error, interquartile ranges), and sample sizes for each response (diversity, richness, and abundance) in each study. We used ImageJ to extract data from figures (Schneider et al. 2012). Interquartile ranges and medians (e.g., as extracted from boxplots) were converted to means and standard deviation using the function qe.mean.sd in the package ‘estmeansd’ version 1.0.0 (McGrath et al. 2022). Means and SD/SE were reported by 213 studies (3,846 observations) while 11 studies (149 observations) reported medians and interquartile ranges. We also digitized relevant covariates from the text, which included time since treatment (e.g., exclosure construction, introduction, eradication, etc), study coordinates (latitude, longitude), megafauna density (standardized to kg per hectare), relative abundance of megafauna (in the case of multispecies megafauna communities and if density was not provided), and the scale of measurement (treated both as area, m2, and maximum measurement length, m, to allow the comparison of transects to plots). 
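The conversion of medians and interquartile ranges to means and standard deviations can be illustrated with a simple normal-theory approximation. This is a swapped-in sketch, not the quantile-estimation method implemented by `qe.mean.sd` in 'estmeansd': it assumes roughly normal data, estimates the mean as the average of the three quartiles, and estimates the SD as the IQR divided by the width of the standard normal IQR (about 1.349). The function name is hypothetical.

```python
from statistics import NormalDist


def mean_sd_from_quartiles(q1: float, median: float, q3: float):
    """Approximate (mean, sd) from quartiles, assuming near-normal data."""
    if not q1 <= median <= q3:
        raise ValueError("expected q1 <= median <= q3")
    # Average of the three quartiles as a simple estimate of the mean.
    mean = (q1 + median + q3) / 3.0
    # For a standard normal, the IQR spans 2 * z_0.75 (about 1.349) SDs.
    z75 = NormalDist().inv_cdf(0.75)
    sd = (q3 - q1) / (2.0 * z75)
    return mean, sd


# Quartiles read from a hypothetical boxplot.
approx_mean, approx_sd = mean_sd_from_quartiles(4.0, 6.0, 8.0)
```

For skewed data this approximation can be poor, which is one reason dedicated estimators exist.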
If study coordinates were not exactly provided, we extracted latitude and longitude from the approximate center of each study location in Google Maps. Maximum measurement length was calculated as either the hypotenuse of square/rectangular plots, the length of transects, or the diameter of circular plots. Distributions of megafauna traits, environmental variables (see below), and methodological variables were similar between native and introduced megafauna communities in our final dataset. We treated measurements of species richness and species diversity (e.g., Shannon-Wiener index) as ‘diversity’ responses and density estimates (individual plants per plot), % cover, and biomass as measurements of abundance. Analyzing these responses alone led to similar results. We excluded seed abundance and diversity responses, given that seedbanks can be at disequilibrium from realized plant communities. We included all true plant species, excluding multicellular algae and lichen. Effect sizes Given the presence of negative values and zeros in our dataset, we calculated effect sizes using Hedges’ g, a unitless measure of standardized mean difference between groups. Each effect size was associated with sampling variance calculated from the sample size and standard deviation of each observation. Effect sizes and sampling variances were calculated with the function ‘escalc’ in the R package ‘metafor’ (version 3.5-12) (Viechtbauer 2010). Megafauna and plant nativeness Megafauna nativeness was based on study author designations or IUCN range maps (17), if unreported. While many communities had both native and introduced megafauna present, the vast majority of studies only manipulated (excluded) the introduced megafauna, which was possible because of body size differences or through eradication. Only one study manipulated both native and introduced megafauna (Ward‐Jones et al. 2019). 
Given that the majority of megafauna biomass in this study consisted of introduced megafauna, we classified this study as introduced. Excluding it (only relevant for abundance analyses) led to similar results. The evolutionary exposure of study sites to megafauna (i.e., oceanic islands versus continents and offshore islands) was determined using PHYLACINE v1.2 range maps (Faurby et al. 2018). We considered New Zealand, which possessed avian megafauna, an oceanic island without coevolutionary history with mammalian megafauna (due to distinctive foraging strategies of avian versus mammalian herbivores). However, counting New Zealand as an offshore island led to similar results. The nativeness of collective plant responses was assigned as reported by the authors (1,864 observations from 104 studies). In cases where plant nativeness was unspecified (2,136 observations from 155 studies) we evaluated nativeness based on author-provided flora descriptions of the study site by referring to the Plants of the World Online (POWO n.d.) and the study site location. If introduced plants were described in the study system, we described the study as mixed (and thus excluded it) unless the introduced plants collectively constituted <5% relative abundance (cover, biomass,
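The Hedges' g effect size named in the methods above can be sketched as follows. This is a hand-rolled illustration, not the 'escalc' implementation from 'metafor': it uses the common approximate small-sample correction and a large-sample variance formula, and the argument names (treatment versus control group) are assumptions made for the example.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, sampling_variance) for two independent groups."""
    df = n_t + n_c - 2
    if df <= 0:
        raise ValueError("need at least three observations in total")
    # Pooled standard deviation across both groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    d = (mean_t - mean_c) / pooled_sd
    # Approximate small-sample bias correction factor.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    g = j * d
    # Large-sample approximation to the sampling variance of g.
    variance = (n_t + n_c) / (n_t * n_c) + g**2 / (2.0 * (n_t + n_c))
    return g, variance


# Hypothetical plot-level summary: mean plant cover with and without megafauna.
g, v = hedges_g(12.0, 2.0, 10, 10.0, 2.0, 10)
```

Because g is a standardized difference, it remains defined when group means are zero or negative, which fits the rationale given in the text.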
Supplementary figure 1: Rank abundance distributions for habitats at three taxonomic levels (Suppl_fig_1.pdf)
Supplementary figure 2: Evenness and species richness of the four habitats at three taxonomic levels (Suppl_fig_2.pdf)
Supplementary figure 3: Distribution of p-values from Mantel test for Spearman correlation between dissimilarity matrices representing different taxonomic and numerical levels. A-C, Correlation between taxonomic levels at different numerical resolutions. D-F, Correlation between proportional abundance data and higher levels of numerical transformation. Filled points represent median p-values across 1000 subsampling iterations, empty points are outliers that lie beyond 1.5 times the interquartile range from the upper quartile (Suppl_fig_3.pdf)
Supplementary figure 4: NMDS ordination of a double-standardized subsample of the total dataset comparing individual habitats along the depth- and salinity gradient for species and families using proportional abundances and presence/absence ...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These data support a meta-analysis investigating ecological impacts of intense lawn management (mowing). Raw data on invertebrate abundance and temperature were collected by Léonie Carignan-Guillemette (2018) and Caroline Turcotte (2017) under the supervision of Raphaël Proulx and Vincent Maire (refer to Appendix S1 within related publication for more information). Other data were gathered and processed according to the following: We searched the Scopus database on 8 February, 2019 with the following combinations of keywords: (lawn OR turf) AND mowing AND (urban OR city). Generally, studies were ineligible when: full-text of the article was not available even after contacting the authors; mowing was incidental to the study and not an experimental factor; response variables were not ecologically relevant; confounding factors (e.g. fertilisation) could not be isolated; a non-urban context was used; or simulated data were presented. We extracted the mean and statistical variation (standard deviation or standard error) for each response variable in control (less-intensively mown) and treatment (intensively mown) groups. Reported data were used when available. Otherwise, data were extracted from published figures using the Web Plot Digitizer tool. Where summary data were presented as median and interquartile range, the mean and standard deviation were estimated. Variables with multi-temporal data (e.g. soil moisture) were summarised using the mean and pooled standard deviation to provide an aggregated value per site per year. Where seasonal trends were evident in raw multi-temporal data (e.g. soil temperature), data were detrended using a polynomial function and analysis applied to the residuals.
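The aggregation of multi-temporal variables into one value per site per year can be sketched with a sample-size-weighted mean and the usual pooled standard deviation. The input layout (one `(mean, sd, n)` tuple per sampling date) and the weighting of the mean by `n` are assumptions made for this example; the description above does not spell out the exact formulas used.

```python
import math


def pooled_mean_sd(summaries):
    """Combine per-date (mean, sd, n) summaries into one (mean, pooled_sd).

    The pooled SD weights each date's variance by its degrees of freedom
    (n - 1); it describes within-date spread and ignores between-date shifts.
    """
    total_n = sum(n for _mean, _sd, n in summaries)
    total_df = sum(n - 1 for _mean, _sd, n in summaries)
    if total_df <= 0:
        raise ValueError("need at least one date with n >= 2")
    mean = sum(m * n for m, _sd, n in summaries) / total_n
    pooled_var = sum((n - 1) * sd**2 for _m, sd, n in summaries) / total_df
    return mean, math.sqrt(pooled_var)


# Hypothetical soil moisture summaries for one site on two dates in one year.
site_mean, site_sd = pooled_mean_sd([(10.0, 2.0, 5), (14.0, 2.0, 5)])
```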
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Rheumatic and musculoskeletal disorders (RMDs) are associated with cardiovascular diseases (CVDs), with hypertension being the most common. We aimed to determine the prevalence of high blood pressure (HBP), awareness, treatment, and blood pressure control among patients with RMDs seen in a Rheumatology clinic in Uganda. Methods: We conducted a cross-sectional study at the Rheumatology Clinic of Mulago National Referral Hospital (MNRH), Kampala, Uganda. Socio-demographic, clinical characteristics and anthropometric data were collected. Multivariable logistic regression was performed using STATA 16 to determine factors associated with HBP in patients with RMDs. Results: A total of 100 participants were enrolled. Of these, the majority were female (84%, n = 84) with mean age of 52.1 (standard deviation: 13.8) years and median body mass index of 28 kg/m2 (interquartile range (IQR): 24.8 kg/m2–32.9 kg/m2). The prevalence of HBP was 61% (n = 61, 95% CI: 51.5–70.5), with the majority (77%, n = 47, 95% CI: 66.5–87.6) being aware they had HTN. The prevalence of HTN was 47% (n = 47, 37.2–56.8), and none had it under control. Factors independently associated with HBP were age 46–55 years (adjusted prevalence ratio (aPR): 2.5, 95% confidence interval (CI): 1.06–5.95), 56–65 years (aPR: 2.6, 95% CI: 1.09–6.15), >65 years (aPR: 2.5, 95% CI: 1.02–6.00), obesity (aPR: 3.7, 95% CI: 1.79–7.52), and overweight (aPR: 2.7, 95% CI: 1.29–5.77). Conclusion: There was a high burden of HBP among people with RMDs in Uganda with poor blood pressure control, associated with high BMI and increasing age. There is a need for further assessment of the RMD specific drivers of HBP and meticulous follow up of patients with RMDs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Tuberculosis (TB) remains a significant public health challenge, particularly among vulnerable populations like children. This is especially true in Sub-Saharan Africa, where the burden of TB in children is substantial. Zambia ranks 21st among the top 30 high TB endemic countries globally. While studies have explored TB in adults in Zambia, the prevalence and associated factors in children are not well documented. This study aimed to determine the prevalence and sociodemographic and clinical factors associated with active TB disease in hospitalized children under the age of 15 years at Livingstone University Teaching Hospital (LUTH), the largest referral center in Zambia’s Southern Province. Methods: This retrospective cross-sectional study of 700 pediatric patients under 15 years old utilized programmatic data from the Pediatrics Department at LUTH. A systematic sampling method was used to select participants from medical records. Data on demographics, medical conditions, anthropometric measurements, and blood tests were collected. Data analysis included descriptive statistics, chi-square tests, and multivariable logistic regression to identify factors associated with TB. Results: The median age was 24 months (interquartile range (IQR): 11, 60) and the majority were male (56.7%, n = 397/700). Most participants were from urban areas (59.9%, n = 419/700), and 9.2% (n = 62/675) were living with HIV. Malnutrition and comorbidities were present in a significant portion of the participants (19.0% and 25.1%, respectively). The prevalence of active TB cases was 9.4% (n = 66/700) among hospitalized children. 
Persons living with HIV (adjusted odds ratio (AOR): 6.30, 95% confidence interval (CI): 2.85, 13.89, p < 0.001) and those who were malnourished (AOR: 10.38, 95% CI: 4.78, 22.55, p < 0.001) had a significantly higher likelihood of developing active TB disease. Conclusion: This study revealed a 9.4% prevalence of active TB among hospitalized children under 15 years at LUTH. HIV status and malnutrition emerged as significant factors associated with active TB disease. These findings emphasize the need for pediatric TB control strategies that prioritize addressing associated factors to effectively reduce the burden of tuberculosis in Zambian children.
In the patient information sheet, outcome variables [bacterial pathogens and viral-bacterial coinfections (simultaneous occurrences)] and predictor variables (patient demographics, time frame, specimen type, type of bacterial isolate(s), and antimicrobial susceptibility patterns) were collected from the hospital records. The data were anonymized to ensure patient confidentiality. Data were entered and managed using Microsoft Excel, version 13.0, and analyzed using Statistical Package for Social Sciences (SPSS), version 17.0. Descriptive data were analyzed in terms of frequency and percentage. Quantitative data were reported as mean, median, and interquartile range (IQR). Qualitative variables were analyzed using the Chi-square test, while quantitative variables were analyzed using the independent Student's t-test, with statistical significance determined at a p-value of <0.05 within a 95% confidence interval (CI).
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.
Metadata (including data dictionary)
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)
Code
Abstract
We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.
Description
“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).
Optional Information (complete as necessary)
Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals
Instructions for Use
Reproducibility (Mandatory)
What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.
How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.
Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:
Data
The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. 
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.
Availability
Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement.
Description
Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.
This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
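The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched as follows. This is an illustration only: the provided dataset is already standardized, the released code is in R, and the row/column layout (individuals by weeks), the quartile definition, and the function name are assumptions made for the example.

```python
from statistics import median, quantiles


def standardize_by_week(exposures):
    """Standardize each week (column) as (value - median) / IQR.

    `exposures` is a list of rows, one per individual, each holding one
    exposure value per week.
    """
    n_weeks = len(exposures[0])
    standardized = [list(row) for row in exposures]
    for week in range(n_weeks):
        column = [row[week] for row in exposures]
        q1, _q2, q3 = quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        if iqr == 0:
            raise ValueError(f"week {week} has zero interquartile range")
        center = median(column)
        for i, row in enumerate(exposures):
            standardized[i][week] = (row[week] - center) / iqr
    return standardized


# Made-up exposures for five individuals over two weeks.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
z_example = standardize_by_week(raw)
```

Because the weekly medians and IQRs are discarded after this step, the original exposure values cannot be recovered from the standardized matrix, which is the privacy property the description relies on.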