We include a description of the data sets in the metadata as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

Format:

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained.
While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
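The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the function name `standardize_by_week`, the matrix dimensions, and the gamma-distributed raw exposures are all assumptions made for the example.

```python
import numpy as np

def standardize_by_week(z_raw):
    """Median/IQR-standardize each column (one column per exposure week)."""
    z_raw = np.asarray(z_raw, dtype=float)
    medians = np.median(z_raw, axis=0)
    q75, q25 = np.percentile(z_raw, [75, 25], axis=0)
    iqr = q75 - q25
    return (z_raw - medians) / iqr

# Hypothetical raw exposures: n individuals by m weeks (values are made up).
rng = np.random.default_rng(0)
n, m = 200, 36
z_raw = rng.gamma(shape=4.0, scale=3.0, size=(n, m))

z = standardize_by_week(z_raw)
```

After this transformation every column of `z` has median 0 and IQR 1, which is why the original medians and IQRs cannot be recovered from the released matrix alone.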
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code

Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript.
We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

Description

“CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

“Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Optional Information (complete as necessary)

Required R packages:
• For running “CWVS_LMC.txt”:
  • msm: Sampling from the truncated normal distribution
  • mnormt: Sampling from the multivariate normal distribution
  • BayesLogit: Sampling from the Polya-Gamma distribution
• For running “Results_Summary.txt”:
  • plotrix: Plotting the posterior means and credible intervals

Instructions for Use

Reproducibility (Mandatory)

What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study.

How to use the information:
• Load the “Simulated_Dataset.RData” workspace
• Run the code contained in “CWVS_LMC.txt”
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set:

Data

The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008.
In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

Availability

Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description

Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects against identifiability of the spatial locations used in the analysis.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
https://cdla.io/permissive-1-0/
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.
Specifics of the Dataset:
The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.
One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:
- Certain columns are randomly selected to be populated with NaN values, simulating the common challenge of missing data.
- The proportion of these missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced in the dataset. For numerical values in some features, this noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with categories randomly altered in about 1% of the rows.
- Outliers have also been embedded in the dataset, consistent with the Interquartile Range (IQR) rule.
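As a starting point for exploring these challenges, the sketch below measures the share of missing values per column and flags outliers with the usual IQR rule (values beyond 1.5 × IQR from the quartiles). The column name `feature_a`, the planted outliers, and the 10% missing rate are made up for illustration and are not taken from the actual dataset.

```python
import numpy as np
import pandas as pd

def missing_fraction(df):
    """Fraction of NaN values in each column."""
    return df.isna().mean()

def iqr_outlier_mask(series, k=1.5):
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical column with planted outliers and 10% missing values.
rng = np.random.default_rng(42)
values = rng.normal(loc=50.0, scale=5.0, size=1000)
values[:5] = [120.0, -40.0, 130.0, -35.0, 125.0]
values[10:110] = np.nan
df = pd.DataFrame({"feature_a": values})

frac_missing = missing_fraction(df)
mask = iqr_outlier_mask(df["feature_a"])
```

Whether flagged rows should be dropped, capped, or kept depends on the exercise; the mask only identifies them.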
Context of the Dataset:
The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.
Sources of the Dataset:
The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
The dataset provides the median, 25th percentile, and 75th percentile of carbon monoxide (CO) concentrations in Delhi, measured in moles per square meter and vertically integrated over a 9-day mean period. This data offers insights into the distribution and variability of CO levels over time.
The data, collected from July 10, 2018, to August 10, 2024, is sourced from the Tropomi Explorer.
CO is a harmful gas that can significantly impact human health. High levels of CO can lead to respiratory issues, cardiovascular problems, and even be life-threatening in extreme cases. Forecasting CO levels helps in predicting and managing air quality to protect public health.
CO is often emitted from combustion processes, such as those in vehicles and industrial activities. Forecasting CO levels can help in monitoring the impact of these sources and evaluating the effectiveness of emission control measures.
Accurate CO forecasts can assist in urban planning and pollution control strategies, especially in densely populated areas where air quality issues are more pronounced.
Columns and Data Description:
system:time_start: The date when the CO measurements were taken.
p25: The 25th percentile value of CO levels for the given date, providing insight into the lower range of the distribution.
Median: The median CO level for the given date, which is the middle value of the distribution and represents a typical value.
IQR: The interquartile range, which measures the spread of the middle 50% of the data. It is calculated as the difference between the 75th percentile (p75) and the 25th percentile (p25) values.
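Because IQR is defined as p75 minus p25, the 75th percentile can be recovered from the two listed columns. The snippet below is a small worked example; the numeric values are invented and only the column names come from the description above.

```python
def p75_from_p25_and_iqr(p25, iqr):
    """Recover the 75th percentile from the 25th percentile and the IQR."""
    return p25 + iqr

# Hypothetical row (values in mol/m^2 are made up for illustration).
row = {"p25": 0.030, "Median": 0.034, "IQR": 0.006}
p75 = p75_from_p25_and_iqr(row["p25"], row["IQR"])
```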
This table contains a source catalog based on 90-cm (324-MHz) Very Large Array (VLA) imaging of the COSMOS field, comprising a circular area of 3.14 square degrees centered on 10h 00m 28.6s, 02° 12' 21" (J2000.0 RA and Dec). The image from the merger of 3 nights of observations using all 27 VLA antennas had an effective total integration time of ~12 hours, an 8.0 arcsecond x 6.0 arcsecond angular resolution, and an average rms of 0.5 mJy beam^-1. The extracted catalog contains 182 sources (down to 5.5 sigma), 30 of which are multi-component sources. Using Monte Carlo artificial source simulations, the authors derive the completeness of the catalog, and show that their 90-cm source counts agree very well with those from previous studies. In their paper, the authors use X-ray, NUV-NIR and radio COSMOS data to investigate the population mix of this 90-cm radio sample, and find that the sample is dominated by active galactic nuclei. The average 90-20 cm spectral index (S_nu ~ nu^alpha, where S_nu is the flux density at frequency nu and alpha the spectral index) of the 90-cm selected sources is -0.70, with an interquartile range from -0.90 to -0.53. Only a few ultra-steep-spectrum sources are present in this sample, consistent with results in the literature for similar fields. These data do not show clear steepening of the spectral index with redshift. Nevertheless, this sample suggests that sources with spectral indices steeper than -1 all lie at z >~ 1, in agreement with the idea that ultra-steep-spectrum radio sources may trace intermediate-redshift galaxies (z >~ 1).

Using both the signal and rms maps (see Figs. 1 and 2 in the reference paper) as input data, the authors ran the AIPS task SAD to obtain a catalog of candidate components above a given local signal-to-noise ratio (S/N) threshold. The task SAD was run four times with search S/N levels of 10, 8, 6 and 5, using the resulting residual image each time.
They recovered all the radio components with a local S/N > 5.00. Subsequently, all the selected components were visually inspected in order to check their reliability, especially for the components near strong side-lobes. After a careful analysis, an S/N threshold of 5.50 was adopted as the best compromise between a deep and a reliable catalog. The procedure yielded a total of 246 components with a local S/N > 5.50. More than one component identified in the 90-cm map sometimes belongs to a single radio source (e.g., large radio galaxies consist of multiple components). Using the 90-cm COSMOS radio map, the authors combined the various components into single sources based on visual inspection. The final catalog (contained in this HEASARC table) lists 182 radio sources, 30 of which have been classified as multiple, i.e., they are better described by more than a single component. Moreover, in order to ensure a more precise classification, all sources identified as multi-component sources were also double-checked using the 20-cm radio map. The authors found that all the 26 multiple 90-cm radio sources within the 20-cm map have 20-cm counterpart sources already classified as multiple.

The authors have made use of the VLA-COSMOS Large and Deep Projects over 2 square degrees, reaching down to an rms of ~15 µJy beam^-1 at 1.4 GHz and 1.5 arcsec resolution (Schinnerer et al. 2007, ApJS, 172, 46: the VLACOSMOS table in the HEASARC database). The 90-cm COSMOS radio catalog has, however, been extracted from a larger region of 3.14 square degrees (see Fig. 1 and Section 3.1 of the reference paper). This implies that a certain number of 90-cm sources (48) lie outside the area of the 20-cm COSMOS map used to select the radio catalog. Thus, to identify the 20-cm counterparts of the 90-cm radio sources, the authors used the joint VLA-COSMOS catalog (Schinnerer et al.
2010, ApJS, 188, 384: the VLACOSMJSC table in the HEASARC database) for the 134 sources within the 20-cm VLA-COSMOS area and the VLA-FIRST survey (White et al. 1997, ApJ, 475, 479: the FIRST table in the HEASARC database) for the remaining 48 sources. The 90-cm sources were cross-matched with the 20-cm VLA-COSMOS sources using a search radius of 2.5 arcseconds, while the cross-match with the VLA-FIRST sources was done using a search radius of 4 arcseconds in order to take into account the larger synthesized beam of the VLA-FIRST survey of ~5 arcseconds. Finally, all the 90 cm - 20 cm associations were visually inspected in order to also ensure the association of the multiple 90-cm radio sources for which the search radius used during the cross-match could be too restrictive. In summary, out of the total of 182 sources in the 90-cm catalog, 168 have counterparts at 20 cm.

This table was created by the HEASARC in October 2014 based on an electronic version of Table 1 from the reference paper which was obtained from the COSMOS web site at IRSA, specifically the file vla-cosmos_327_sources_published_version.tbl at http://irsa.ipac.caltech.edu/data/COSMOS/tables/vla/. This is a service provided by NASA HEASARC.
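A positional cross-match with a fixed search radius, as described above, can be sketched as a nearest-neighbour search on great-circle separations. This is only an illustration of the general technique, not the authors' procedure; the function names and the example coordinates are assumptions, and the brute-force distance matrix is practical only for small catalogs such as this one.

```python
import numpy as np

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula) in arcseconds; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(a))) * 3600.0

def cross_match(ra_a, dec_a, ra_b, dec_b, radius_arcsec):
    """For each source in catalog A, index of the nearest B source within the radius, else -1."""
    ra_a, dec_a = np.asarray(ra_a, float), np.asarray(dec_a, float)
    ra_b, dec_b = np.asarray(ra_b, float), np.asarray(dec_b, float)
    sep = angular_separation_arcsec(
        ra_a[:, None], dec_a[:, None], ra_b[None, :], dec_b[None, :]
    )
    nearest = sep.argmin(axis=1)
    matched = sep[np.arange(len(ra_a)), nearest] <= radius_arcsec
    return np.where(matched, nearest, -1)

# Made-up positions (degrees) near the COSMOS field center.
ra_90cm, dec_90cm = [150.1190, 150.2000], [2.2058, 2.3000]
ra_20cm, dec_20cm = [150.1190 + 1.0 / 3600.0, 150.5000], [2.2058, 2.5000]

matches = cross_match(ra_90cm, dec_90cm, ra_20cm, dec_20cm, radius_arcsec=2.5)
```

Here the first 90-cm position has a 20-cm neighbour about 1 arcsecond away and is matched, while the second has none within 2.5 arcseconds. A purely positional match like this would still need the visual inspection step mentioned above for extended multi-component sources.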
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results.

The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100-point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE JavaScript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'.

The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle.
Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833).

The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (e.g., folders .Rproj.user and packrat, and files .RData and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the R processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may use the descriptive documentation, the phenopix package documentation, and the description/references provided in the associated journal article to process the data and achieve the same results using newer packages or other software programs.
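The forward-scatter model form given above, log(NDVI) ~ solar-to-sensor angle, amounts to an ordinary least-squares fit on log-transformed NDVI. The sketch below shows that fit and its r-squared in Python on synthetic data; the published analysis was done in R, and the coefficients, noise level, and function name here are assumptions chosen only to make the example run.

```python
import numpy as np

def fit_log_linear(angle_deg, ndvi):
    """Fit log(NDVI) = intercept + slope * angle by least squares; return slope, intercept, R^2."""
    x = np.asarray(angle_deg, float)
    y = np.log(np.asarray(ndvi, float))
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic single-pixel, single-year series (all numbers are made up).
rng = np.random.default_rng(1)
angle = rng.uniform(20.0, 80.0, size=200)
ndvi = np.exp(-1.2 + 0.004 * angle + rng.normal(0.0, 0.02, size=200))

slope, intercept, r2 = fit_log_linear(angle, ndvi)
```

The back-scatter model adds sensor zenith angle as a second predictor, which would require a multiple regression rather than the single-predictor fit shown here.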
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This is the dataset for an environment-independent in-baggage object identification system leveraging low-cost WiFi. The dataset contains the extracted CSI features from 14 representative in-baggage objects of 4 different materials. The experiments are conducted in 3 different office environments with different sizes. We hope this dataset will help researchers to reproduce the prior work on in-baggage object identification through WiFi sensing.
Dataset Format:
.mat files
Section 1: Device Configuration:
Transmitter: Aaronia HyperLOG 7060 directional antenna with a Dell Inspiron 3910 desktop for control.
Receiver: Hawking HD9DP orthogonal antennas with a Dell Inspiron 3910 desktop for control.
NIC: Atheros QCA9590. The configuration and installation guide of CSI tool can be found at https://wands.sg/research/wifi/AtherosCSI/
WiFi Packet Rate: 1000 pkts/s
Section 2: Data Format
We provide the CSI features through .mat files. The details are shown in the following:
14 different objects made of 4 different materials are included, measured in 3 different environments and on 3 different days.
Each object is tested for 60 seconds, and the test is repeated 3 times.
The dataset file name is presented as "Object_Number". The details are:
Object: The object involved in the experiment (e.g., book, laptop)
Number: The repeat number.
Section 3: Experimental Setups
There are 3 different office experiment setups for our data collection. The detailed setups are shown in the paper. For the objects, we involve 14 types of objects made of 4 different materials.
Environments:
3 different office environments are involved, with sizes of 15 ft × 13 ft, 16 ft × 12 ft, and 28 ft × 23 ft, respectively.
For each room environment, data is collected on different days and with different furniture settings (i.e., 2 desks and 2 chairs are moved at least 3 ft).
Representative objects:
Data is collected using 14 representative objects of 4 different materials including fiber: book, magazine, newspaper; metal: thermal cup, laptop; cotton/polyester: cotton T-shirts (×2), cotton T-shirts (×4), hoodie, polyester T-shirts, polyester pants; water: 1L bottle with 1L water, 1L bottle with 500ml water, 500ml bottle with 500ml water.
Section 4: Data Description
For our data organization, we separate the data files into different folders based on different days and different environments. Under these folders, data are further distributed in terms of different objects and repeat times. All the files are .mat files, which can be directly read for further applications.
Features of CSI amplitude: We calculate 7 types of statistical features (mean, variance, median, skewness, kurtosis, interquartile range, and range) as well as a polarization feature from the CSI amplitude. In particular, we calculate the features for all 56 subcarriers, which have different operating frequencies and responses to the target object.
Features of CSI phase: For the CSI phase, the same features as for the CSI amplitude are extracted and stored in the dataset.
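To make the seven statistical features concrete, the sketch below computes them per subcarrier from a packets-by-subcarriers amplitude matrix. It is an illustrative reimplementation under stated assumptions, not the authors' extraction code: the Rayleigh-distributed synthetic amplitudes, the function name, and the use of population (biased) moments with non-excess kurtosis are all choices made here, and the polarization feature is not covered.

```python
import numpy as np

def amplitude_features(amplitude):
    """Per-subcarrier statistics for an array of shape (n_packets, n_subcarriers)."""
    a = np.asarray(amplitude, float)
    mean = a.mean(axis=0)
    var = a.var(axis=0)          # population variance
    std = np.sqrt(var)
    centered = a - mean
    q75, q25 = np.percentile(a, [75, 25], axis=0)
    return {
        "mean": mean,
        "variance": var,
        "median": np.median(a, axis=0),
        "skewness": (centered**3).mean(axis=0) / std**3,
        "kurtosis": (centered**4).mean(axis=0) / std**4,  # not excess kurtosis
        "iqr": q75 - q25,
        "range": a.max(axis=0) - a.min(axis=0),
    }

# Synthetic stand-in for 1 second of CSI amplitude at 1000 pkts/s over 56 subcarriers.
rng = np.random.default_rng(7)
csi_amplitude = rng.rayleigh(scale=1.0, size=(1000, 56))

features = amplitude_features(csi_amplitude)
```

The same function could be applied to a phase matrix, since the description states that identical features are extracted for CSI phase.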
Section 6: Citations
If your work is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9637801
Shi, Cong, Tianming Zhao, Yucheng Xie, Tianfang Zhang, Yan Wang, Xiaonan Guo, and Yingying Chen. "Environment-independent in-baggage object identification using wifi signals." In 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), pp. 71-79. IEEE, 2021.
https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for:
- 📡 Stock Market Analysis: Evaluating Walmart’s stock price trends and volatility.
- 🏦 Investment Research: Assisting traders and investors in making informed decisions.
- 🎓 Educational Purposes: Teaching data science and financial analysis using real-world stock data.
- 📊 Algorithmic Trading: Developing trading strategies based on historical stock price trends.
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
Our target was to predict gender, age, and emotion from audio. We found labeled audio datasets from Mozilla and RAVDESS. Using the R programming language, 20 statistical features were extracted, and after adding the labels these datasets were formed. Audio files were collected from "Mozilla Common Voice" and “Ryerson AudioVisual Database of Emotional Speech and Song (RAVDESS)”.
The datasets contain 20 feature columns and 1 column denoting the label. The 20 statistical features were extracted through frequency spectrum analysis using the R programming language. They are:
1) meanfreq - The mean frequency (in kHz) is a pitch measure that assesses the center of the distribution of power across frequencies.
2) sd - The standard deviation of frequency is a statistical measure that describes a dataset’s dispersion relative to its mean and is calculated as the square root of the variance.
3) median - The median frequency (in kHz) is the middle number in the sorted (ascending or descending) list of numbers.
4) Q25 - The first quartile (in kHz), referred to as Q1, is the median of the lower half of the data set. This means that about 25 percent of the data set numbers are below Q1, and about 75 percent are above Q1.
5) Q75 - The third quartile (in kHz), referred to as Q3, is the median of the upper half of the data set. About 75 percent of the data set numbers are below Q3.
6) IQR - The interquartile range (in kHz) is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles, i.e., between the upper and lower quartiles.
7) skew - The skewness is the degree of distortion from the normal distribution. It measures the lack of symmetry in the data distribution.
8) kurt - The kurtosis is a statistical measure that determines how much the tails of a distribution differ from the tails of a normal distribution. It is a measure of the outliers present in the data distribution.
9) sp.ent - The spectral entropy is a measure of signal irregularity that sums up the normalized signal’s spectral power.
10) sfm - The spectral flatness or tonality coefficient, also known as Wiener entropy, is a measure used in digital signal processing to characterize an audio spectrum. Spectral flatness is usually measured in decibels and offers a way to quantify how tone-like, as opposed to noise-like, a sound is.
11) mode - The mode frequency is the most frequently observed value in a data set.
12) centroid - The spectral centroid is a metric used to describe a spectrum in digital signal processing. It indicates where the spectrum’s center of mass is located.
13) meanfun - The meanfun is the average of the fundamental frequency measured across the acoustic signal.
14) minfun - The minfun is the minimum fundamental frequency measured across the acoustic signal.
15) maxfun - The maxfun is the maximum fundamental frequency measured across the acoustic signal.
16) meandom - The meandom is the average of the dominant frequency measured across the acoustic signal.
17) mindom - The mindom is the minimum of the dominant frequency measured across the acoustic signal.
18) maxdom - The maxdom is the maximum of the dominant frequency measured across the acoustic signal.
19) dfrange - The dfrange is the range of the dominant frequency measured across the acoustic signal.
20) modindx - The modindx is the modulation index, which expresses the degree of frequency modulation numerically as the ratio of the frequency deviation to the frequency of the modulating signal for a pure tone modulation.
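Several of the features above (meanfreq, sd, median, Q25, Q75, IQR) can be understood by treating the normalized amplitude spectrum as a probability distribution over frequency. The sketch below illustrates that idea in Python with NumPy; it is an assumption about the general approach and not the R code used to build these datasets, so exact values may differ from the published feature columns.

```python
import numpy as np

def spectral_summary(signal, sample_rate):
    """Summary statistics (in kHz) of the normalized amplitude spectrum of a signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs_khz = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate) / 1000.0
    weights = spectrum / spectrum.sum()      # spectrum treated as a distribution
    cumulative = np.cumsum(weights)

    def quantile(q):
        # First frequency at which the cumulative spectrum reaches q.
        return freqs_khz[np.searchsorted(cumulative, q)]

    meanfreq = np.sum(freqs_khz * weights)
    sd = np.sqrt(np.sum(weights * (freqs_khz - meanfreq) ** 2))
    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {"meanfreq": meanfreq, "sd": sd, "median": median,
            "Q25": q25, "Q75": q75, "IQR": q75 - q25}

# Synthetic test tone: 200 Hz sine sampled at 8 kHz for one second.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 200.0 * t)

summary = spectral_summary(tone, sample_rate)
```

For a pure tone the spectrum collapses to a single bin, so the mean, median, and both quartiles all sit at about 0.2 kHz and the IQR is essentially zero; real speech spreads energy across many bins and yields wider values.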
Gender and Age Audio Data Source: https://commonvoice.mozilla.org/en
Emotion Audio Data Source: https://smartlaboratory.org/ravdess/
This 5 km resolution grid indicates what borehole yields (in l/s) can reasonably be expected in different hydrogeological units. The ranges indicate the approximate interquartile range of the yield of boreholes that have been sited and drilled using appropriate techniques. Groundwater productivity is given in liters per second. A detailed description of the methodology and a full list of data sources used to develop the layer can be found in the peer-reviewed paper available here: http://iopscience.iop.org/article/10.1088/1748-9326/7/2/024009/pdf The raster and a high-resolution PDF file are available for download on the website of the British Geological Survey (BGS): http://www.bgs.ac.uk/research/groundwater/international/africanGroundwater/mapsDownload.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Incidence rate ratios (IRR) and 95% confidence intervals (CIs) were estimated using quasi-Poisson regression and the robust sandwich estimator for variance [42,43] and correspond to an increase in exposure equivalent to the exposure's interquartile range within the dataset (0.67 mm precipitation (P); 1.19 mm soil moisture (θ)). Reference values for each of the exposure variables are presented in Table 2. Bolded values correspond to associations that are statistically significant at the 95% confidence level. Each row corresponds to one model fit. Information supporting variable selection can be found in S1 Text. Results for regressions including other hydroclimatic predictors are presented in S1 Table.
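Reporting an IRR per interquartile-range increase means exponentiating the log-rate coefficient multiplied by the IQR. The sketch below shows that arithmetic with a Wald-type 95% interval; the coefficient and standard error are hypothetical placeholders (in the study the standard error would come from the robust sandwich estimator), and only the 0.67 mm IQR for precipitation is taken from the text.

```python
import math

def irr_with_ci(beta, se, increment, z=1.96):
    """IRR and Wald-type CI for an `increment`-unit increase in exposure.

    beta and se are the log-rate coefficient and its standard error per unit exposure.
    """
    point = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return point, lower, upper

IQR_PRECIP_MM = 0.67       # from the caption above
beta, se = 0.15, 0.05      # hypothetical values, not study results

irr, ci_low, ci_high = irr_with_ci(beta, se, IQR_PRECIP_MM)
```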
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This repository contains various resources related to the study on post-stroke recovery in a mouse model, focusing on the application of the Proportional Recovery Rule (PRR).
code/: Contains all the code used for the analysis in this study. Detailed information is available in the README within the code folder.
input/: This folder contains all datasets used in the publication.
output/: This directory includes the final results generated for each dataset. Detailed information for each dataset's output can be found in their respective subfolders.
docs/: Additional documentation related to this project, including extra resources in the form of a README file within this folder.
The Fugl-Meyer upper extremity score is a widely used assessment tool in clinical settings to evaluate motor function in stroke patients. With a maximum score of 66, higher values indicate better motor performance, while lower values signify greater deficits.
The Proportional Recovery Rule (PRR) suggests that the magnitude of recovery from nonsevere upper limb motor impairment after stroke is approximately 0.7 times the initial impairment. This rule, proposed in 2008, has been applied to various motor and nonmotor impairments, leading to inconsistencies in its formulation and application across studies.
In this study, we translated the Fugl-Meyer upper extremity score into a deficit score suitable for use in a mouse model. The PRR posits that the change in impairment can be predicted as 0.7 times the initial impairment, plus an error term. We adapted this rule by fitting a linear regression model without an intercept to relate the initial impairment to the change in impairment.
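The no-intercept fit described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `fit_through_origin` and the sample scores are hypothetical, and the closed-form least-squares slope for a regression through the origin, sum(x*y) / sum(x*x), is used directly.

```python
def fit_through_origin(initial_impairment, change):
    """Least-squares slope for change = slope * initial_impairment (no intercept)."""
    sxx = sum(x * x for x in initial_impairment)
    if sxx == 0:
        raise ValueError("initial impairment values are all zero")
    sxy = sum(x * y for x, y in zip(initial_impairment, change))
    return sxy / sxx


# Hypothetical deficit scores (invented for illustration only).
initial = [10.0, 20.0, 30.0]
observed_change = [7.0, 14.0, 21.0]

slope = fit_through_origin(initial, observed_change)
predicted_change = [slope * x for x in initial]
print(round(slope, 3))  # 0.7
```

A fitted slope near 0.7 would be consistent with the rule as stated; the residuals between observed and predicted change correspond to the error term.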
Initial Impairment Calculation:
Change Observed and Predicted:
Cluster Analysis:
Outlier Removal:
Cluster Characteristics:
Statistical Analysis:
This structured dataset was created with reference to the following publication:
DOI:10.1038/s41597-023-02242-8
If you have any questions or require further assistance, please do not hesitate to reach out to us. Contact us via email at markus.aswendtATuk-koeln.de or aref.kalantari-sarcheshmehATuk-koeln.de.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A live version of the data record, which will be kept up-to-date with new estimates, can be downloaded from the Humanitarian Data Exchange: https://data.humdata.org/dataset/covid-19-mobility-italy.
If you find the data helpful or you use the data for your research, please cite our work:
Pepe, E., Bajardi, P., Gauvin, L., Privitera, F., Lake, B., Cattuto, C., & Tizzoni, M. (2020). COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Scientific Data 7, 230 (2020).
The data record is structured into 4 comma-separated value (CSV) files, as follows:
id_provinces_IT.csv. Table of the administrative codes of the 107 Italian provinces. The fields of the table are:
COD_PROV is an integer field that is used to identify a province in all other data records;
SIGLA is a two-letter code that identifies the province according to the ISO_3166-2 standard (https://en.wikipedia.org/wiki/ISO_3166-2:IT);
DEN_PCM is the full name of the province.
OD_Matrix_daily_flows_norm_full_2020_01_18_2020_04_17.csv. The file contains the daily fraction of users moving between Italian provinces. Each line corresponds to an entry (i, j) of the matrix. The fields of the table are:
p1: COD_PROV of origin,
p2: COD_PROV of destination,
day: in the format yyyy-mm-dd.
median_q1_q3_rog_2020_01_18_2020_04_17.csv. The file contains the median and interquartile range (IQR) of users’ radius of gyration in a province by week. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
week: median value of the radius of gyration on week week, with week in the format dd/mm-DD/MM, where dd/mm and DD/MM are the first and the last day of the week, respectively;
week Q1: first quartile (Q1) of the distribution of the radius of gyration on week week;
week Q3: third quartile (Q3) of the distribution of the radius of gyration on week week.
average_network_degree_2020_01_18_2020_04_17.csv. The file contains daily time-series of the average degree 〈k〉 of the proximity network. Each entry of the table is a value of 〈k〉 on a given day. The fields of the table are:
COD_PROV of the province;
SIGLA of the province;
DEN_PCM of the province;
day in the format yyyy-mm-dd.
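As an illustration of how the quartile columns of the radius-of-gyration file might be used, the sketch below derives the weekly IQR as Q3 minus Q1. The exact column headers (`18/01-24/01`, `18/01-24/01 Q1`, `18/01-24/01 Q3`), the helper `weekly_iqr`, and all numeric values are assumptions made for this example; check them against the header row of the real CSV file.

```python
import csv
import io

# Hypothetical two-row excerpt; the values are invented for illustration only.
sample = io.StringIO(
    "COD_PROV,SIGLA,DEN_PCM,18/01-24/01,18/01-24/01 Q1,18/01-24/01 Q3\n"
    "1,TO,Torino,10.0,4.0,20.0\n"
    "15,MI,Milano,12.0,5.0,25.0\n"
)


def weekly_iqr(rows, week):
    """Return {SIGLA: Q3 - Q1} of the radius of gyration for one week label."""
    return {
        row["SIGLA"]: float(row[f"{week} Q3"]) - float(row[f"{week} Q1"])
        for row in rows
    }


rows = list(csv.DictReader(sample))
iqr = weekly_iqr(rows, "18/01-24/01")
print(iqr)  # {'TO': 16.0, 'MI': 20.0}
```

To read the real file, replace the in-memory sample with `open(path, newline="")`.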
ESRI shapefiles of the Italian provinces updated to the most recent definition are available from the website of the Italian National Office of Statistics (ISTAT): https://www.istat.it/it/archivio/222527.

This product was developed as part of a project supported by a grant from the National Oceanic and Atmospheric Administration’s Ocean Acidification Program under award NA18OAR0170430 to the Virginia Institute of Marine Science. The data product consists of water quality data for 98 tidal stations for 1984–2018. The source data used to generate this product were downloaded from the Chesapeake Bay Program’s (CBP) data hub. Out of the total of 255 monitoring stations in the Tidal Monitoring Program, we selected the 98 with a long monitoring record (30 years or longer). The following variables were downloaded from the data hub at the native temporal and vertical resolution (between one and four cruises per month and approximately 10 depth levels sampled between 0 and 37 m) for 1984–2018: water temperature (T), salinity (S), pH, total alkalinity (TA), dissolved oxygen (DO), and chlorophyll (Chl). All pH data prior to 1998 were removed because of data quality concerns (Herrmann et al., 2020). Briefly, we found a dramatic difference in long-term trends between stations measured by institutions in the state of Virginia and stations measured by the state of Maryland, particularly from late spring to early fall. The boundary between the station groups runs east–west within the mesohaline portion of the bay, where the Potomac River estuary intersects the mainstem bay. The boundary separates strong negative linear trends to the south (Virginia stations) from neutral and weakly positive linear trends to the north (Maryland stations). For all variables, data entries marked with CBP’s “Problem” and “Qualifier” flags were removed.
Additionally, all variables were scanned for extreme outliers: for each variable, data from all stations, depths, and times were combined into a single composite sample, for which the 75th and 25th percentiles (i.e., the upper and lower quartiles) and the interquartile range (the difference between the upper and lower quartiles) were calculated. Extreme outliers were defined as values lying more than a set number (the censoring criterion) of interquartile ranges above the upper quartile or below the lower quartile.
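The outlier screen described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `extreme_outlier_mask`, the censoring criterion `k=3.0`, the quantile interpolation method, and the sample pH values are all hypothetical, since the text does not give the criterion used for each variable.

```python
import statistics


def extreme_outlier_mask(values, k=3.0):
    """Flag values more than k interquartile ranges beyond the quartiles.

    k is the censoring criterion; 3.0 is an assumed placeholder value.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical composite sample of pH values (invented for illustration).
ph = [7.9, 8.0, 8.1, 8.0, 7.8, 8.2, 14.0]
mask = extreme_outlier_mask(ph)
cleaned = [v for v, is_outlier in zip(ph, mask) if not is_outlier]
print(cleaned)  # [7.9, 8.0, 8.1, 8.0, 7.8, 8.2]
```

Because the quartiles are computed on the pooled sample, a single threshold pair applies to every station, depth, and time for that variable.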

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of variables (Occ. = Occurrences, Medn. = Median, IQR = Interquartile Range).

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Guardian’s response to postprocedural questionnaires.

Objectives: This study aimed to ascertain utility and vision-related quality of life in patients awaiting access to specialist eye care. A secondary aim was to evaluate the association of utility indices with demographic profile and waiting time. Methods: Consecutive patients who had been waiting for ophthalmology care answered the 25-item National Eye Institute Visual Function Questionnaire (NEI VFQ-25). The questionnaire was administered when patients arrived at the clinics for their first visit. We derived a utility index (VFQ-UI) from the patients’ responses, then calculated the correlation between this index and waiting time and compared utility across demographic subgroups stratified by age, sex, and care setting. Results: 536 individuals participated in the study (mean age 52.9±16.6 years; 370 women, 69%). The median utility index was 0.85 (interquartile range [IQR] 0.70–0.92; minimum 0.40, maximum 0.97). The mean VFQ-25 score was 70.88±14.59. Utility correlated weakly and nonsignificantly with waiting time (-0.05, P = 0.24). It did not vary across age groups (P = 0.85) or care settings (P = 0.77). Utility was significantly lower for women (0.84, IQR 0.70–0.92) than men (0.87, IQR 0.73–0.93, P = 0.03), but the magnitude of this difference was small (Cohen’s d = 0.13). Conclusion: Patients awaiting access to ophthalmology care had a utility index of 0.85 on a scale of 0 to 1. This measurement was not previously reported in the literature. Utility measures can provide insight into patients’ perspectives, support economic health analyses, and inform health policies.

This record contains raw data related to the article "Safety of metformin continuation in diabetic patients undergoing invasive coronary angiography: the NO-STOP single arm trial". Abstract. Background: Despite paucity of data, it is common practice to discontinue metformin before invasive coronary angiography due to an alleged risk of Metformin-Associated Lactic Acidosis (M-ALA). We aimed at assessing the safety of metformin continuation in diabetic patients undergoing coronary angiography in terms of significant increase in lactate levels. Methods: In this open-label, prospective, multicentre, single-arm trial, all diabetic patients undergoing coronary angiography with or without percutaneous coronary intervention at 3 European centers were screened for enrolment. The primary endpoint was the increase in lactate levels from preprocedural levels at 72 h after the procedure. Secondary endpoints included contrast-associated acute kidney injury (CA-AKI), M-ALA, and all-cause mortality. Results: 142 diabetic patients on metformin therapy were included. Median preprocedural lactate level was 1.8 mmol/l [interquartile range (IQR) 1.3-2.3]. Lactate levels at 72 h after coronary angiography were 1.7 mmol/l (IQR 1.3-2.3), with no significant differences as compared to preprocedural levels (p = 0.91; median difference = 0; IQR -0.5 to 0.4 mmol/l). One patient had 72-h levels ≥ 5 mmol/l (5.3 mmol/l), but no cases of M-ALA were reported. CA-AKI occurred in 9 patients (6.1%) and median serum creatinine and estimated glomerular filtration rate remained similar throughout the periprocedural period. At a median follow-up of 90 days (43-150), no patients required hemodialysis and 2 patients died due to non-cardiac causes. Conclusions: In diabetic patients undergoing invasive coronary angiography, metformin continuation throughout the periprocedural period does not increase lactate levels and was not associated with any decline in renal function.

ABSTRACT BACKGROUND: Metatarsalgia can be considered to be a common complaint in clinical practice. The aim of this study was to compare quality of life (QoL) between participants with different metatarsalgia types and matched-paired healthy controls. DESIGN AND SETTING: A cross-sectional analysis on a sample of 124 participants of median age ± interquartile range of 55 ± 22 years was carried out in the University Clinic of Podiatric Medicine and Surgery, Ferrol, Spain. They presented primary (n = 31), secondary (n = 31) or iatrogenic (n = 31) metatarsalgia, or were matched-paired healthy controls (n = 31). METHODS: Self-reported domain scores were obtained using the Foot Health Status Questionnaire (FHSQ) and were compared between the participants with metatarsalgia and between these and the healthy controls. RESULTS: Statistically significant differences were shown in all FHSQ domains (P ≤ 0.001). Post-hoc analyses showed statistically significant differences (P < 0.05) between the metatarsalgia types in relation to the matched healthy control group, such that the participants with metatarsalgia presented impaired foot-specific and general health-related QoL (lower FHSQ scores). CONCLUSION: This study demonstrated that presence of metatarsalgia had a negative impact on foot health-related QoL. Foot-specific health and general health were poorer among patients with metatarsalgia, especially among those with secondary and iatrogenic metatarsalgia, in comparison with matched healthy controls.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Flurbiprofen, a widely used nonsteroidal anti-inflammatory drug (NSAID), is commonly employed to relieve mild to moderate pain and inflammation. Understanding its adverse reactions in real-world usage is of significant importance. Methods: Reports of all adverse drug events (ADEs) related to flurbiprofen were extracted from the FAERS database, covering the period from Q1 2004 to Q3 2024. These reports were standardized and analyzed using various signal quantification techniques, including Reporting Odds Ratios (ROR), Proportional Reporting Ratios (PRR), Bayesian Confidence Propagation Neural Network (BCPNN), and Multi-item Gamma Poisson Shrinkage (MGPS). Finally, the association between flurbiprofen and ADEs as well as clinical medical events was assessed. Results: A total of 275 cases from the target population were identified in the FAERS database, with 788 instances of adverse events (AEs) occurring across 46 organ systems. We identified not only some common adverse reactions listed in the drug’s package insert, such as acute kidney injury, nausea and vomiting, and facial edema, but also significant signals that were not mentioned in the package insert, including Dysphonia, Drug abuse, and Pancreatitis acute. The median time to onset of flurbiprofen-related AEs was 1 day (interquartile range [IQR] 0–5 days), with most AEs occurring within the first month of flurbiprofen use. Conclusion: This study confirmed some common adverse reactions listed in the flurbiprofen drug package insert and identified significant unexpected adverse reactions. These findings can assist clinicians in conducting more comprehensive clinical monitoring when using the drug, thereby ensuring patient safety during treatment.
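Two of the disproportionality measures named above, ROR and PRR, follow from a 2x2 contingency table of report counts. The sketch below uses the standard textbook formulas; the function name `disproportionality` and the counts are hypothetical, and it does not reproduce the study's own thresholds or the BCPNN and MGPS methods.

```python
import math


def disproportionality(a, b, c, d):
    """Compute ROR (with an approximate 95% CI) and PRR from a 2x2 table.

    a: reports with the target drug and the target event
    b: reports with the target drug and any other event
    c: reports with other drugs and the target event
    d: reports with other drugs and any other event
    """
    ror = (a * d) / (b * c)
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(ROR), used for a Wald-type interval.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (
        math.exp(math.log(ror) - 1.96 * se_log_ror),
        math.exp(math.log(ror) + 1.96 * se_log_ror),
    )
    return ror, ci, prr


# Hypothetical counts (invented for illustration only).
ror, ror_ci, prr = disproportionality(a=10, b=90, c=100, d=9800)
print(f"ROR = {ror:.2f}, 95% CI {ror_ci[0]:.2f} to {ror_ci[1]:.2f}, PRR = {prr:.2f}")
```

A signal is commonly flagged when the lower bound of the ROR interval exceeds 1, though the specific criteria applied in this study are not stated here.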