Summary statistics (mean, standard deviation, median, interquartile range, number of subjects) for “ln_adducts” in cases, controls, and the total population.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Percentages may not total 100 due to rounding. Time and summary statistics in days (median and interquartile range, 25th to 75th percentile) from the first time coded as being on a palliative care register to the date of death, for each disease group.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Descriptive statistics: mean ± SD, range, median, and interquartile range (IQR).
These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”.

Metadata (including data dictionary):
• y: Vector of binary responses (1: adverse outcome, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

Code Abstract: We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize and plot the estimated critical windows and posterior marginal inclusion probabilities.

Description: “CWVS_LMC.txt” is delivered as a .txt file containing R code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, this code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt” is also delivered as a .txt file containing R code. Once “CWVS_LMC.txt” has been run on the simulated dataset, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

Required R packages:
• For “CWVS_LMC.txt”:
• msm: Sampling from the truncated normal distribution
• mnormt: Sampling from the multivariate normal distribution
• BayesLogit: Sampling from the Polya-Gamma distribution
• For “Results_Summary.txt”:
• plotrix: Plotting the posterior means and credible intervals

Instructions for Use (Reproducibility): The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study. How to use the information:
• Load the “Simulated_Dataset.RData” workspace.
• Run the code contained in “CWVS_LMC.txt”.
• Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”.

Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
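The weekly standardization described above (subtract each week's median exposure, then divide by that week's IQR) can be sketched with NumPy. This is only an illustration: the matrix shape and values below are invented, and the actual preprocessing was done in R on the confidential data.

```python
import numpy as np

def standardize_exposures(z):
    """Column-wise weekly standardization, as described above:
    subtract each week's median and divide by that week's IQR."""
    med = np.median(z, axis=0)
    q1, q3 = np.percentile(z, [25, 75], axis=0)
    return (z - med) / (q3 - q1)

# Hypothetical exposure matrix: 100 individuals x 5 pregnancy weeks.
rng = np.random.default_rng(0)
z = rng.normal(loc=10.0, scale=2.0, size=(100, 5))
z_std = standardize_exposures(z)
# After standardization, each week (column) has median 0 and IQR 1.
```

Because the provided dataset withholds the per-week medians and IQRs, this transformation cannot be inverted to recover raw exposures, which is the point of the confidentiality protection.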
We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available online here: https://github.com/warrenjl/SpGPCW.

Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

Description (Permissions): These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

File format: R workspace file.

Metadata (including data dictionary):
• y: Vector of binary responses (1: preterm birth, 0: control)
• x: Matrix of covariates; one row for each simulated individual
• z: Matrix of standardized pollution exposures
• n: Number of simulated individuals
• m: Number of exposure time periods (e.g., weeks of pregnancy)
• p: Number of columns in the covariate design matrix
• alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
https://creativecommons.org/publicdomain/zero/1.0/
Walmart Inc. is a multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. It is one of the world's largest companies by revenue and a key player in the retail sector. Walmart's stock is actively traded on major stock exchanges, making it an interesting subject for financial analysis.
This dataset contains historical stock price data for Walmart, sourced directly from Yahoo Finance using the yfinance Python API. The data covers daily stock prices and includes multiple key financial indicators.
This notebook performs an extensive EDA to uncover insights into Walmart's stock price trends, volatility, and overall behavior in the stock market. The following analysis steps are included:
This dataset and analysis can be useful for: - 📡 Stock Market Analysis – Evaluating Walmart’s stock price trends and volatility. - 🏦 Investment Research – Assisting traders and investors in making informed decisions. - 🎓 Educational Purposes – Teaching data science and financial analysis using real-world stock data. - 📊 Algorithmic Trading – Developing trading strategies based on historical stock price trends.
📥 Download the dataset and explore Walmart’s stock performance today! 🚀
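As a minimal sketch of the trend and volatility analysis this kind of EDA involves, the pandas snippet below computes daily returns and an annualized 20-day rolling volatility. The random-walk price series is a self-contained stand-in for the actual Yahoo Finance download, and the 20-day window is an arbitrary choice, not something the dataset prescribes.

```python
import numpy as np
import pandas as pd

# Random-walk stand-in for the daily Close column; real data would come from
# the yfinance download described above (e.g., yf.download for ticker WMT).
rng = np.random.default_rng(42)
close = pd.Series(
    100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 250))),
    index=pd.bdate_range("2023-01-02", periods=250),
    name="Close",
)

returns = close.pct_change()                            # daily simple returns
rolling_vol = returns.rolling(20).std() * np.sqrt(252)  # annualized 20-day volatility
```

The same two lines apply unchanged to the real Close column once the dataset is loaded into a DataFrame.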
Descriptive statistics for time (mm:ss ± SD) to cessation of movement (COM) and external activity (EA, in milli-gravity (mg), where g = acceleration of gravity, 9.8 m s⁻²) for quartiles Q1 and Q3, as well as the interquartile range (IQR = Q3 − Q1), for finisher pigs (N = 79) depopulated using water-based foam (WBF), nitrogen foam (N2F), and carbon dioxide (CO2).
SD: standard deviation. IQR: interquartile range.
https://cdla.io/permissive-1-0/
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.
Specifics of the Dataset:
The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.
One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:
- Certain columns are randomly selected to be populated with NaN values, simulating the common challenge of missing data. The proportion of missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced in the dataset. For numerical values in some features, this noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with categories randomly altered in about 1% of the rows.
- Outliers have also been embedded in the dataset, identifiable using the interquartile range (IQR) rule.
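Two of these challenges (injecting missing values and flagging IQR-rule outliers) can be sketched as follows. This is an illustration only, not the dataset's actual generation code: the array, the 5% missing rate, and the embedded value 200 are all invented.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(50.0, 5.0, size=1000)
x[:10] = 200.0  # embed a few extreme values, as the dataset does

# Inject missing values at random positions (the dataset varies this 1%-70%).
x[rng.random(x.size) < 0.05] = np.nan

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.nanpercentile(x, [25, 75])
iqr = q3 - q1
outliers = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)  # NaNs are never flagged
```

Because NaN compares false against any bound, the missing entries are automatically excluded from the outlier mask; `nanpercentile` keeps the quartiles themselves robust to the injected NaNs.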
Context of the Dataset:
The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization. Sources of the Dataset:
The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
https://creativecommons.org/publicdomain/zero/1.0/
The objective behind attempting this dataset was to understand the predictors that contribute to life expectancy around the world. I have used Linear Regression, Decision Tree, and Random Forest for this purpose. Steps involved:
- Read the csv file.
- Data cleaning: the variables Country and Status had character data types and had to be converted to factors. 2,563 missing values were encountered, with the Population variable having the most missing values (652). Rows with missing values were dropped before running the analysis.
- Run linear regression: before running the regression, 3 variables were dropped because they did not have much of an effect on the dependent variable, Life Expectancy. These 3 variables were Country, Year, and Status. This left 19 variables (1 dependent and 18 independent).
- We run the linear regression. Multiple R-squared is 83%, which means the independent variables explain 83% of the variance in the dependent variable.
- OUTLIER DETECTION: we check for outliers using the IQR rule and find 54 outliers. These outliers are removed before running the regression analysis again. Multiple R-squared increases from 83% to 86%.
- MULTICOLLINEARITY: we check for multicollinearity using the Variance Inflation Factor (VIF), which flags cases where two or more independent variables are highly correlated. The rule of thumb is that variables with absolute VIF values above 5 should be removed. We find 6 variables with a VIF above 5: Infant.deaths, percentage.expenditure, Under.five.deaths, GDP, thinness1.19, and thinness5.9. Infant deaths and Under-five deaths are strongly collinear, so we drop Infant.deaths (which has the higher VIF).
- When we run the linear regression model again, the VIF of Under.five.deaths drops from 211.46 to 2.74, while the other variables' VIFs decrease only slightly. The variable thinness1.19 is then dropped and the regression run once more; the VIF of thinness5.9 drops from 7.61 to 1.95. GDP and Population still have VIFs above 5, but I decided against dropping them because I consider them important independent variables.
- SET THE SEED AND SPLIT THE DATA INTO TRAIN AND TEST SETS: the train data gives a Multiple R-squared of 86% with a p-value below alpha, so the model is statistically significant. We use the trained model to predict the test data and compute RMSE and MAPE, using library(Metrics) for this purpose.
- In Linear Regression, RMSE (Root Mean Squared Error) is 3.2. This indicates that, on average, the predicted values are off by 3.2 years compared with the actual life expectancy values.
- MAPE (Mean Absolute Percentage Error) is 0.037, indicating a prediction accuracy of about 96.3% (1 − 0.037).
- MAE (Mean Absolute Error) is 2.55, meaning the predicted values deviate by approximately 2.55 years, on average, from the actual values.
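The IQR outlier screen and the VIF check described above can be sketched in Python (the original analysis was in R). The data here are synthetic stand-ins, and the VIF is computed manually from the definition rather than via an R package.

```python
import numpy as np

def iqr_outliers(v):
    """Boolean mask of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return (v < q1 - 1.5 * iqr) | (v > q3 + 1.5 * iqr)

def vif(X):
    """VIF_j = 1 / (1 - R^2_j) = var(x_j) / var(residual), from regressing
    column j on the remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        out[j] = y.var() / resid.var()
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=500)
# Columns 0 and 1 are nearly collinear; column 2 is independent.
X = np.column_stack([a, a + 0.05 * rng.normal(size=500), rng.normal(size=500)])
vifs = vif(X)  # large VIFs for the collinear pair, near 1 for the independent column
```

Dropping one column of a collinear pair (as done with Infant.deaths above) collapses the surviving column's VIF, which is exactly the 211.46 → 2.74 behavior reported.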
Conclusion: Random Forest is the best model for predicting the life expectancy values as it has the lowest RMSE, MAPE and MAE.
The U.S. Geological Survey (USGS) National Water Use Program is responsible for compiling and disseminating the Nation's water-use data. Working in cooperation with local, State, and Federal agencies, the USGS has published an estimate of water use in the United States every 5 years, beginning in 1950. These 5-year compilations contain water-use estimates that are aggregated to the county level in the United States. This USGS data release contains summaries of method codes used in the 2015 national compilation of public supply, self-supplied domestic, thermoelectric, and irrigation water-use data. This data release also contains the county-level water-use estimates that support the evaluations in Luukkonen and others (2021). Finally, this data release contains summaries of regional medians and interquartile ranges from 1985 to 2015 that were used to highlight areas of unexpected variability, consistency, and/or potential values that warrant further investigation. This data release supports the following publication: Luukkonen, C.L., Belitz, K., Sullivan, S.L., and Sargent, P., 2021, Factors affecting uncertainty of public supply, self-supplied domestic, irrigation, and thermoelectric water-use data, 1985-2015-evaluation of information sources, estimation methods, and data variability: U.S. Geological Survey Scientific Investigations Report 2021-5082, 78 p., https://doi.org/10.3133/sir20215082.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This synthetic dataset simulates a Phase III randomized controlled clinical trial evaluating CardioX (Drug A) versus an active comparator (Drug B) and a placebo for treating hypertension. It is designed for clinical data analysis, anomaly detection, and risk-based monitoring (RBM) applications.
The dataset includes 1,000 patients across 50 trial sites, with realistic patient demographics, blood pressure readings, cholesterol levels, dropout rates, and adverse event reporting. Several anomalies have been embedded to simulate real-world data quality issues commonly encountered in clinical trials.
This dataset is ideal for data quality assessments, statistical anomaly detection (Z-scores, IQR, clustering), and risk-based monitoring (RBM) in clinical research.
🔹 Clinical Trial Data Analysis – Investigate treatment efficacy and safety trends.
🔹 Anomaly Detection – Apply Z-scores, IQR, and ML-based clustering methods to identify outliers.
🔹 Risk-Based Monitoring (RBM) – Detect potential site-level risks and data inconsistencies.
🔹 Machine Learning Applications – Train models for adverse event prediction or dropout risk estimation.
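The Z-score and IQR checks listed above can be sketched on a synthetic Systolic_BP column. The values and embedded anomalies below are invented, and the thresholds (|z| > 3, 1.5 × IQR) are conventional defaults rather than anything prescribed by the dataset.

```python
import numpy as np

rng = np.random.default_rng(3)
systolic_bp = rng.normal(130.0, 15.0, size=1000)     # synthetic Systolic_BP column
systolic_bp[:5] = [280.0, 300.0, 30.0, 290.0, 20.0]  # embedded data-quality anomalies

# Z-score rule: flag |z| > 3.
z = (systolic_bp - systolic_bp.mean()) / systolic_bp.std()
z_flags = np.abs(z) > 3

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(systolic_bp, [25, 75])
iqr = q3 - q1
iqr_flags = (systolic_bp < q1 - 1.5 * iqr) | (systolic_bp > q3 + 1.5 * iqr)
```

For RBM-style checks, the same flags can be grouped by Site_ID to surface sites with unusually high anomaly rates.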
| Column Name | Description |
|---|---|
| Patient_ID | Unique identifier for each trial participant. |
| Site_ID | Site where the patient was enrolled (1-50). |
| Age | Patient age (in years). |
| Gender | Male or Female. |
| Enrollment_Date | Date when the patient was enrolled in the study. |
| Treatment_Group | Assigned treatment: Placebo, Drug A (CardioX), or Drug B (Active Comparator). |
| Adverse_Events | Number of adverse events (AEs) reported by the patient. |
| Dropout | Whether the patient dropped out of the study (1 = Yes, 0 = No). |
| Systolic_BP | Systolic Blood Pressure (mmHg). |
| Diastolic_BP | Diastolic Blood Pressure (mmHg). |
| Cholesterol_Level | Total cholesterol level (mg/dL). |
This dataset is fully synthetic and does not contain real patient data. It is created for educational, analytical, and research purposes in clinical data science and biostatistics.
🔗 If you use this dataset, tag me! Let’s discuss insights & findings! 🚀
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This data supports a meta-analysis investigating ecological impacts of intense lawn management (mowing). Raw data on invertebrate abundance and temperature were collected by Léonie Carignan-Guillemette (2018) and Caroline Turcotte (2017) under the supervision of Raphaël Proulx and Vincent Maire (refer to Appendix S1 within the related publication for more information). Other data were gathered and processed according to the following: We searched the Scopus database on 8 February 2019 with the following combination of keywords: (lawn OR turf) AND mowing AND (urban OR city). Generally, studies were ineligible when: the full text of the article was not available even after contacting the authors; mowing was incidental to the study and not an experimental factor; response variables were not ecologically relevant; confounding factors (e.g., fertilisation) could not be isolated; a non-urban context was used; or simulated data were presented. We extracted the mean and statistical variation (standard deviation or standard error) for each response variable in control (less-intensively mown) and treatment (intensively mown) groups. Reported data were used when available. Otherwise, data were extracted from published figures using the Web Plot Digitizer tool. Where summary data on the median and interquartile range were presented, the mean and standard deviation were estimated. Variables with multi-temporal data (e.g., soil moisture) were summarised using the mean and pooled standard deviation to provide an aggregated value per site per year. Where seasonal trends were evident in raw multi-temporal data (e.g., soil temperature), data were detrended using a polynomial function and the analysis applied to the residuals.
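Where only the median and IQR were reported, the description says the mean and standard deviation were estimated. One common normal-theory approximation for this (an assumption here; the exact estimator used in the meta-analysis is not stated) is mean ≈ (Q1 + median + Q3) / 3 and SD ≈ IQR / 1.35:

```python
def mean_sd_from_median_iqr(q1, median, q3):
    """Normal-theory approximation: mean ~ (Q1 + median + Q3) / 3 and
    SD ~ IQR / 1.35 (the IQR of a normal distribution spans ~1.35 SDs)."""
    return (q1 + median + q3) / 3.0, (q3 - q1) / 1.35

# Made-up summary values purely for illustration.
m, s = mean_sd_from_median_iqr(4.0, 5.0, 6.5)
```

These rules assume approximate normality within each study group; for strongly skewed outcomes the estimates can be biased.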
Adolescent girls and young women (AGYW) have a disproportionately high incidence of HIV compared to males of the same age in Uganda. AGYW are a priority sub-group for daily oral Pre-Exposure Prophylaxis (PrEP), but their adherence has consistently remained low. Short Message Service (SMS) reminders could improve adherence to PrEP in AGYW. However, there is a paucity of literature about the acceptability of SMS reminders among AGYW using PrEP. We assessed the level of acceptability of SMS reminders as a PrEP adherence support tool, and the associated factors, among AGYW in Mukono district, Central Uganda. We consecutively enrolled AGYW using PrEP in Mukono district in a cross-sectional study. A structured, pre-tested questionnaire was administered to participants by three trained research assistants. Data were analyzed in STATA 17.0; continuous variables were summarized using the median and interquartile range (IQR), while categorical variables were summarized using frequencies and percentages.

The data set was collected through a researcher-administered questionnaire. The main dependent variable was acceptability of SMS reminders. This was measured using the seven constructs derived from the Theoretical Framework of Acceptability (TFA) (1): affective attitude, burden, perceived effectiveness, ethicality, intervention coherence, opportunity costs, and self-efficacy. A 5-point Likert item was used per construct, and each level of the Likert scale was given a weight ranging from one to five. The summated scores from the weights assigned to each response were computed. The summated acceptability score was then dichotomized at the 50th percentile of the possible summated scores, which range from 7 to 35 (the 50th percentile is 21). Therefore, “acceptability of SMS reminders” was defined as a score greater than 21. The independent variables were captured as described in the attached data dictionary. Data analysis was performed in STATA version 17.0.

The participants gave written informed consent to publish de-identified data in accordance with the Uganda National Council for Science and Technology (UNCST), a local human-participant research regulator. Identifying characteristics such as numerical age and physical address were redacted.

# Acceptability of short message service reminders as the support tool for PrEP adherence among young women in Mukono district, Uganda
Dataset DOI: 10.5061/dryad.cvdncjt8h
In this dataset, we aimed to assess the acceptability of short message service (SMS) reminders among Adolescent Girls and Young Women (AGYW) prescribed Pre-Exposure Prophylaxis (PrEP). We also measured demographic and other individual factors.
File: Manuscript_dataset.dta
Description: This section describes the variables included in the dataset (data dictionary).
| Variable Name | Variable type | Variable Label | Value Labels |
| :--- | :--- | :--- | :--- |
...
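The summated-score dichotomization described in the methods above can be expressed directly. This is a sketch of the stated rule only, not the study's actual STATA code, and the example responses are invented.

```python
def acceptability(scores):
    """Summated TFA acceptability score: seven 5-point Likert items, each
    weighted 1-5, so the possible range is 7-35. 'Acceptable' means a score
    strictly greater than 21, the 50th percentile of the possible range."""
    assert len(scores) == 7 and all(1 <= s <= 5 for s in scores)
    total = sum(scores)
    return total, total > 21

# Hypothetical responses to the seven TFA construct items.
total, acceptable = acceptability([4, 3, 5, 4, 2, 3, 4])
```

Note that the cutoff is the midpoint of the possible range (7 to 35), not the sample median, so a respondent answering 3 ("neutral") on every item sums to exactly 21 and is classified as not acceptable.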
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Objectives: To develop and pilot a tool to measure and improve pharmaceutical companies' clinical trial data sharing policies and practices.

Design: Cross-sectional descriptive analysis.

Setting: Large pharmaceutical companies with novel drugs approved by the US Food and Drug Administration in 2015.

Data sources: Data sharing measures were adapted from 10 prominent data sharing guidelines from expert bodies and refined through a multi-stakeholder deliberative process engaging patients, industry, academics, regulators, and others. Data sharing practices and policies were assessed using data from ClinicalTrials.gov, Drugs@FDA, corporate websites, data sharing platforms and registries (e.g., the Yale Open Data Access (YODA) Project and Clinical Study Data Request (CSDR)), and personal communication with drug companies.

Main outcome measures: Company-level, multicomponent measure of accessibility of participant-level clinical trial data (e.g., analysis-ready dataset and metadata); drug- and trial-level measures of registration, results reporting, and publication; company-level overall transparency rankings; and feasibility of the measures and ranking tool to improve company data sharing policies and practices.

Results: Only 25% of large pharmaceutical companies fully met the data sharing measure. The median company data sharing score was 63% (interquartile range 58-85%). Given feedback and a chance to improve their policies to meet this measure, three companies made amendments, raising the percentage of companies in full compliance to 33% and the median company data sharing score to 80% (73-100%). The most common reasons companies did not initially satisfy the data sharing measure were failure to share data by the specified deadline (75%) and failure to report the number and outcome of their data requests. Across new drug applications, a median of 100% (interquartile range 91-100%) of trials in patients were registered, 65% (36-96%) reported results, 45% (30-84%) were published, and 95% (69-100%) were publicly available in some form by six months after FDA drug approval. When examining results at the drug level, less than half (42%) of reviewed drugs had results for all their new drug application trials in patients publicly available in some form by six months after FDA approval.

Conclusions: It was feasible to develop a tool to measure data sharing policies and practices among large companies and to have an impact in improving company practices. Among large companies, 25% made participant-level trial data accessible to external investigators for new drug approvals in accordance with the current study's measures; this proportion improved to 33% after applying the ranking tool. Other measures of trial transparency were higher. Some companies, however, have substantial room for improvement on transparency and data sharing of clinical trials.
(SE: standard error; IQR: interquartile range; n: number of participants; P1: baseline (phase 1)).
Younger fetuses are defined as GA <31 weeks; older fetuses are defined as GA ≥31 weeks. * denotes significant p-values. Abbreviations: GA, gestational age; MRI, magnetic resonance imaging; M, male; F, female; SD, standard deviation; IQR, interquartile range.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a synthetic collection of student performance data created for data preprocessing, cleaning, and analysis practice in Data Mining and Machine Learning courses. It contains information about 1,020 students, including their study habits, attendance, and test performance, with intentionally introduced missing values, duplicates, and outliers to simulate real-world data issues.
The dataset is suitable for laboratory exercises, assignments, and demonstration of key preprocessing techniques such as:
| Column Name | Description |
|---|---|
| Student_ID | Unique identifier for each student (e.g., S0001, S0002, …) |
| Age | Age of the student (between 18 and 25 years) |
| Gender | Gender of the student (Male/Female) |
| Study_Hours | Average number of study hours per day (contains missing values and outliers) |
| Attendance(%) | Percentage of class attendance (contains missing values) |
| Test_Score | Final exam score (0–100 scale) |
| Grade | Letter grade derived from test scores (F, C, B, A, A+) |
Test_Score → Predict the student’s test score based on study hours, attendance percentage, age, and gender.
🧠 Sample Features: X = ['Age', 'Gender', 'Study_Hours', 'Attendance(%)'] y = ['Test_Score']
You can use:
And analyze feature influence using correlation or SHAP/LIME explainability.
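As a sketch of this prediction task, here is a minimal least-squares fit on synthetic stand-in data. The feature encoding and the data-generating coefficients are invented so the fit has something known to recover; with the real dataset, the missing values, duplicates, and outliers would need to be handled first.

```python
import numpy as np

# Synthetic stand-in for the cleaned student table.
rng = np.random.default_rng(5)
n = 500
age = rng.integers(18, 26, n).astype(float)            # ages 18-25, as in the dataset
gender = rng.integers(0, 2, n).astype(float)           # hypothetical encoding: 0 = Male, 1 = Female
study_hours = rng.uniform(0.0, 8.0, n)
attendance = rng.uniform(40.0, 100.0, n)

# Invented data-generating rule: score driven by study hours and attendance.
test_score = 10.0 + 5.0 * study_hours + 0.4 * attendance + rng.normal(0.0, 3.0, n)

# Ordinary least squares via the design matrix [1, Age, Gender, Study_Hours, Attendance(%)].
X = np.column_stack([np.ones(n), age, gender, study_hours, attendance])
beta, *_ = np.linalg.lstsq(X, test_score, rcond=None)
rmse = float(np.sqrt(np.mean((X @ beta - test_score) ** 2)))
```

The same design matrix plugs directly into scikit-learn's LinearRegression or a tree-based model for the feature-influence analysis mentioned above.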
* The Wilcoxon rank sum test was used for statistical analysis. † The Chi-squared test was used for statistical analysis. ‡ The t-test was used for statistical analysis. § Fisher's exact test was used for statistical analysis. IQR: interquartile range.
In the patient information sheet, outcome variables [bacterial pathogens and viral-bacterial coinfections (simultaneous occurrences)] and predictor variables (patient demographics, time frame, specimen type, type of bacterial isolate(s), and antimicrobial susceptibility patterns) were collected from the hospital records. The data were anonymized to ensure patient confidentiality. Data were entered and managed using Microsoft Excel, version 13.0, and analyzed using the Statistical Package for Social Sciences (SPSS), version 17.0. Descriptive data were analyzed in terms of frequency and percentage. Quantitative data were reported as mean, median, and interquartile range (IQR). Qualitative variables were analyzed using the Chi-square test, while quantitative variables were analyzed using the independent student t-test, with statistical significance determined at a p-value of <0.05 within a 95% confidence interval (CI).
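The two tests named above can be reproduced in Python with SciPy (the original analysis used SPSS). The contingency table and the two samples below are made-up illustrations, not the study's data.

```python
import numpy as np
from scipy import stats

# Chi-square test for a qualitative variable (hypothetical 2x2 counts,
# e.g., isolate type by specimen type).
table = np.array([[30, 10],
                  [20, 40]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Independent-samples t-test for a quantitative variable, alpha = 0.05.
rng = np.random.default_rng(11)
group_a = rng.normal(5.0, 1.0, 60)
group_b = rng.normal(5.8, 1.0, 60)
t_stat, p_t = stats.ttest_ind(group_a, group_b)
significant = p_t < 0.05
```

For 2x2 tables `chi2_contingency` applies Yates' continuity correction by default, which SPSS also reports alongside the uncorrected Pearson statistic.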