100+ datasets found
  1. Data from: WiBB: An integrated method for quantifying the relative...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    zip
    Updated Aug 20, 2021
    Cite
    Qin Li; Xiaojun Kou (2021). WiBB: An integrated method for quantifying the relative importance of predictive variables [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9g1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 20, 2021
    Dataset provided by
    Beijing Normal University
    Field Museum of Natural History
    Authors
    Qin Li; Xiaojun Kou
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.

    A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we proposed a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and a bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, relative sum of weight (SWi) and standardized beta (β*), to compare their performance with that of WiBB in ranking predictor importance under various scenarios. We also applied it to an empirical dataset for the plant genus Mimulus to select bioclimatic predictors of species’ presence across the landscape. Results in the simulated datasets showed that the WiBB method outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When tested on the empirical dataset with GLM, WiBB sensibly identified four important predictors with high credibility out of six candidates in modeling geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data, without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy method in the statistical toolbox.
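As a rough illustration of two of the ingredients named in this description (standardized regression coefficients and bootstrap resampling), the sketch below ranks predictors by their mean absolute standardized coefficient across bootstrap resamples. It is not the WiBB index itself, whose exact formula is not given on this page. The toy data, the function names `standardized_betas` and `bootstrap_betas`, and the number of resamples are all assumptions made for illustration.

```python
import numpy as np

def standardized_betas(X, y):
    """OLS coefficients after z-scoring predictors and response."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # Centered variables need no intercept column.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

def bootstrap_betas(X, y, n_boot=200, seed=0):
    """Standardized betas for n_boot resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices, with replacement
        out[b] = standardized_betas(X[idx], y[idx])
    return out

# Hypothetical toy data: strong, weak, and spurious predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=500)

boot = bootstrap_betas(X, y)
mean_abs = np.abs(boot).mean(axis=0)
ranking = np.argsort(-mean_abs)  # predictor indices, most important first
print(mean_abs, ranking)
```

Averaging over resamples is one simple way to stabilize a ranking; the paper's own combination with model weights may differ.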

    Methods: To simulate independent datasets (size = 1000), we adopted Galipaud et al.’s approach (2014) with custom modifications of the data.simulation function, which used the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero. We simulated datasets with three levels of difference between the correlation coefficients of consecutive predictors, where ∆r = 0.1, 0.2, 0.3, respectively. These three levels of ∆r resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0), respectively. We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total), for later LM fitting. For GLM fitting, we modified the simulation procedure with additional steps, in which we converted the continuous response into binary data O (e.g., occurrence data with 0 for absence and 1 for presence). We tested the WiBB method, along with two other methods, relative sum of weight (SWi) and standardized beta (β*), to evaluate their ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was compiled from occurrence coordinates and the corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors of their geographical distributions.
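The simulation step described above can be sketched in Python with NumPy's multivariate normal sampler standing in for R's rmvnorm. This is a minimal sketch, not the authors' data.simulation function: the name `simulate_dataset` is hypothetical, and the predictors are assumed to be mutually uncorrelated, which the description does not state.

```python
import numpy as np

def simulate_dataset(r, n=1000, seed=0):
    """Draw n rows of (y, x1..xk) with preset corr(y, xj) = r[j].

    Assumption: predictors are uncorrelated with each other.
    """
    r = np.asarray(r, dtype=float)
    k = r.size
    corr = np.eye(k + 1)
    corr[0, 1:] = r   # first row/column holds the response
    corr[1:, 0] = r
    # Raises LinAlgError if the matrix is not positive definite.
    np.linalg.cholesky(corr)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(k + 1), corr, size=n)

data = simulate_dataset([0.6, 0.4, 0.2, 0.0])
sample_r = np.corrcoef(data, rowvar=False)[0, 1:]
print(sample_r)  # sample correlations between y and x1..x4
```

Under the uncorrelated-predictor assumption, the structure (0.8, 0.6, 0.3, 0.0) is not a valid correlation matrix, because the squared correlations sum to 1.09, which exceeds 1. Reproducing that scenario would therefore require nonzero correlations among the predictors, which this sketch does not model.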

  2. Price Prediction -Multiple Linear Regression

    • kaggle.com
    zip
    Updated Aug 3, 2022
    Cite
    Erol Masimov (2022). Price Prediction -Multiple Linear Regression [Dataset]. https://www.kaggle.com/datasets/erolmasimov/price-prediction-multiple-linear-regression
    Explore at:
    Available download formats: zip (6192 bytes)
    Dataset updated
    Aug 3, 2022
    Authors
    Erol Masimov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The car company wants to enter a new market and needs to know which variables affect car prices. The goals are to determine which variables are significant in predicting the price of a car, and how well those variables describe the price of a car.

  3. Data sets of the study.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Shouxi Zhu; Hongbin Gu (2023). Data sets of the study. [Dataset]. http://doi.org/10.1371/journal.pone.0283577.s001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shouxi Zhu; Hongbin Gu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study aimed to explore the adverse influences of mobile phone usage on pilots’ status, so as to improve flight safety. Methods: A questionnaire was designed, and a cluster random sampling method was adopted. Pilots of Shandong Airlines were investigated on the use of mobile phones. The data was analyzed by frequency statistics, linear regression and other statistical methods. Results: A total of 340 questionnaires were distributed and 317 were returned, 315 of which were valid. The results showed that 239 pilots (75.87%) used mobile phones as the main means of entertainment in their leisure time. There was a significant negative correlation between age of pilots and playing mobile games (p

  4. Synthetic Dataset for Linear Regression

    • kaggle.com
    zip
    Updated Mar 23, 2025
    Cite
    Stable Space (2025). Synthetic Dataset for Linear Regression [Dataset]. https://www.kaggle.com/datasets/sharmajicoder/synthetic-dataset-for-linear-regression/data
    Explore at:
    Available download formats: zip (9111 bytes)
    Dataset updated
    Mar 23, 2025
    Authors
    Stable Space
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The file contains a dataset with two variables (x and y), intended for linear regression ML models and for testing purposes. The x variable is the independent variable, and y is the dependent variable. The correlation between x and y is 0.9981, which indicates a nearly perfect linear relationship and makes the dataset well suited to linear models.
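To show how a correlation figure like the one quoted above is obtained, the sketch below computes Pearson's r and a least-squares line with NumPy. The data are generated here as a stand-in, since the actual file is not reproduced on this page; the slope, intercept, and noise level are assumptions.

```python
import numpy as np

# Hypothetical stand-in for the (x, y) file: a noisy straight line.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```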

  5. Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry...

    • qubeshub.org
    Updated Jan 11, 2022
    Cite
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu (2022). Linear Regression (Excel) and Cellular Respiration for Biology, Chemistry and Mathematics [Dataset]. http://doi.org/10.25334/5PX5-H796
    Explore at:
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    QUBES
    Authors
    Irene Corriette; Beatriz Gonzalez; Daniela Kitanska; Henriette Mozsolits; Sheela Vemu
    Description

    Students typically find linear regression analysis of data sets in a biology classroom challenging. These activities could be used in a Biology, Chemistry, Mathematics, or Statistics course. The collection provides student activity files with Excel instructions and Instructor Activity files with Excel instructions and solutions to problems.

    Students will be able to perform linear regression analysis, find the correlation coefficient, create a scatter plot, and find the r-square using MS Excel 365. Students will be able to interpret data sets, describe the relationship between biological variables, and predict the value of an output variable based on the input of a predictor variable.

  6. Data Sheet 1_Non-linear correlation analysis between internet searches and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 4, 2025
    Cite
    He, Yongzhang; Xia, Yixue; Wang, Yang; Huang, Fengxiang; Ran, Lingshi (2025). Data Sheet 1_Non-linear correlation analysis between internet searches and epidemic trends.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002036779
    Explore at:
    Dataset updated
    Apr 4, 2025
    Authors
    He, Yongzhang; Xia, Yixue; Wang, Yang; Huang, Fengxiang; Ran, Lingshi
    Description

    Introduction: This study uses a non-linear model to explore the impact mechanism of change rates between internet search behavior and confirmed COVID-19 cases. The research background focuses on epidemic monitoring, leveraging internet search data as a real-time tool to capture public interest and predict epidemic development. The goal is to establish a widely applicable mathematical framework through the analysis of long-term disease data. Methods: Data were sourced from the Baidu Index for COVID-19-related search behavior and confirmed COVID-19 case data from the National Health Commission of China. A logistic-based non-linear differential equation model was employed to analyze the mutual influence mechanism between confirmed case numbers and the rate of change in search behavior. Structural and operator relationships between variables were determined through segmented data fitting and regression analysis. Results: The results indicated a significant non-linear correlation between search behavior and confirmed COVID-19 cases. The non-linear differential equation model constructed in this study successfully passed both structural and correlation tests, with dynamic data fitting showing a high degree of consistency. The study further quantified the mutual influence between search behavior and confirmed cases, revealing a strong feedback loop between the two: changes in search behavior significantly drove the growth of confirmed cases, while the increase in confirmed cases also stimulated the public's search behavior. This finding suggests that search behavior not only reflects the development trend of the epidemic but can also serve as an effective indicator for predicting the evolution of the pandemic. Discussion: This study enriches the understanding of epidemic transmission mechanisms by quantifying the dynamic interaction between public search behavior and epidemic spread. Compared to simple prediction models, this study focuses more on stable common mechanisms and structural analysis, laying a foundation for future research on public health events.

  7. Study Hours ,Student Scores for Linear Regression

    • kaggle.com
    Updated Sep 23, 2024
    Cite
    douaa bennoune (2024). Study Hours ,Student Scores for Linear Regression [Dataset]. https://www.kaggle.com/datasets/douaabennoune/study-hours-student-scores-for-linear-regression
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 23, 2024
    Dataset provided by
    Kaggle
    Authors
    douaa bennoune
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a collection of 100 randomly generated data points representing the relationship between the number of hours a student spends studying and their corresponding performance, measured as a score. The data has been generated to simulate a real-world scenario where study hours are assumed to influence academic outcomes, making it an excellent resource for linear regression analysis and other machine learning tasks.

    Each row in the dataset consists of:

    • Hours: The number of hours a student dedicates to studying, ranging between 0 and 10 hours.
    • Scores: The student's performance score, represented as a percentage, ranging from 0 to 100.

    Use Cases: This dataset is particularly useful for:

    • Linear Regression: Exploring how study hours influence student performance, fitting a regression line to predict scores based on study time.
    • Data Science & Machine Learning: Practicing regression analysis, training models, and applying other predictive algorithms.
    • Educational Research: Simulating data-driven insights into student behavior and performance metrics.

    Features:

    • 100 rows of data.
    • Continuous numerical variables suitable for regression tasks.
    • Generated for educational purposes, making it ideal for students, teachers, and beginners in machine learning and data science.

    Potential Applications:

    • Build a linear regression model to predict student scores.
    • Investigate the correlation between study time and performance.
    • Apply data visualization techniques to better understand the data.
    • Use the dataset to experiment with model evaluation metrics like Mean Squared Error (MSE) and R-squared.
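The evaluation metrics mentioned in this description, MSE and R-squared, can be computed directly from a fitted line. The sketch below uses generated data as a stand-in for the Hours and Scores columns; the coefficients and noise level are assumptions, not values from the dataset.

```python
import numpy as np

# Hypothetical stand-in for the dataset: 100 (hours, score) pairs.
rng = np.random.default_rng(7)
hours = rng.uniform(0, 10, size=100)
scores = 20.0 + 7.0 * hours + rng.normal(scale=5.0, size=100)

# Fit score = slope * hours + intercept by least squares.
slope, intercept = np.polyfit(hours, scores, deg=1)
predicted = slope * hours + intercept

# Mean squared error of the fitted values.
mse = np.mean((scores - predicted) ** 2)

# R-squared: share of the variance in scores explained by the line.
ss_res = np.sum((scores - predicted) ** 2)
ss_tot = np.sum((scores - scores.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"MSE = {mse:.2f}, R^2 = {r_squared:.3f}")
```

For a simple regression with one predictor and an intercept, R-squared equals the square of the Pearson correlation between the two columns, which gives a quick consistency check.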

  8. Call Center Simulated Data

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Pablo Sebastián Campos Ortiz (2023). Call Center Simulated Data [Dataset]. https://www.kaggle.com/datasets/scss17/call-center-simulated-data
    Explore at:
    Available download formats: zip (3098 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Pablo Sebastián Campos Ortiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The aim of this data set is to be used along with my notebook Linear Regression Notes which provides a guideline for applying correlation analysis and linear regression models from a statistical approach.

    A fictional call center is interested in knowing the relationship between the number of personnel and some variables that measure their performance such as average answer time, average calls per hour, and average time per call. Data were simulated to represent 200 shifts.

  9. Baseline multiple linear regression model with end fitness as the response...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Dan Weaving; Ben Jones; Matt Ireton; Sarah Whitehead; Kevin Till; Clive B. Beggs (2023). Baseline multiple linear regression model with end fitness as the response variable, showing the calculated variable inflation factors (VIFs). [Dataset]. http://doi.org/10.1371/journal.pone.0211776.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Dan Weaving; Ben Jones; Matt Ireton; Sarah Whitehead; Kevin Till; Clive B. Beggs
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Baseline multiple linear regression model with end fitness as the response variable, showing the calculated variable inflation factors (VIFs).
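For readers unfamiliar with VIFs (usually expanded as variance inflation factors), the sketch below shows one common way to compute them: regress each predictor on the others and take 1 / (1 - R²). The data and the function name `vif` are hypothetical; this is not the table or the code from the cited study.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical predictors: c is nearly a copy of a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = 0.9 * a + 0.1 * rng.normal(size=300)
X = np.column_stack([a, b, c])

vifs = vif(X)
print(vifs)  # large values for a and c flag collinearity
```

The same values appear on the diagonal of the inverse of the predictors' correlation matrix, which is a handy cross-check.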

  10. Linear Regression Analysis

    • kaggle.com
    zip
    Updated Nov 30, 2020
    Cite
    Md Iqbal Hossain (2020). Linear Regression Analysis [Dataset]. https://www.kaggle.com/datasets/iqbalrony/linear-regression-analysis/code
    Explore at:
    Available download formats: zip (9442 bytes)
    Dataset updated
    Nov 30, 2020
    Authors
    Md Iqbal Hossain
    Description

    Context

    This dataset was given by our professor as machine learning lab work.

    Content

    In the exam.xls file, the results of a computer science exam are recorded. The following parameters are used:

    • id = final digit of the student matriculation number
    • group = name of the learning group
    • sex = gender (m = male, f = female)
    • quanti = number of solved exercises
    • points = total score achieved from the exercises
    • exam = total score achieved from the final written exam (* "Students must have to participate in the final written exam. If not, he/she will be considered as fail")
    • passed = self-explanatory

    Please solve the following tasks as far as possible with the IBM SPSS Modeler in a single stream, or with the Python programming language.

    TASK 1: (statistics) Determine the average, median, mode, and standard deviation of the points from the exercises.

    TASK 2: (regression)

    Check whether the points in the exam (y value) depend on the total score from the exercises (x value) by plotting the data. Perform a linear regression. What are the parameters of the trend line? Determine (by hand) the correlation coefficient between the points in the exercises (x value) and the points in the exam (y value). Interpret the result.
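The two tasks can be sketched in plain Python using only the standard library. The `points` and `exam` values below are made up for illustration and are not taken from exam.xls; `pearson_r` and `trend_line` are hypothetical helper names that spell out the "by hand" formulas.

```python
import math
import statistics

# Hypothetical values standing in for the points and exam columns.
points = [12, 15, 15, 18, 20, 22, 15, 25]
exam = [30, 41, 38, 45, 52, 60, 40, 66]

# TASK 1: descriptive statistics of the exercise points.
summary = {
    "mean": statistics.mean(points),
    "median": statistics.median(points),
    "mode": statistics.mode(points),
    "stdev": statistics.stdev(points),  # sample standard deviation
}

def pearson_r(xs, ys):
    """Correlation coefficient from sums of squares and cross products."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def trend_line(xs, ys):
    """Slope and intercept of the least-squares line y = a * x + b."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# TASK 2: regression parameters and correlation.
r = pearson_r(points, exam)
slope, intercept = trend_line(points, exam)
print(summary, r, slope, intercept)
```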

    Acknowledgements

    We used IBM SPSS Modeler to perform the task, as most of the students were not from a computer science background. Out of personal interest, I also tried to solve the task with Python data science libraries.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  11. Code and datasets: How to count bird calls? Vocal activity indices may...

    • data.mendeley.com
    Updated Apr 18, 2024
    Cite
    Anja Hutschenreiter (2024). Code and datasets: How to count bird calls? Vocal activity indices may provide different insights into bird abundance and behaviour depending on species traits [Dataset]. http://doi.org/10.17632/fc9mzp2y5c.1
    Explore at:
    Dataset updated
    Apr 18, 2024
    Authors
    Anja Hutschenreiter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R code and datasets for analyses presented in Hutschenreiter et al. 2024 "How to count bird calls? Vocal activity indices may provide different insights into bird abundance and behaviour depending on species traits" (Methods in Ecology and Evolution)

  12. Correlation coefficients (R2) from simple linear regressions between the US...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Sep 23, 2014
    Cite
    Prasad, Vinay; Goldstein, Jeffrey A. (2014). Correlation coefficients (R2) from simple linear regressions between the US News and World Report Score and its contributors. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001258526
    Explore at:
    Dataset updated
    Sep 23, 2014
    Authors
    Prasad, Vinay; Goldstein, Jeffrey A.
    Description

    Correlation coefficients (R2) from simple linear regressions between the US News and World Report Score and its contributors.

  13. Pearson’s correlation and multiple stepwise linear regression analysis of...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xls
    Updated Jun 1, 2023
    Cite
    Fei Huang; Zhi Yang; Baihui Xu; Yufang Bi; Min Xu; Yu Xu; Jieli Lu; Yu Liu; Meng Dai; Wenzhong Zhou; Weiqing Wang; Yuhong Chen (2023). Pearson’s correlation and multiple stepwise linear regression analysis of risk factors associated with CIMT. [Dataset]. http://doi.org/10.1371/journal.pone.0054628.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Fei Huang; Zhi Yang; Baihui Xu; Yufang Bi; Min Xu; Yu Xu; Jieli Lu; Yu Liu; Meng Dai; Wenzhong Zhou; Weiqing Wang; Yuhong Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pearson’s correlation and multiple stepwise linear regression analysis of risk factors associated with CIMT.

  14. Descriptive statistics.

    • plos.figshare.com
    xls
    Updated Oct 31, 2023
    + more versions
    Cite
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Descriptive statistics. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    PLOS Global Public Health
    Authors
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). 
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.

  15. Dataset for Multiple Linear Regression-2

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Cite
    Mit Gandhi (2024). Dataset for Multiple Linear Regression-2 [Dataset]. https://www.kaggle.com/datasets/mitgandhi10/dataset-for-multiple-linear-regression-2
    Explore at:
    Available download formats: zip (89824 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Mit Gandhi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains detailed information about vehicles, including their engine characteristics, fuel consumption, and CO2 emissions. It is a valuable resource for analyzing the impact of various factors like engine size, transmission type, and fuel type on a vehicle's carbon emissions.

    Features:

    • Engine Size (L): The engine size of the vehicle in liters.
    • Cylinders: Number of cylinders in the engine.
    • Fuel Consumption (City, Highway, Combined): Fuel consumption in liters per 100 kilometers for city, highway, and combined driving conditions.
    • Fuel Consumption (Combined - MPG): Fuel consumption in miles per gallon for combined driving conditions.
    • CO2 Emissions (g/km): Carbon dioxide emissions measured in grams per kilometer.

    Categorical Columns:

    • Make: Manufacturer of the vehicle.
    • Model: Specific model name.
    • Vehicle Class: Vehicle category (e.g., sedan, SUV, etc.).
    • Transmission: Type of transmission (automatic, manual, etc.).
    • Fuel Type: Type of fuel used (e.g., gasoline, diesel, hybrid, etc.).

    This dataset is ideal for exploring:

    • The correlation between fuel efficiency and CO2 emissions.
    • The role of vehicle specifications in determining environmental impact.
    • Regression modeling and machine learning applications.

  16. Supplementary Table S3: Correlation coefficients calculated between IMFs and...

    • data.mendeley.com
    • narcis.nl
    Updated Sep 12, 2020
    Cite
    CHIRANJIT MAJI (2020). Supplementary Table S3: Correlation coefficients calculated between IMFs and original data series of arrhythmic and normal subjects. [Dataset]. http://doi.org/10.17632/myf37k2x27.1
    Explore at:
    Dataset updated
    Sep 12, 2020
    Authors
    CHIRANJIT MAJI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The correlation coefficient between each IMF and the original signal (see the description of Supplementary Table S2) was calculated to identify the IMFs that represent real components of the original signal (see Supplementary Table S1). The correlation coefficients were calculated using the equations described by Wharton et al., 2013 and Jha et al., 2006.

  17. Linear regression analysis of co-expression network quality scores.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Alexis Vandenbon (2023). Linear regression analysis of co-expression network quality scores. [Dataset]. http://doi.org/10.1371/journal.pone.0263344.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexis Vandenbon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Linear regression analysis of co-expression network quality scores.

  18. Linear correlation (Pearson) coefficient between each pair of C-GFs.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 1, 2018
    Cite
    Cominelli, Eleonora; Orozco-Arroyo, Gregorio; Boncompagni, Eleonora; Daminati, Maria Gloria; Zu, Theophilus Tenutse Kwaku; Gangashetty, Prakash Irappa; Sparvoli, Francesca; Nielsen, Erik; Grando, Stefania (2018). Linear correlation (Pearson) coefficient between each pair of C-GFs. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000633948
    Explore at:
    Dataset updated
    Jun 1, 2018
    Authors
    Cominelli, Eleonora; Orozco-Arroyo, Gregorio; Boncompagni, Eleonora; Daminati, Maria Gloria; Zu, Theophilus Tenutse Kwaku; Gangashetty, Prakash Irappa; Sparvoli, Francesca; Nielsen, Erik; Grando, Stefania
    Description

    Linear correlation (Pearson) coefficient between each pair of C-GFs.

  19. health-focused products analysis

    • kaggle.com
    zip
    Updated Jul 28, 2022
    Cite
    possakorn.ty (2022). health-focused products analysis [Dataset]. https://www.kaggle.com/datasets/possakornty/case-study-bellabeat
    Explore at:
    Available download formats: zip (30061 bytes)
    Dataset updated
    Jul 28, 2022
    Authors
    possakorn.ty
    Description

    This project concerns Bellabeat, a high-tech manufacturer of health-focused products for women, and introduces its characters and team members. The data sets include a Character Dataset and a Product Dataset, which are used to review linear correlations and explore the relationships between values.

  20. Pearson's correlation and Multiple linear regression analysis for the...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 30, 2023
    Cite
    Fan Wang; Ping Ye; Leiming Luo; Ruyi Xu; Yongyi Bai; Hongmei Wu (2023). Pearson's correlation and Multiple linear regression analysis for the association between eGFR and the hs-cTnT levels. [Dataset]. http://doi.org/10.1371/journal.pone.0038218.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Fan Wang; Ping Ye; Leiming Luo; Ruyi Xu; Yongyi Bai; Hongmei Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: High-sensitivity cardiac troponin T levels were natural logarithm transformed. BMI, body-mass index; BP, blood pressure; HDL, high-density lipoprotein; LDL, low-density lipoprotein; hs-CRP, high-sensitivity C-reactive protein; NT-proBNP, N-terminal pro B-type natriuretic peptide; eGFR, estimated glomerular filtration rate.

Qin Li; Xiaojun Kou (2021). WiBB: An integrated method for quantifying the relative importance of predictive variables [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9g1

Data from: WiBB: An integrated method for quantifying the relative importance of predictive variables

Explore at:
Available download formats: zip
Dataset updated
Aug 20, 2021
Dataset provided by
Beijing Normal University
Field Museum of Natural History
Authors
Qin Li; Xiaojun Kou
License

https://spdx.org/licenses/CC0-1.0.html

Description

This dataset contains simulated datasets, empirical data, and R scripts described in the paper: “Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)”.

A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we proposed a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by ß* (B), and a bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (ß*), to compare their performance with that of WiBB in ranking predictor importance under various scenarios. We also applied WiBB to an empirical dataset for the plant genus Mimulus to select bioclimatic predictors of species’ presence across the landscape. Results for the simulated datasets showed that the WiBB method outperformed the ß* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved the discriminant ability. When tested on the empirical dataset with GLM, WiBB sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data without losing interpretive power. The simplicity of calculating the new metric, compared with more sophisticated statistical procedures, makes it a handy method in the statistical toolbox.

Methods

To simulate independent datasets (size = 1000), we adopted Galipaud et al.’s approach (2014) with custom modifications of the data.simulation function, which used the multivariate normal distribution function rmvnorm in the R package mvtnorm (v1.0-5, Genz et al. 2016). Each dataset was simulated with a preset correlation structure between a response variable (y) and four predictors (x1, x2, x3, x4). The first three (genuine) predictors were set to be strongly, moderately, and weakly correlated with the response variable, respectively (denoted by large, medium, and small Pearson correlation coefficients, r), while the correlation between the response and the last (spurious) predictor was set to zero.

We simulated datasets with three levels of difference between the correlation coefficients of consecutive predictors, where ∆r = 0.1, 0.2, 0.3, respectively. These three levels of ∆r resulted in three correlation structures between the response and the four predictors: (0.3, 0.2, 0.1, 0.0), (0.6, 0.4, 0.2, 0.0), and (0.8, 0.6, 0.3, 0.0), respectively. We repeated the simulation procedure 200 times for each of the three preset correlation structures (600 datasets in total) for later LM fitting. For GLM fitting, we modified the simulation procedure with additional steps, in which we converted the continuous response into binary data O (e.g., occurrence data with 0 for absence and 1 for presence).

We tested the WiBB method, along with two other methods, the relative sum of weights (SWi) and the standardized beta (ß*), to evaluate their ability to correctly rank predictor importance under various scenarios. The empirical dataset of 71 Mimulus species was compiled from occurrence coordinates and the corresponding values extracted from climatic layers of the WorldClim dataset (www.worldclim.org), and we applied the WiBB method to infer important predictors of their geographical distributions.
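The simulation step described above can be illustrated with a minimal sketch. This is not the authors' data.simulation function, and it does not use rmvnorm. It assumes the four predictors are mutually independent standard normal variables and builds the response as a weighted sum of them plus Gaussian noise. The names simulate_dataset and pearson are hypothetical. Under this independence assumption the squared target correlations must sum to at most 1, so the shortcut covers the structures (0.3, 0.2, 0.1, 0.0) and (0.6, 0.4, 0.2, 0.0) but not (0.8, 0.6, 0.3, 0.0), which would need a full correlation matrix with correlated predictors.

```python
import math
import random


def simulate_dataset(r, n=1000, seed=None):
    """Simulate a response y and len(r) independent predictors.

    Assumption (not from the source): predictors are independent standard
    normals, and y = sum(r_i * x_i) + noise, so that corr(y, x_i) is r_i
    in expectation and var(y) is 1.
    """
    resid_var = 1.0 - sum(ri * ri for ri in r)
    if resid_var < 0:
        raise ValueError(
            "sum of squared correlations exceeds 1; "
            "independent predictors cannot reach these targets"
        )
    rng = random.Random(seed)
    resid_sd = math.sqrt(resid_var)
    # One list of n draws per predictor.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in r]
    y = [
        sum(ri * x[i] for ri, x in zip(r, xs)) + resid_sd * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]
    return y, xs


def pearson(a, b):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # One dataset of size 1000 with the medium correlation structure.
    y, xs = simulate_dataset((0.6, 0.4, 0.2, 0.0), n=1000, seed=1)
    for name, x in zip(("x1", "x2", "x3", "x4"), xs):
        print(name, round(pearson(y, x), 2))
```

Calling simulate_dataset 200 times per correlation structure, each with a different seed, would mirror the repetition scheme in the text. The sample correlations vary around the targets from one dataset to the next, which is the variation the ranking methods have to cope with.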
