74 datasets found
  1. Diabetes data

    • kaggle.com
    zip
    Updated Jul 9, 2020
    + more versions
    Cite
    Veronica Zheng (2020). Diabetes data [Dataset]. https://www.kaggle.com/veronicazheng/diabetes-data
    Explore at:
    Available download formats: zip (13479 bytes)
    Dataset updated
    Jul 9, 2020
    Authors
    Veronica Zheng
    Description

    Dataset

    This dataset was created by Veronica Zheng

    Released under Other (specified in description)

    Contents

  2. logistic regression

    • kaggle.com
    zip
    Updated Nov 28, 2025
    Cite
    R Durai Srinivasan (2025). logistic regression [Dataset]. https://www.kaggle.com/datasets/rduraisrinivasan/logistic-regression
    Explore at:
    Available download formats: zip (13041 bytes)
    Dataset updated
    Nov 28, 2025
    Authors
    R Durai Srinivasan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by R Durai Srinivasan

    Released under MIT

    Contents

  3. An example data set for exploration of Multiple Linear Regression

    • data.usgs.gov
    • catalog.data.gov
    Updated Feb 24, 2024
    Cite
    William Farmer (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. http://doi.org/10.5066/P9T5ZEXV
    Explore at:
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    William Farmer
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    1956 - 2016
    Description

    This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.

  4. Supplement 1. R code demonstrating how to fit a logistic regression model,...

    • figshare.com
    • wiley.figshare.com
    html
    Updated Aug 9, 2016
    Cite
    David I. Warton; Francis K. C. Hui (2016). Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference. [Dataset]. http://doi.org/10.6084/m9.figshare.3550407.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Aug 9, 2016
    Dataset provided by
    Wiley
    Authors
    David I. Warton; Francis K. C. Hui
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List

    glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data.
    boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic.

    Description

    glmm.R is example R code that shows how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on randomly generated data, which are generated in such a way that overdispersion is evident. This code can be applied directly to your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for the number of binomial successes and failures) and “species” (a label for the different rows in the dataset), and if you want to test for the effect of a predictor variable called “location”. In other cases, just change the labels and formula as appropriate.

    boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.

  5. Additional file 18: R code. of Evaluation of logistic regression models and...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Feb 7, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Choi, Seung; DeStefano, Anita; Dupuis, Josée; Lunetta, Kathryn; Labadorf, Adam; Myers, Richard (2017). Additional file 18: R code. of Evaluation of logistic regression models and effect of covariates for case–control study in RNA-Seq analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001797663
    Explore at:
    Dataset updated
    Feb 7, 2017
    Authors
    Choi, Seung; DeStefano, Anita; Dupuis, Josée; Lunetta, Kathryn; Labadorf, Adam; Myers, Richard
    Description

    This R code regenerates the simulated data sets. (R 6 kb)

  6. Data from: Data to support Leveraging machine learning to automate...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Nov 21, 2025
    Cite
    U.S. Geological Survey (2025). Data to support Leveraging machine learning to automate regression model evaluations for large multi-site water-quality trend studies [Dataset]. https://catalog.data.gov/dataset/data-to-support-leveraging-machine-learning-to-automate-regression-model-evaluations-for-l
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Description

    This data release contains one dataset and one model archive in support of the journal article "Leveraging machine learning to automate regression model evaluations for large multi-site water-quality trend studies" by Jennifer C. Murphy and Jeffrey G. Chanat. The model archive contains scripts (run in R) to reproduce the four machine learning models (logistic regression, linear and quadratic discriminant analysis, and k-nearest neighbors) trained and tested as part of the journal article. The dataset contains the estimated probabilities for each of these models when applied to a training and test dataset.

  7. R script and dataset for Bayesian hierarchical logistic modeling of percent...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Mar 1, 2017
    Cite
    Brenden, Travis (2017). R script and dataset for Bayesian hierarchical logistic modeling of percent male in sea lamprey populations in lentic and river environments [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001810488
    Explore at:
    Dataset updated
    Mar 1, 2017
    Authors
    Brenden, Travis
    Description

    R code for conducting analyses described in Johnson, N.S., W.D. Swink, and T.O. Brenden. Field study suggests that sex determination in sea lamprey is directly influenced by larval growth rate. Proceedings of the Royal Society B.

    data.csv is the raw data for fitting the Bayesian hierarchical logistic regression model.
    Read me_Metadata... is the metadata describing the variables in the data.csv file.
    Rscript.R is the R script for fitting the Bayesian hierarchical logistic regression model.

  8. Data in R package LOGIT

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    Daisy S (2023). Data in R package LOGIT [Dataset]. https://www.kaggle.com/daisyamber/data-in-r-package-logit
    Explore at:
    Available download formats: zip (71179 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    Daisy S
    License

    http://www.gnu.org/licenses/agpl-3.0.html

    Description

    Data for Hilbe, J.M. 2015. Practical Guide to Logistic Regression (Chapman and Hall/CRC Press).

    Version: 1.3

    CRAN: https://CRAN.R-project.org/package=LOGIT (removed)

    CRAN archive: https://cran.r-project.org/src/contrib/Archive/LOGIT (archived on 2018-5-10)

    Mirror: GitHub

  9. Replication Data for: Site C Logistic Regression model

    • borealisdata.ca
    Updated Nov 18, 2025
    Cite
    Eric Taylor (2025). Replication Data for: Site C Logistic Regression model [Dataset]. http://doi.org/10.5683/SP3/MA1ATA
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    Borealis
    Authors
    Eric Taylor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Site C dam, Canada, Ft St John, Peace River, BC
    Description

    The data are genetic assignments to upstream or downstream of Site C dam (bull trout, Arctic grayling, and rainbow trout). Columns are defined in the csv file. A file of R code to run the analysis is also included.

  10. Analyses.R

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    txt
    Updated Jan 9, 2020
    Cite
    Mu (2020). Analyses.R [Dataset]. http://doi.org/10.6084/m9.figshare.11511219.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 9, 2020
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Mu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chi-square test and logistic regression model in the study.

  11. Statistical summary of dataset1.

    • plos.figshare.com
    xls
    Updated Dec 20, 2024
    + more versions
    Cite
    Ibrahim Al-Shourbaji; Pramod H. Kachare; Abdoh Jabbari; Raimund Kirner; Digambar Puri; Mostafa Mehanawi; Abdalla Alameen (2024). Statistical summary of dataset1. [Dataset]. http://doi.org/10.1371/journal.pone.0314391.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Ibrahim Al-Shourbaji; Pramod H. Kachare; Abdoh Jabbari; Raimund Kirner; Digambar Puri; Mostafa Mehanawi; Abdalla Alameen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the contemporary context of a burgeoning energy crisis, the accurate and dependable prediction of Solar Radiation (SR) has emerged as an indispensable component within thermal systems to facilitate renewable energy generation. Machine Learning (ML) models have gained widespread recognition for their precision and computational efficiency in addressing SR prediction challenges. Consequently, this paper introduces an innovative SR prediction model, denoted as the Cheetah Optimizer-Random Forest (CO-RF) model. The CO component plays a pivotal role in selecting the most informative features for hourly SR forecasting, subsequently serving as inputs to the RF model. The efficacy of the developed CO-RF model is rigorously assessed using two publicly available SR datasets. Evaluation metrics encompassing Mean Absolute Error (MAE), Mean Squared Error (MSE), and coefficient of determination (R2) are employed to validate its performance. Quantitative analysis demonstrates that the CO-RF model surpasses other techniques, Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Network, and standalone Random Forest (RF), both in the training and testing phases of SR prediction. The proposed CO-RF model outperforms others, achieving a low MAE of 0.0365, MSE of 0.0074, and an R2 of 0.9251 on the first dataset, and an MAE of 0.0469, MSE of 0.0032, and R2 of 0.9868 on the second dataset, demonstrating significant error reduction.
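
The three evaluation metrics named in this description (MAE, MSE, and R2) can be sketched in a few lines. This is a minimal illustration using their standard textbook definitions, not the authors' code; the function names and the sample values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observations and predictions, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(mean_absolute_error(observed, predicted))
    print(mean_squared_error(observed, predicted))
    print(r_squared(observed, predicted))
```

Lower MAE and MSE and an R2 closer to 1 indicate a better fit, which is the sense in which the description compares models.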

  12. Geographically Weighted Regression in R

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    Fajar Krisna (2023). Geographically Weighted Regression in R [Dataset]. https://www.kaggle.com/datasets/fajarkrisnajaya/geographically-weighted-regression-in-r
    Explore at:
    Available download formats: zip (26311 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    Fajar Krisna
    Description

    Implementation of GWR in R

    This repository contains code and files related to my project on Geographically Weighted Regression (GWR) in R. The dataset is from Badan Pusat Statistik.

    Files

    Dataset.xlsx: This file contains the dataset used in the analysis.
    GWLR.R: This script implements Geographically Weighted Logistic Regression in R.
    GWPR.r: This script implements Geographically Weighted Poisson Regression in R.
    GWR.R: This script implements Geographically Weighted Regression in R.

    License

    This project is licensed under the MIT License. See the LICENSE file for more details.

    Contact

    If you have any questions or suggestions, feel free to contact me.

  13. Initial set of covariates for logistic regression.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 25, 2018
    Cite
    Byington, Carrie L.; Stockmann, Chris; Ampofo, Krow; Pavia, Andrew T.; Adler, Frederick R. (2018). Initial set of covariates for logistic regression. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000626750
    Explore at:
    Dataset updated
    Jul 25, 2018
    Authors
    Byington, Carrie L.; Stockmann, Chris; Ampofo, Krow; Pavia, Andrew T.; Adler, Frederick R.
    Description

    Initial set of covariates for logistic regression.

  14. Data for: A systematic review showed no performance benefit of machine...

    • data.mendeley.com
    • search.datacite.org
    Updated Mar 14, 2019
    Cite
    Ben Van Calster (2019). Data for: A systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [Dataset]. http://doi.org/10.17632/sypyt6c2mc.1
    Explore at:
    Dataset updated
    Mar 14, 2019
    Authors
    Ben Van Calster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The uploaded files are:

    1) Excel file containing 6 sheets in respective Order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (information with respect to number of events for every outcome investigated within a paper), "Tuning Classification" (data related to the manner of hyperparameter tuning of Machine Learning Algorithms).

    2) R script used for the Analysis. In order to read the data, please save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents refers to the "Outcome Event Data" sheet, and srcompx refers to the "Comparison Data" sheet.

    3) Supplementary Material: Including Search String, Tables of data, Figures

    4) PRISMA checklist items

  15. [A Procedure for Multilevel Logistic Modeling] Appendix, Datasets, and...

    • figshare.com
    pdf
    Updated May 29, 2024
    Cite
    Nicolas Sommet (2024). [A Procedure for Multilevel Logistic Modeling] Appendix, Datasets, and Syntax Files [Dataset]. http://doi.org/10.6084/m9.figshare.5350786.v6
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 29, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Nicolas Sommet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • For each software, a series of sub-appendices describes the way to handle each stage of the procedure.
    • For each software, a .zip file contains the dataset in .dta (for Stata), .rdata (for R), .dat (for Mplus), and .sav (for SPSS), as well as the syntax file(s) in .do (for Stata), .R (for R), .inp (for Mplus), and .sps (for SPSS).
    • The dataset is also provided in .csv format.

    If you notice a mistake in the Stata or SPSS-related Appendices and/or syntax files, please report it to Nicolas Sommet (nicolas.sommet@unil.ch). If you notice a mistake in the R or Mplus-related Appendices and/or syntax files, please report it to Davide Morselli (davide.morselli@unil.ch).

    Sommet, N. and Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30, 203–218, DOI: https://doi.org/10.5334/irsp.90

  16. Telecom Churn - Logistic Regression

    • kaggle.com
    zip
    Updated Mar 6, 2023
    + more versions
    Cite
    Gaurav B R (2023). Telecom Churn - Logistic Regression [Dataset]. https://www.kaggle.com/datasets/gauravbr/telecom-churn-logistic-regression
    Explore at:
    Available download formats: zip (269772 bytes)
    Dataset updated
    Mar 6, 2023
    Authors
    Gaurav B R
    Description

    Dataset

    This dataset was created by Gaurav B R

    Contents

  17. Additional file 1 of Logistic regression has similar performance to...

    • springernature.figshare.com
    • commons.datacite.org
    zip
    Updated May 30, 2023
    Cite
    Anita L. Lynam; John M. Dennis; Katharine R. Owen; Richard A. Oram; Angus G. Jones; Beverley M. Shields; Lauric A. Ferrat (2023). Additional file 1 of Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults [Dataset]. http://doi.org/10.6084/m9.figshare.12569144.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Anita L. Lynam; John M. Dennis; Katharine R. Owen; Richard A. Oram; Angus G. Jones; Beverley M. Shields; Lauric A. Ferrat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Figure S1. Flow diagram of participants through the model development stages. T1D: type 1 diabetes, T2D: type 2 diabetes. Figure S2. ROC AUC plots obtained using external validation dataset for seven prediction models. Legend: Solid lines: black = Support Vector Machine, dark grey = Logistic Regression, light grey = Random Forest. Dotted lines: black = Neural Network, dark grey = K-Nearest Neighbours, light grey = Gradient Boosting Machine. Figure S3. Correlation coefficient matrix and scatter plot of model predictions obtained from external test validation data.

  18. Univariable logistic regression models.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Apr 15, 2015
    Cite
    Pharris, Anastasia; Semenza, Jan C.; Malliori, Melpomeni-Minerva; Sypsa, Vana; Nikolopoulos, Georgios K.; Fotiou, Anastasios; Friedman, Samuel R.; Hatzakis, Angelos; Costa-Storti, Claudia; Kanavou, Eleftheria; Detsis, Marios; Suk, Jonathan E.; Richardson, Clive; Paraskevis, Dimitrios (2015). Univariable logistic regression models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001847795
    Explore at:
    Dataset updated
    Apr 15, 2015
    Authors
    Pharris, Anastasia; Semenza, Jan C.; Malliori, Melpomeni-Minerva; Sypsa, Vana; Nikolopoulos, Georgios K.; Fotiou, Anastasios; Friedman, Samuel R.; Hatzakis, Angelos; Costa-Storti, Claudia; Kanavou, Eleftheria; Detsis, Marios; Suk, Jonathan E.; Richardson, Clive; Paraskevis, Dimitrios
    Description

    The dependent variable was dichotomous taking value 1 for years in which a European Economic Area (EEA) country was experiencing an HIV outbreak, 0 otherwise. The results include Odds Ratios (OR), Lower (L) and Upper (U) limits of the confidence interval (CI), P-values, the number of Observations (Obs) in each model, and the number of countries (C) from which data were obtained for at least one year.
    † Per capita or per population
    †† PWI: Public Wealth Index = GDP per capita divided by S80/S20 ratio.
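
As a rough illustration of how an odds ratio and its confidence limits relate to a fitted logistic regression coefficient, the sketch below exponentiates a coefficient and a Wald-type interval around it. The Wald interval is an assumption made here for illustration; the description does not say how the authors computed their intervals, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(coef: float, std_err: float, level: float = 0.95):
    """Return (odds ratio, lower limit, upper limit) for a log-odds coefficient.

    Assumes a Wald interval: coef +/- z * std_err on the log-odds scale,
    then exponentiated to the odds-ratio scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return math.exp(coef), lower, upper


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    odds_ratio, lower, upper = odds_ratio_with_ci(coef=0.4, std_err=0.15)
    print(f"OR = {odds_ratio:.2f}, L = {lower:.2f}, U = {upper:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs from zero at the chosen level.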

  19. Bank Loan Approval - LR, DT, RF and AUC

    • kaggle.com
    zip
    Updated Nov 7, 2023
    Cite
    vikram amin (2023). Bank Loan Approval - LR, DT, RF and AUC [Dataset]. https://www.kaggle.com/datasets/vikramamin/bank-loan-approval-lr-dt-rf-and-auc
    Explore at:
    Available download formats: zip (61437 bytes)
    Dataset updated
    Nov 7, 2023
    Authors
    vikram amin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • DATASET: Dependent variable is 'Personal.Loan'. 0 indicates loan not approved and 1 indicates loan approved.
    • OBJECTIVE: We will do Exploratory Data Analysis and use Logistic Regression, Decision Tree, Random Forest and AUC to find out which is the best model. Steps:
    • Set the working directory and read the data
    • Check the data types of all the variables
    • DATA CLEANING
    • We need to change the data types of certain variables to factor vector
    • Check for missing data, duplicate records and remove insignificant variables
    • New data frame created called 'bank1' after dropping the 'ID' column.
    • EXPLORATORY DATA ANALYSIS
    • We will try to get some insights by digging into the data through bar charts and box plots which can help the bank management in decision making
    • Run the required libraries
    • Out of the total 5000 customers, 4520 have not been approved for a loan while 480 have been
    • THIS INDICATES THAT INCOME IS HIGHER WHEN THERE ARE FEWER FAMILY MEMBERS
    • THIS INDICATES PERSONAL LOAN HAS BEEN APPROVED FOR CUSTOMERS HAVING HIGHER INCOME
    • THIS INDICATES THAT THE INCOME IS PRETTY SIMILAR FOR CUSTOMERS OWNING AND NOT OWNING A CREDIT CARD
    • CUSTOMERS BELONGING TO THE RICH CLASS (INCOME GROUP : 150-200) HAVE THE HIGHEST MORTGAGE
    • CC AVG IS PRETTY SIMILAR FOR THOSE WHO OPTED FOR ONLINE SERVICES AND THOSE WHO DID NOT
    • MORE EDUCATED CUSTOMERS HAVE A HIGHER CREDIT AVERAGE ...
  20. Carvana IsBadBuy? -Logistic RGS

    • kaggle.com
    zip
    Updated Mar 4, 2025
    Cite
    Selda Firouzmand (2025). Carvana IsBadBuy? -Logistic RGS [Dataset]. https://www.kaggle.com/datasets/seldafirouzmand/carvana-isbadbuy-csunmrk-case2
    Explore at:
    Available download formats: zip (1184 bytes)
    Dataset updated
    Mar 4, 2025
    Authors
    Selda Firouzmand
    Description

    This R script is designed to build a high-performance logistic regression model for predicting whether a car is a bad buy (IsBadBuy) using the Carvana dataset. It improves prediction accuracy by:

    • Handling Missing Values – Uses median for numeric and mode for categorical variables instead of replacing NULLs with zero.
    • Feature Engineering – Adds log transformation for VehBCost and dummy encodes categorical variables for better model performance.
    • Model Training & Evaluation – Runs a logistic regression model, calculates McFadden’s pseudo R² for model fit, and generates a hit rate score for accuracy.
    • Prediction & Submission – Predicts IsBadBuy for the test set and creates a submission file (optimized_submission.csv) in the required Kaggle format.
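
Two of the steps in this description, median/mode imputation and McFadden's pseudo R², can be sketched briefly. The listed script is written in R and is not reproduced here; the Python below is an independent illustration with hypothetical function names and sample data. It assumes missing values are represented as None and that the null model is the intercept-only model, which predicts the observed positive rate for every row.

```python
import math
from statistics import median, mode
from typing import List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing numeric values (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Replace missing categorical values (None) with the most frequent category."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def log_likelihood(y: Sequence[int], p: Sequence[float], eps: float = 1e-12) -> float:
    """Bernoulli log-likelihood of binary outcomes y under predicted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


def mcfadden_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """McFadden's pseudo R²: 1 - LL(fitted model) / LL(intercept-only model)."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(impute_median([1.0, None, 3.0]))
    print(impute_mode(["a", None, "a", "b"]))
    print(mcfadden_r2([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]))
```

A value of 0 means the fitted probabilities are no better than the base rate; values closer to 1 indicate a better fit.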

Diabetes data

Source: https://www.coursera.org/learn/logistic-regression-r-public-health
