This dataset was created by Veronica Zheng
Released under Other (specified in description)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by R Durai Srinivasan
Released under MIT
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset contains example data for exploration of the theory of regression-based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II database to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
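As a toy illustration of the regression-based regionalization idea described above, the sketch below fits a multiple linear regression of a streamflow percentile on basin characteristics and predicts at an ungaged site. It is written in Python with entirely made-up values; the actual release uses R, 293 streamgages, and explanatory variables from GAGES-II.

```python
import numpy as np

# Hypothetical basin characteristics (explanatory variables) for 5 streamgages:
# drainage area (km^2) and mean annual precipitation (mm); values are made up.
X = np.array([
    [120.0,  900.0],
    [450.0, 1100.0],
    [ 80.0,  750.0],
    [300.0, 1300.0],
    [210.0,  980.0],
])
# Hypothetical response: 90th percentile of annual maximum streamflow (m^3/s)
y = np.array([35.0, 120.0, 20.0, 95.0, 60.0])

# Add an intercept column and solve the ordinary-least-squares problem
A = np.column_stack([np.ones(len(y)), X])
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, b_area, b_precip = coef

# Regionalization step: predict at an ungaged site from its basin characteristics
y_hat = intercept + b_area * 250.0 + b_precip * 1000.0
print(round(float(y_hat), 1))
```

The point of the regionalization approach is exactly this last step: the fitted relation transfers an at-site statistic to sites with no streamflow record.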
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List
glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data.
boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic.
Description
glmmeg.R is example R code showing how to fit a logistic regression model (with or without a random-effects term) and how to use diagnostic plots to check the fit. The code is run on randomly generated data, constructed so that overdispersion is evident. This code can be applied directly to your own analyses if you read into R a data.frame called “dataset” with columns labelled “success” and “failure” (the numbers of binomial successes and failures) and “species” (a label for the different rows in the dataset), where the goal is to test for the effect of some predictor variable called “location”. In other cases, just change the labels and formula as appropriate. boot.glmm.R extends glmmeg.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmmeg.R.
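To make the overdispersion setup concrete, here is a minimal sketch, written in Python rather than the archive's R, that simulates binomial data with a row-level random intercept (the source of the overdispersion) and then fits the fixed-effects logistic regression by Fisher scoring. The mixed-model fit and bootstrap themselves live in the R files described above; everything below (sample sizes, effect sizes) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate overdispersed binomial data: each row ("species") gets its own
# random intercept, inflating variance beyond what plain binomial allows.
n_rows, trials = 40, 20
location = rng.integers(0, 2, size=n_rows)       # binary predictor of interest
rand_int = rng.normal(0.0, 1.0, size=n_rows)     # row-level random effect
logit_p = -0.5 + 1.2 * location + rand_int       # made-up true coefficients
p = 1.0 / (1.0 + np.exp(-logit_p))
success = rng.binomial(trials, p)
failure = trials - success

# Fixed-effects logistic fit by iteratively reweighted least squares
X = np.column_stack([np.ones(n_rows), location])
beta = np.zeros(2)
for _ in range(50):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    W = trials * mu * (1.0 - mu)                 # binomial working weights
    z = eta + (success - trials * mu) / W        # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(beta)  # [intercept, location effect]
```

Standard errors from this fixed-effects fit would be too small here, which is exactly why the archive's code adds a random intercept and a bootstrap for the P-values.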
This R code regenerates the simulated data sets. (R 6 kb)
This data release contains one dataset and one model archive in support of the journal article "Leveraging machine learning to automate regression model evaluations for large multi-site water-quality trend studies" by Jennifer C. Murphy and Jeffrey G. Chanat. The model archive contains scripts (run in R) to reproduce the four machine learning models (logistic regression, linear and quadratic discriminant analysis, and k-nearest neighbors) trained and tested as part of the journal article. The dataset contains the estimated probabilities for each of these models when applied to a training and test dataset.
R code for conducting analyses described in Johnson, N.S., W.D. Swink, and T.O. Brenden. Field study suggests that sex determination in sea lamprey is directly influenced by larval growth rate. Proceedings of the Royal Society B.
data.csv: the raw data for fitting the Bayesian hierarchical logistic regression model.
Read me_Metadata...: the metadata describing the variables in the data.csv file.
Rscript.R: the R script for fitting the Bayesian hierarchical logistic regression model.
GNU AGPL v3: http://www.gnu.org/licenses/agpl-3.0.html
Data for Hilbe, J.M. 2015. Practical Guide to Logistic Regression (Chapman and Hall/CRC Press).
Version: 1.3
CRAN: https://CRAN.R-project.org/package=LOGIT (removed)
CRAN archive: https://cran.r-project.org/src/contrib/Archive/LOGIT (archived on 2018-05-10)
Mirror: GitHub
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data are genetic assignments to upstream or downstream of the Site C dam (bull trout, Arctic grayling, and rainbow trout). Columns are defined in the csv file. A file of R code to run the analysis is also included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chi-square test and logistic regression model in the study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the contemporary context of a burgeoning energy crisis, the accurate and dependable prediction of Solar Radiation (SR) has emerged as an indispensable component within thermal systems to facilitate renewable energy generation. Machine Learning (ML) models have gained widespread recognition for their precision and computational efficiency in addressing SR prediction challenges. Consequently, this paper introduces an innovative SR prediction model, denoted as the Cheetah Optimizer-Random Forest (CO-RF) model. The CO component plays a pivotal role in selecting the most informative features for hourly SR forecasting, which subsequently serve as inputs to the RF model. The efficacy of the developed CO-RF model is rigorously assessed using two publicly available SR datasets. Evaluation metrics encompassing Mean Absolute Error (MAE), Mean Squared Error (MSE), and the coefficient of determination (R2) are employed to validate its performance. Quantitative analysis demonstrates that the CO-RF model surpasses competing techniques, including Logistic Regression (LR), Support Vector Machine (SVM), Artificial Neural Network (ANN), and standalone Random Forest (RF), in both the training and testing phases of SR prediction. The proposed CO-RF model outperforms the others, achieving a low MAE of 0.0365, MSE of 0.0074, and an R2 of 0.9251 on the first dataset, and an MAE of 0.0469, MSE of 0.0032, and R2 of 0.9868 on the second dataset, demonstrating significant error reduction.
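The evaluation metrics quoted above are standard; for reference, a small Python helper (not from the paper) computing MAE, MSE, and the coefficient of determination is:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Mean Absolute Error, Mean Squared Error, and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2

# Tiny worked example with made-up predictions
mae, mse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(mae, mse, r2)  # approximately 0.15, 0.025, 0.98
```

Note that R2 compares the model against a constant-mean baseline, which is why it can approach 1 even when MAE and MSE are nonzero.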
Implementation of GWR in R
This repository contains code and files related to my project on Geographically Weighted Regression (GWR) in R. The dataset is from Badan Pusat Statistik (Statistics Indonesia).
Files
Dataset.xlsx: This file contains the dataset used in the analysis.
GWLR.R: This script implements Geographically Weighted Logistic Regression in R.
GWPR.r: This script implements Geographically Weighted Poisson Regression in R.
GWR.R: This script implements Geographically Weighted Regression in R.
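The scripts above are R; as a language-agnostic sketch of what GWR does, the Python toy below (synthetic data, a Gaussian kernel, and a fixed bandwidth h, all of which are assumptions for illustration) fits a separate weighted least-squares regression at each location, so the coefficient can vary over space:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
coords = rng.uniform(0, 10, size=(n, 2))       # point locations (made up)
x = rng.normal(size=n)
beta_true = 0.5 + 0.2 * coords[:, 0]           # slope drifts from west to east
y = 1.0 + beta_true * x + rng.normal(0, 0.1, size=n)

def gwr_coef_at(i, h=2.0):
    """Local [intercept, slope] at observation i via kernel-weighted OLS."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-(d / h) ** 2 / 2.0)            # Gaussian distance-decay weights
    A = np.column_stack([np.ones(n), x])
    WA = w[:, None] * A
    return np.linalg.solve(A.T @ WA, A.T @ (w * y))

local = np.array([gwr_coef_at(i) for i in range(n)])
print(local[:3])
```

A full GWR implementation (as in the R scripts) additionally selects the bandwidth, e.g. by cross-validation or AICc, rather than fixing it.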
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions or suggestions, feel free to contact me.
Initial set of covariates for logistic regression.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The uploaded files are:
1) Excel file containing 6 sheets, in order: "Data Extraction" (summarized final data extractions from the three reviewers involved), "Comparison Data" (data related to the comparisons investigated), "Paper level data" (summaries at paper level), "Outcome Event Data" (number of events for every outcome investigated within a paper), and "Tuning Classification" (data on how the hyperparameters of the Machine Learning algorithms were tuned).
2) R script used for the analysis. (To read the data, save the "Comparison Data", "Paper level data", and "Outcome Event Data" Excel sheets as txt files. In the R script, srpap refers to the "Paper level data" sheet, srevents to the "Outcome Event Data" sheet, and srcompx to the "Comparison Data" sheet.)
3) Supplementary Material: including the search string, tables of data, and figures
4) PRISMA checklist items
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by Gaurav B R
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1: Figure S1. Flow diagram of participants through the model development stages. T1D: type 1 diabetes; T2D: type 2 diabetes. Figure S2. ROC AUC plots obtained using the external validation dataset for seven prediction models. Legend: solid lines: black = Support Vector Machine, dark grey = Logistic Regression, light grey = Random Forest; dotted lines: black = Neural Network, dark grey = K-Nearest Neighbours, light grey = Gradient Boosting Machine. Figure S3. Correlation coefficient matrix and scatter plot of model predictions obtained from the external validation data.
The dependent variable was dichotomous, taking the value 1 for years in which a European Economic Area (EEA) country was experiencing an HIV outbreak, and 0 otherwise. The results include Odds Ratios (OR), Lower (L) and Upper (U) limits of the confidence interval (CI), P-values, the number of Observations (Obs) in each model, and the number of countries (C) from which data were obtained for at least one year.
† Per capita or per population.
†† PWI: Public Wealth Index = GDP per capita divided by the S80/S20 ratio.
Univariable logistic regression models.
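To illustrate how the odds ratios and confidence limits reported in such a table come out of a univariable logistic regression, here is a self-contained Python sketch on simulated data (not the study's data): the OR is the exponential of the fitted slope, and the 95% CI is a Wald interval from the inverse Fisher information.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical country-year data: one continuous predictor, binary outcome
n = 400
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))        # made-up true model
y = rng.binomial(1, p)

# Univariable logistic regression fit by Newton-Raphson
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))
    H = X.T @ ((mu * (1 - mu))[:, None] * X)   # Fisher information
    beta = beta + np.linalg.solve(H, X.T @ (y - mu))

# Odds ratio and Wald 95% CI from the asymptotic covariance
mu = 1 / (1 + np.exp(-X @ beta))
cov = np.linalg.inv(X.T @ ((mu * (1 - mu))[:, None] * X))
se = np.sqrt(np.diag(cov))
or_ = np.exp(beta[1])
lo, hi = np.exp(beta[1] - 1.96 * se[1]), np.exp(beta[1] + 1.96 * se[1])
print(f"OR={or_:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

An OR above 1 with a CI excluding 1 corresponds to a statistically significant positive association in such a table.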
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This R script is designed to build a high-performance logistic regression model for predicting whether a car is a bad buy (IsBadBuy) using the Carvana dataset. It improves prediction accuracy by:
Handling Missing Values – uses the median for numeric and the mode for categorical variables instead of replacing NULLs with zero.
Feature Engineering – adds a log transformation for VehBCost and dummy-encodes categorical variables for better model performance.
Model Training & Evaluation – runs a logistic regression model, calculates McFadden’s pseudo R² for model fit, and generates a hit rate score for accuracy.
Prediction & Submission – predicts IsBadBuy for the test set and creates a submission file (optimized_submission.csv) in the required Kaggle format.
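The steps above can be sketched as follows in Python (the original script is R; the tiny DataFrame below is a made-up stand-in for the Carvana data, keeping only two illustrative columns):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, accuracy_score

df = pd.DataFrame({
    "VehBCost": [7200.0, np.nan, 4500.0, 9800.0, 6100.0, 5300.0, np.nan, 8800.0],
    "Color":    ["RED", "BLUE", None, "RED", "BLUE", "RED", "BLUE", None],
    "IsBadBuy": [0, 1, 1, 0, 0, 1, 0, 0],
})

# 1) Missing values: median for numeric, mode for categorical
df["VehBCost"] = df["VehBCost"].fillna(df["VehBCost"].median())
df["Color"] = df["Color"].fillna(df["Color"].mode()[0])

# 2) Feature engineering: log transform + dummy encoding
df["LogVehBCost"] = np.log(df["VehBCost"])
X = pd.get_dummies(df[["LogVehBCost", "Color"]], drop_first=True)
y = df["IsBadBuy"]

# 3) Fit, then McFadden's pseudo R^2 = 1 - ll_model / ll_null, and hit rate
model = LogisticRegression(max_iter=1000).fit(X, y)
ll_model = -log_loss(y, model.predict_proba(X), normalize=False)
p_null = y.mean()                               # intercept-only model
ll_null = (y * np.log(p_null) + (1 - y) * np.log(1 - p_null)).sum()
mcfadden_r2 = 1.0 - ll_model / ll_null
hit_rate = accuracy_score(y, model.predict(X))  # in-sample hit rate
print(round(mcfadden_r2, 3), hit_rate)

# 4) Submission: predictions in the Kaggle two-column format
submission = pd.DataFrame({"RefId": df.index, "IsBadBuy": model.predict(X)})
```

In the real script the prediction step runs on a held-out test set and writes optimized_submission.csv; here everything is in-sample for brevity.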