The purpose of this database is to provide information about a bank's customers so that machine learning models can be developed that can predict whether a particular customer will repay the loan or not.
A retail bank would like to hire you to build a credit default model for their credit card portfolio. The bank expects the model to identify the consumers who are likely to default on their credit card payments over the next 12 months. This model will be used to reduce the bank’s future losses. The bank is willing to provide you with some sample data that they can currently extract from their systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.
Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Credit Risk Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/laotse/credit-risk-dataset on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Detailed data description of Credit Risk dataset:
| Feature Name | Description |
| --- | --- |
| person_age | Age |
| person_income | Annual Income |
| person_home_ownership | Home ownership |
| person_emp_length | Employment length (in years) |
| loan_intent | Loan intent |
| loan_grade | Loan grade |
| loan_amnt | Loan amount |
| loan_int_rate | Interest rate |
| loan_status | Loan status (0 is non default 1 is default) |
| loan_percent_income | Percent income |
| cb_person_default_on_file | Historical default |
| cb_person_cred_hist_length | Credit history length |
--- Original source retains full ownership of the source dataset ---
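As a minimal sketch of how the columns listed in the table above might be used, the following computes the default rate per `loan_grade` from `loan_status` (0 is non default, 1 is default). The sample rows are hypothetical and written only for illustration; they are not taken from the dataset, and the helper name `default_rate_by` is an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows for illustration only; not real records.
SAMPLE_CSV = """person_age,person_income,loan_grade,loan_amnt,loan_status
25,40000,A,5000,0
31,52000,B,12000,1
44,90000,A,8000,0
23,28000,B,6000,1
37,61000,B,10000,0
"""


def default_rate_by(rows, key):
    """Return {group: share of rows with loan_status == 1} for one column."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        defaults[group] += int(row["loan_status"])
    return {group: defaults[group] / totals[group] for group in totals}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = default_rate_by(rows, "loan_grade")
print(rates)
```

With the real file, the `io.StringIO` object would be replaced by an open file handle; the grouping logic stays the same.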
This dataset was created by Aaron Mathew Alex
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Credit risk’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/upadorprofzs/credit-risk on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The purpose of this database is to provide information about a bank's customers so that machine learning models can be developed that can predict whether a particular customer will repay the loan or not.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘German Credit Risk - With Target’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kabure/german-credit-data-with-risk on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 1,000 financial records with five key features and one target variable, Loan Default Risk. It is designed for credit risk analysis, helping to predict whether a customer is likely to default on a loan based on financial attributes.
| Feature Name | Description |
| --- | --- |
| Income | The individual's annual income |
| Credit Score | A credit rating score ranging from 300 to 850, where higher values indicate better creditworthiness |
| Spending Score | A normalized score between 0 and 100, representing the individual's spending habits |
| Transaction Count | The number of transactions made by the individual in a given period |
| Savings Ratio | The ratio of savings to income, ranging from 0 to 1 |
| Loan Default Risk (Target) | 0: Low risk (likely to repay the loan); 1: High risk (likely to default on the loan) |
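The documented ranges above lend themselves to a simple sanity check before modelling. The sketch below is one possible way to do it; the dictionary keys (`credit_score`, `spending_score`, `savings_ratio`, `loan_default_risk`) are hypothetical names chosen for illustration and may not match the actual column headers in the file.

```python
# Ranges taken from the feature description; key names are assumptions.
RANGES = {
    "credit_score": (300, 850),
    "spending_score": (0, 100),
    "savings_ratio": (0.0, 1.0),
}


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, (low, high) in RANGES.items():
        value = record[field]
        if not low <= value <= high:
            problems.append(f"{field}={value} outside [{low}, {high}]")
    # The target is documented as binary: 0 (low risk) or 1 (high risk).
    if record["loan_default_risk"] not in (0, 1):
        problems.append(f"loan_default_risk={record['loan_default_risk']} is not 0 or 1")
    return problems


# Hypothetical records for illustration only.
ok_record = {"credit_score": 710, "spending_score": 42,
             "savings_ratio": 0.15, "loan_default_risk": 0}
bad_record = {"credit_score": 900, "spending_score": 42,
              "savings_ratio": 0.15, "loan_default_risk": 0}

print(validate_record(ok_record))
print(validate_record(bad_record))
```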
Feel free to use this dataset for research, projects, or educational purposes. If you use it in a publication, kindly provide attribution.
This dataset was synthetically generated. The features were adjusted to resemble real-world financial data, but they do not represent actual individuals or real financial records.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by 이재한1967
Released under Apache 2.0
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Prajna Prayas
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study explores the potential of utilizing alternative data sources to enhance the accuracy of credit scoring models, compared to relying solely on traditional data sources, such as credit bureau data. A comprehensive dataset from the Home Credit Group’s home loan portfolio is analysed. The research examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant’s social network default status, regional economic ratings, and local population characteristics. The modelling approach applies the model-X knockoffs framework for systematic variable selection. By including these alternative data sources, the credit scoring models demonstrate improved predictive performance, achieving an area under the curve metric of 0.79360 on the Kaggle Home Credit default risk competition dataset, outperforming models that relied solely on traditional data sources, such as credit bureau data. The findings highlight the significance of leveraging diverse, non-traditional data sources to augment credit risk assessment capabilities and overall model accuracy.
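For readers unfamiliar with the area under the curve metric mentioned above, here is a minimal sketch of the ROC AUC computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter (ties count as half). It is a generic illustration with made-up labels and scores, not the study's evaluation code, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = default) and model scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of records, so for competition-sized data a rank-based implementation from an established library would be the practical choice.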
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘German Credit Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunchawla30/german-credit-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The link to the original dataset can be found below.
It is almost impossible to understand the original dataset due to its complicated system of categories and symbols. Thus, I wrote a small Python script to convert it into a readable CSV file. The column names were originally given in German, so they are replaced by English names during processing. The attributes and their details in English are given below:
Source : UCI
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Leonardo Ferreira
Released under CC0: Public Domain
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by SAI PRUDHVI Bodempudi
Released under Apache 2.0
This dataset was created by Amin Uddin
Source: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
This dataset classifies people described by a set of attributes as good or bad credit risks. It comes in two formats (one all numeric) and includes a cost matrix.
1000 observations with 20 variables (7 numerical, 13 categorical).
Professor Dr. Hans Hofmann
Institut für Statistik und Ökonometrie
Universität Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13
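The cost matrix mentioned in the description above can be used to score a classifier by total misclassification cost rather than accuracy. The sketch below assumes the values commonly cited from the UCI documentation (classifying a bad risk as good costs 5, classifying a good risk as bad costs 1); treat these numbers as an assumption and verify them against the source files. The labels and predictions are made up for illustration.

```python
# (actual, predicted) -> cost. Values are an assumption based on the UCI
# documentation for the Statlog German Credit data; verify before use.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,
    ("bad", "good"): 5,
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


# Made-up labels for illustration only.
actual = ["good", "bad", "bad", "good"]
predicted = ["good", "good", "bad", "bad"]
cost = total_cost(actual, predicted)
print(cost)
```

Under such an asymmetric matrix, a model that approves fewer bad risks can be preferable even if its plain accuracy is lower.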
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was created by Nick Kinyae
Released under CC BY-SA 4.0
This dataset was created by LanPBC
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Data Professionals Salary - 2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamsouravbanerjee/analytics-industry-salaries-2022-india on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Analytics is the systematic computational analysis of data or statistics. It is used for the discovery, interpretation, and communication of meaningful patterns in data. It also entails applying data patterns towards effective decision-making. It can be valuable in areas rich with recorded information; analytics relies on the simultaneous application of statistics, computer programming, and operations research to quantify performance.
Organizations may apply analytics to business data to describe, predict, and improve business performance. Specifically, areas within analytics include predictive analytics, prescriptive analytics, enterprise decision management, descriptive analytics, cognitive analytics, Big Data Analytics, retail analytics, supply chain analytics, store assortment and stock-keeping unit optimization, marketing optimization and marketing mix modeling, web analytics, call analytics, speech analytics, sales force sizing and optimization, price and promotion modeling, predictive science, graph analytics, credit risk analysis, and fraud analytics. Since analytics can require extensive computation (see big data), the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics.
This dataset consists of salaries for Data Scientists, Machine Learning Engineers, Data Analysts, and Data Engineers in various cities across India (2022).
For more, please visit: https://www.glassdoor.co.in/
--- Original source retains full ownership of the source dataset ---
This dataset was created by My Mai Trà
This dataset was created by Aniruddha
Released under Data files © Original Authors
The purpose of this database is to provide information about a bank's customers so that machine learning models can be developed that can predict whether a particular customer will repay the loan or not.