Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on the Financial Risk for Loan Approval data. SMOTENC was used to generate synthetic data points and increase the number of instances. The dataset contains both categorical and continuous features.
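The oversampling step can be illustrated with a simplified, from-scratch sketch of the SMOTE-NC idea. It picks a row, finds its nearest neighbours using a distance that adds a penalty (the median standard deviation of the continuous features) for each categorical mismatch, interpolates the continuous values, and takes the most frequent categorical value among the neighbours. This is not the dataset author's code. The parameters actually used, such as the number of neighbours, are not stated, and the function name `smotenc_sample` and the sample rows are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def smotenc_sample(minority, cont_idx, cat_idx, k=3, rng=None):
    """Create one synthetic row from minority-class rows (simplified SMOTE-NC).

    minority: list of rows (lists) from the class being oversampled.
    cont_idx / cat_idx: positions of the continuous / categorical features.
    """
    rng = rng or random.Random(0)
    # SMOTE-NC penalty: median of the continuous features' standard deviations.
    med = statistics.median(
        statistics.pstdev([row[j] for row in minority]) for j in cont_idx
    )

    def dist(a, b):
        # Squared Euclidean distance on continuous features, plus med**2
        # for every categorical feature whose values differ.
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(d2)

    i = rng.randrange(len(minority))
    base = minority[i]
    others = [row for n, row in enumerate(minority) if n != i]
    neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
    partner = rng.choice(neighbours)
    gap = rng.random()  # position on the segment between base and partner

    synthetic = list(base)
    for j in cont_idx:
        # Interpolate continuous values between the base row and its neighbour.
        synthetic[j] = base[j] + gap * (partner[j] - base[j])
    for j in cat_idx:
        # Categorical value: most frequent value among the k nearest neighbours.
        synthetic[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
    return synthetic


# Made-up [person_age, person_income, person_home_ownership] rows.
rows = [
    [25.0, 50000.0, "RENT"],
    [30.0, 60000.0, "RENT"],
    [28.0, 55000.0, "OWN"],
    [35.0, 70000.0, "RENT"],
]
new_row = smotenc_sample(rows, cont_idx=[0, 1], cat_idx=[2], k=2)
print(new_row)
```

In practice the continuous features are usually scaled first, because unscaled income dominates the distance here. For real work, the `SMOTENC` class in imbalanced-learn is the maintained implementation.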
The dataset contains 45,000 records and 14 variables, each described below:
Column | Description | Type |
---|---|---|
person_age | Age of the person | Float |
person_gender | Gender of the person | Categorical |
person_education | Highest education level | Categorical |
person_income | Annual income | Float |
person_emp_exp | Years of employment experience | Integer |
person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical |
loan_amnt | Loan amount requested | Float |
loan_intent | Purpose of the loan | Categorical |
loan_int_rate | Loan interest rate | Float |
loan_percent_income | Loan amount as a percentage of annual income | Float |
cb_person_cred_hist_length | Length of credit history in years | Float |
credit_score | Credit score of the person | Integer |
previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical |
loan_status (target variable) | Loan approval status: 1 = approved; 0 = rejected | Integer |
The dataset can be used for multiple purposes:
- Predicting the loan_status variable (approved/not approved) for potential applicants.
- Predicting the credit_score variable based on individual and loan-related attributes.

Note the data issues inherited from the original dataset, such as instances with an age above 100 years.
This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.
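The following is a minimal sketch of screening the age issue noted above. It assumes the data is loaded from a CSV file whose header uses the column names in the table. The 100-year cut-off, the function name `split_by_age`, and the sample rows are illustrative assumptions.

```python
import csv
import io

MAX_PLAUSIBLE_AGE = 100  # assumption: cut-off taken from the ">100 years" note


def split_by_age(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Separate rows whose person_age exceeds max_age so they can be reviewed."""
    kept, flagged = [], []
    for row in rows:
        if float(row["person_age"]) > max_age:
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Made-up sample using three of the dataset's column names.
sample = io.StringIO(
    "person_age,person_income,loan_status\n"
    "22.0,50000.0,1\n"
    "144.0,80000.0,0\n"
    "35.0,48000.0,0\n"
)
kept, flagged = split_by_age(csv.DictReader(sample))
print(len(kept), len(flagged))  # 2 1
```

Flagging rather than silently dropping rows makes it easier to decide later whether to remove, cap, or impute such values.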
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Kamran Ansari
Released under Database: Open Database, Contents: Database Contents
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains detailed synthetic records of bank loan applications, including applicant demographics, financial background, loan request details, and final approval or denial outcomes. It is ideal for developing and benchmarking predictive models for credit risk assessment, as well as for analyzing approval patterns and fairness in lending decisions.
https://creativecommons.org/publicdomain/zero/1.0/
This is a classification problem. The dataset contains 13 columns, and the Loan_Status column is the target to predict.
Variable | Description |
---|---|
Loan_ID | Unique Loan ID |
Gender | Male/Female |
Married | Applicant married (Y/N) |
Dependents | Number of dependents |
Education | Applicant education (Graduate/Under Graduate) |
Self_Employed | Self-employed (Y/N) |
ApplicantIncome | Applicant income |
CoapplicantIncome | Co-applicant income |
LoanAmount | Loan amount in thousands |
Loan_Amount_Term | Term of loan in months |
Credit_History | Credit history meets guidelines |
Property_Area | Urban/Semi Urban/Rural |
Loan_Status | (Target) Loan approved (Y/N) |
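The following sketch turns one record into numeric model inputs. It assumes the Y/N codes listed in the table and treats an empty field as a missing value. The derived field names (`total_income`, `term_years`, and so on), the helper names, and the sample record are hypothetical.

```python
def to_float(value):
    """Parse a numeric field; assumption: an empty string marks a missing value."""
    return float(value) if value.strip() else None


def encode_row(row):
    """Turn one raw record (a dict keyed by the column names above) into model inputs."""
    loan_k = to_float(row["LoanAmount"])
    term = to_float(row["Loan_Amount_Term"])
    return {
        "married": 1 if row["Married"] == "Y" else 0,
        "self_employed": 1 if row["Self_Employed"] == "Y" else 0,
        # Combined income of applicant and co-applicant.
        "total_income": float(row["ApplicantIncome"]) + float(row["CoapplicantIncome"]),
        # LoanAmount is stored in thousands, so scale it back up.
        "loan_amount": None if loan_k is None else loan_k * 1000,
        # Loan_Amount_Term is given in months; convert to years.
        "term_years": None if term is None else term / 12,
        "target": 1 if row["Loan_Status"] == "Y" else 0,
    }


# Made-up record for illustration.
record = {
    "Loan_ID": "L0001", "Married": "Y", "Self_Employed": "N",
    "ApplicantIncome": "5000", "CoapplicantIncome": "1500",
    "LoanAmount": "120", "Loan_Amount_Term": "360", "Loan_Status": "Y",
}
encoded = encode_row(record)
print(encoded)
```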
Abstract
For any bank or financial institution, managing loans and controlling leverage is one of the most important tasks. A bank cannot function efficiently without a well-designed loan-to-deposit business model. As technology has evolved, the way loans are handled and granted has changed significantly with the introduction of machine learning and data science use cases.

Hence, this data-driven research used advanced machine learning techniques to analyze and manipulate the data, with the aim of predicting the best way to recommend a loan to a client. These predictions are based on modified yet unique features created from the data obtained from the client. The dataset was tested using two different methodologies: a logistic regression model and a neural network algorithm. Both methodologies achieved high accuracy; however, the neural network outperformed the currently used methodologies by over 20%, reaching an accuracy of 90%.

These results were obtained through the use of a perfectly balanced, unbiased, and cleaned dataset, as well as a well-executed combination of activation functions in the neural network model. A performance assessment based on confusion matrix evaluation was conducted to demonstrate the model's feasibility and performance.
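Confusion-matrix evaluation of a binary approve/reject classifier can be sketched as follows. The labels are made up, the function names are hypothetical, and this does not reproduce the paper's own evaluation code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted approve, actually approved
            else:
                fp += 1  # predicted approve, actually rejected
        else:
            if t == positive:
                fn += 1  # predicted reject, actually approved
            else:
                tn += 1  # predicted reject, actually rejected
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Accuracy, precision, and recall derived from the four counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = [1, 0, 1, 1, 0, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0]  # made-up predictions
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 2)
metrics = summarize(*counts)
```

Reporting precision and recall alongside accuracy shows which kind of error (wrong approvals or wrong rejections) a model makes.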
https://creativecommons.org/publicdomain/zero/1.0/
Data on individuals and their loan approval status.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The (Cleaned) Credit Score Dataset for Classification is a structured dataset designed for training machine learning models to classify individuals into credit score categories based on various credit-related attributes.
2) Data Utilization
(1) Characteristics of the (Cleaned) Credit Score Dataset for Classification:
• The dataset includes key financial variables that influence credit scoring, such as delinquency history, credit limit, credit utilization ratio, and repayment records. The credit score category serves as the multiclass classification label.
(2) Applications of the (Cleaned) Credit Score Dataset for Classification:
• Credit score classification model training: The dataset can be used to train machine learning models that predict an individual's credit score category from financial indicators.
• Financial risk assessment and customer segmentation: By identifying a customer's credit level in advance, the dataset can support tasks such as loan approval decisions, interest rate setting, and personalized financial product recommendations.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains preprocessed loan records from a large-scale financial dataset, designed for loan default prediction modeling. It includes a wide range of features related to borrower profiles, loan terms, and historical repayment behavior.
The dataset has been cleaned, preprocessed, and structured for loan default prediction modeling. Most numerical features are normalized, and a binary target column indicates whether a loan defaulted.
Use cases:
- Binary classification for loan default prediction
- Credit risk modeling
- Financial machine learning
- Loan approval system development
- Model benchmarking and testing
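The description does not say how the numerical features were normalized. Two common choices, min-max scaling and z-score standardization, are sketched below as assumptions rather than as the preprocessing actually applied; the function names and sample values are hypothetical.

```python
import statistics


def min_max(values):
    """Rescale a numeric column linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def zscore(values):
    """Standardize a numeric column to mean 0 and population std 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


amounts = [1000.0, 5000.0, 9000.0]  # made-up loan amounts
scaled = min_max(amounts)
print(scaled)  # [0.0, 0.5, 1.0]
z = zscore(amounts)
```

Whichever scheme is used, its statistics (min/max or mean/std) should be computed on the training split only and reused on validation and test data to avoid leakage.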
MIT License (https://opensource.org/licenses/MIT)
Overview
This dataset contains 45,000 records of loan applicants, with attributes covering personal demographics, financial status, and loan details. It can be used for predictive modeling, particularly credit risk assessment and loan default prediction.

Dataset Content
The dataset includes 14 columns representing different factors that influence loan approvals and defaults:

Personal Information
- person_age: Age of the applicant (in years).
- person_gender: Gender of the applicant (male, female).
- person_education: Educational background (High School, Bachelor, Master, etc.).
- person_income: Annual income of the applicant (in USD).
- person_emp_exp: Years of employment experience.
- person_home_ownership: Type of home ownership (RENT, OWN, MORTGAGE).

Loan Details
- loan_amnt: Loan amount requested (in USD).
- loan_intent: Purpose of the loan (PERSONAL, EDUCATION, MEDICAL, etc.).
- loan_int_rate: Interest rate on the loan (percentage).
- loan_percent_income: Ratio of loan amount to income.

Credit & Loan History
- cb_person_cred_hist_length: Length of the applicant's credit history (in years).
- credit_score: Credit score of the applicant.
- previous_loan_defaults_on_file: Whether the applicant has previous loan defaults (Yes or No).

Target Variable
- loan_status: 1 if the loan was repaid successfully, 0 if the applicant defaulted.

Use Cases
- Loan Default Prediction: Build a classification model to predict loan repayment.
- Credit Risk Analysis: Analyze the relationship between income, credit score, and loan defaults.
- Feature Engineering: Extract new insights from employment history, home ownership, and loan amounts.

Acknowledgments
This dataset is synthetic and designed for machine learning and financial risk analysis.
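For the feature-engineering use case, a simple consistency check recomputes loan_percent_income from loan_amnt and person_income. This is a sketch with hypothetical function names. The tolerance is an assumption that allows for stored ratios being rounded, and the records are made up.

```python
def loan_percent_income(loan_amnt, person_income):
    """Ratio of requested loan amount to annual income (both in USD)."""
    if person_income <= 0:
        raise ValueError("person_income must be positive")
    return loan_amnt / person_income


def inconsistent_rows(rows, tol=0.01):
    """Yield rows whose stored loan_percent_income disagrees with the recomputed ratio.

    tol is an assumption that allows for ratios stored with limited precision.
    """
    for row in rows:
        expected = loan_percent_income(row["loan_amnt"], row["person_income"])
        if abs(expected - row["loan_percent_income"]) > tol:
            yield row


rows = [  # made-up records
    {"loan_amnt": 10000.0, "person_income": 50000.0, "loan_percent_income": 0.20},
    {"loan_amnt": 5000.0, "person_income": 40000.0, "loan_percent_income": 0.13},
    {"loan_amnt": 8000.0, "person_income": 20000.0, "loan_percent_income": 0.10},
]
bad = list(inconsistent_rows(rows))
print(len(bad))  # 1
```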
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The credit_risk Dataset is a structured dataset designed to predict loan default status (default) based on a customer's financial condition, credit history, and loan-related information. Each sample includes various features needed to assess the applicant's credit risk.
2) Data Utilization
(1) Characteristics of the credit_risk Dataset:
• The dataset includes key financial indicators such as current account balance, savings balance, loan amount, job type, and number of existing loans. The default column serves as a binary classification label indicating whether the customer failed to repay the loan.
(2) Applications of the credit_risk Dataset:
• Loan default prediction model training: The dataset can be used to train machine learning-based binary classification models that estimate a customer's credit risk in advance and support loan approval decisions.
• Credit risk analysis and policy development: By analyzing the relationship between financial status and credit history, the dataset can help in setting credit scoring criteria, adjusting risk-based interest rates, and personalizing financial services.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Some of the applications are as follows:
1) Credit Risk Assessment: Banks and financial institutions can leverage the dataset to develop models for assessing the credit risk associated with loan applicants. This involves predicting the likelihood of loan default based on various features.
2) Loan Portfolio Management: Financial organizations can use the dataset to manage and optimize their loan portfolios. This includes diversifying risk, setting interest rates, and making informed decisions about loan approval or denial.
3) Market Trend Analysis: By analyzing the dataset, researchers and analysts can identify trends in borrower behavior, regional variations, and shifts in loan purposes. This information can be valuable for making data-driven market predictions.
4) Customer Segmentation: Understanding the characteristics of different borrower segments can help banks tailor their services and products. This dataset can be used for clustering customers based on attributes like income, employment length, and loan history.
5) Regulatory Compliance: Financial institutions can use the dataset to ensure compliance with regulations, for example by assessing whether loans are being offered fairly across different demographics and regions.
6) Machine Learning Model Development: Data scientists can use this dataset to develop and test machine learning models for predicting loan outcomes. This can include classification tasks such as predicting loan approval or denial.
7) Lending Strategy Optimization: Banks can optimize their lending strategies by analyzing patterns in loan amounts, interest rates, and repayment behavior. This could involve adjusting lending criteria to attract desirable borrowers.
8) Fraud Detection: The dataset may be used to identify patterns indicative of fraudulent loan applications. Unusual patterns in borrower information could be flagged for further investigation.
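For the regulatory-compliance use case, a first screening check is to compare approval rates across groups. The sketch below assumes each record carries a group field and a boolean approval field. The key names "region" and "approved", the function names, and the records are hypothetical, since this description does not list column names.

```python
from collections import defaultdict


def approval_rates(records, group_key, approved_key):
    """Share of approved applications within each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        approved[group] += 1 if rec[approved_key] else 0
    return {g: approved[g] / totals[g] for g in totals}


def largest_gap(rates):
    """Difference between the highest and lowest group approval rate."""
    return max(rates.values()) - min(rates.values())


records = [  # made-up applications
    {"region": "A", "approved": True},
    {"region": "A", "approved": True},
    {"region": "A", "approved": True},
    {"region": "A", "approved": False},
    {"region": "B", "approved": True},
    {"region": "B", "approved": False},
]
rates = approval_rates(records, "region", "approved")
print(rates)  # {'A': 0.75, 'B': 0.5}
```

A raw rate gap is only a screening signal. It does not account for legitimate risk factors, so any gap needs further analysis before conclusions are drawn.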
loan.csv: In this file there are 18 columns:
- loanId: A unique loan identifier. Use this for joins with the payment.csv file.
- anon_ssn: A hash based on a client's SSN (anonymized SSN). You can use it as if it were an SSN to check whether a loan belongs to a previous customer.
- payFrequency: Repayment frequency of the loan:
  - B: biweekly payments
  - I: irregular
  - M: monthly
  - S: semi-monthly
  - W: weekly
- apr: Annual Percentage Rate of the loan (%)
- applicationDate: Date of application (start date)
- originated: Indicates whether the loan has been initiated (underwriting process started).
- originatedDate: Date of origination, i.e., the day the loan was originated
- nPaidOff: Number of MoneyLion loans previously paid off by the client.
- approved: Indicates whether the loan has been approved (final step of underwriting).
- isFunded: Whether or not the loan is ultimately funded. A loan can be voided by a customer shortly after it is approved, so not all approved loans are ultimately funded.
- loanStatus: Current loan status (this column is used for prediction). Most statuses are self-explanatory; those that need clarification are listed below:
  - Withdrawn Application: The applicant withdrew the loan application before it was approved or funded.
  - Paid Off Loan: The loan has been fully paid off by the borrower according to the repayment terms.
  - Rejected: The loan application was rejected, typically due to failure to meet underwriting criteria.
  - New Loan: A newly approved loan that has not yet been funded.
  - Internal Collection: The loan is being managed and collected internally by MoneyLion due to missed payments or delinquency.
  - CSR Voided New Loan: A new loan application was voided by a customer service representative (CSR) before funding.
  - External Collection: The loan has been transferred to an external collection agency for management and collection.
  - Returned Item: A payment on the loan was returned due to insufficient funds in the borrower's account.
  - Customer Voided New Loan: The borrower voided a new loan application before funding.
  - Credit Return Void: The loan was voided due to a credit return, typically related to a refunded transaction.
  - Pending Paid Off: The loan is in the process of being paid off, but the process is pending completion.
  - Charged Off Paid Off: The loan has been charged off as a loss by MoneyLion but has also been paid off by the borrower.
  - Settled Bankruptcy: The loan has been settled as part of a bankruptcy proceeding.
  - Settlement Paid Off: The loan has been paid off through a settlement agreement.
  - Charged Off: The loan has been charged off as a loss by MoneyLion due to nonpayment.
  - Pending Rescind: The loan is pending rescission, meaning it may be canceled or reversed.
  - Customver Voided New Loan: Likely a typo for "Customer Voided New Loan"; it likewise indicates that the borrower voided a new loan application before funding.
  - Pending Application: The loan application is pending review and approval.
  - Voided New Loan: The loan application was voided before funding.
  - Pending Application Fee: The loan application is pending because the application fee has not been paid.
  - Settlement Pending Paid Off: The loan is pending payoff through a settlement agreement.
- loanAmount: Principal amount of the loan in dollars (for non-funded loans, this is the principal in the loan application)
- originallyScheduledPaymentAmount: The initially scheduled repayment amount in dollars (if a customer pays off all scheduled payments, this is the amount we should receive)
- state: State of the client
- Lead type: The lead type determines the underwriting rules for a lead.
  - bvMandatory: leads bought from the ping tree; bank verification is required before loan approval
  - lead: very similar to bvMandatory, except that bank verification is optional for loan approval
  - california: similar to lead, but optimized for California lending rules
  - organic: customers who came through the MoneyLion website
  - rc_returning: customers who have at least 1 paid-off loan in another loan portfolio (the first paid-off loan is not in this dataset)
  - prescreen: preselected customers who have been offered a loan through direct mail campaigns
  - express: promotional "express" loans
  - repeat: promotional loans offered through ...
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Mortgage Approvals in the United Kingdom increased to 65.35 thousand in July from 64.57 thousand in June of 2025. This dataset provides the latest reported value for United Kingdom Mortgage Approvals, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
By UCI
This dataset covers the acceptance or rejection of credit card applications. It includes a range of continuous and categorical attributes, such as the applicant's gender, credit score, and income, as well as details about recent credit card activity, including balance transfers and delinquency. The data makes it possible to investigate how these attributes interact in determining application status. Careful analysis can yield insights into which factors contribute to a successful application, which could in turn support more effective strategies for predicting and improving access to financial credit.
This dataset is an excellent resource for researching the credit approval process, as it provides a mix of continuous and categorical attributes. The following tips explain how to make the most of it.
- Understand the data: Before working with this dataset, understand what information it contains. Because it mixes continuous and categorical attributes, familiarise yourself with all the columns before proceeding.
- Exploratory analysis: Conduct some exploratory analysis to gain an overall understanding of the data's characteristics and distributions. Investigating missing values and correlations among the independent variables (IVs) and with the dependent variables (DVs) prepares you for meaningful analyses or predictions later.
- Data cleaning: Once you are familiar with the data, address discrepancies such as missing values or outliers, either by replacing them appropriately or by removing them from the dataset if necessary.
- Feature selection/engineering: After cleaning, feature selection or engineering may be necessary if certain columns are redundant or not useful for building meaningful models (usually apparent after exploratory analysis). Be careful when deciding which features to remove so that no information about potentially important relationships is lost.
- Model building/analysis: With the data pre-processed appropriately, you can move on to developing the desired models or analyses on the transformed dataset.
- Developing predictive models to identify customers who are likely to default on their credit card payments.
- Creating a risk analysis system that can identify customers who pose a higher risk for fraud or misuse of their credit cards.
- Developing an automated lending decision system that uses the data points in the dataset (e.g., gender, average monthly balance) to decide whether to approve applications for new credit lines and loans.
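The missing-value checks from the exploratory-analysis and data-cleaning steps can be sketched with Python's csv module. This assumes missing values are written as "?", as in the UCI Credit Approval original; the encoding used in this CSV is not stated here. The function names and sample rows are made up.

```python
import csv
import io

MISSING = "?"  # assumption: the UCI original marks missing values with "?"


def missing_counts(rows):
    """Count missing values per column position."""
    counts = []
    for row in rows:
        if len(counts) < len(row):
            counts.extend([0] * (len(row) - len(counts)))
        for j, value in enumerate(row):
            if value.strip() == MISSING:
                counts[j] += 1
    return counts


def drop_incomplete(rows):
    """Keep only rows with no missing values (listwise deletion)."""
    return [row for row in rows if all(v.strip() != MISSING for v in row)]


# Made-up rows in the crx.data.csv style.
sample = "b,30.0,0\n?,45.5,4.0\na,?,0.5\n"
rows = list(csv.reader(io.StringIO(sample)))
print(missing_counts(rows))  # [1, 1, 0]
complete = drop_incomplete(rows)
```

Dropping incomplete rows is the simplest option; imputing values (for example, the column mode for categorical attributes) keeps more data.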
If you use this dataset in your research, please credit the original authors.
Data Source: See the dataset description for more information.
File: crx.data.csv
Column name | Description |
---|---|
b | Gender (Categorical) |
30.83 | Average Monthly Balance (Continuous) |
0 | Number of Months Since Applicant's Last Delinquency (Continuous) |
w | Number of Months Since Applicant's Last Credit Card Approval (Continuous) |
1.25 | Number of Months Since the Applicant's Last Balance Increase (Continuous) |
...
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The benchmark interest rate in China was last recorded at 3 percent. This dataset provides the latest reported value for China Interest Rate, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
Mortgage rates increased at a record pace in 2022, with the 10-year fixed mortgage rate doubling between March 2022 and December 2022. With inflation rising, the Bank of England introduced several bank rate hikes, resulting in higher mortgage rates. In May 2025, the average 10-year fixed interest rate reached **** percent. As borrowing costs rise, demand for housing is expected to decrease, leading to declining market sentiment and slower house price growth.

How have the mortgage hikes affected the market?
After surging in 2021, the number of residential properties sold declined in 2023, reaching just above *** million. Despite the fall in transactions, this figure was higher than in the period before the COVID-19 pandemic. The falling transaction volume also affected mortgage borrowing: between the first quarter of 2023 and the first quarter of 2024, the value of new mortgage loans fell year-on-year for five consecutive quarters.

How are higher mortgage rates affecting homebuyers?
Homeowners with a mortgage usually lock in a fixed-rate deal for two to ten years, which means that when this period ends, they need to renegotiate the terms of the loan. Many of the outstanding mortgages were taken out during the period of record-low mortgage rates and have since faced notable increases in their monthly repayments. About **** million homeowners are projected to see their deal expire by the end of 2026, and about *** million of these loans are projected to see a monthly payment increase of up to *** British pounds by 2026.
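To illustrate why refixing at a higher rate raises monthly repayments, the standard annuity formula for a repayment (amortizing) loan, M = P r / (1 - (1 + r)^(-n)) with principal P, monthly rate r, and n monthly payments, can be evaluated directly. This is a simplified sketch: real UK mortgage products may differ (for example, interest-only deals or fees), and the function name and loan figures are hypothetical.

```python
def monthly_payment(principal, annual_rate_pct, years):
    """Level monthly payment on a standard repayment (amortizing) loan."""
    n = round(years * 12)           # number of monthly payments
    r = annual_rate_pct / 100 / 12  # monthly interest rate
    if r == 0:
        return principal / n        # no interest: repay principal evenly
    return principal * r / (1 - (1 + r) ** -n)


# Hypothetical: 200,000 GBP outstanding, 25 years left, refixing from 2% to 5%.
before = monthly_payment(200_000, 2.0, 25)
after = monthly_payment(200_000, 5.0, 25)
print(f"{before:.2f} -> {after:.2f} GBP per month")
```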
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Context, Sources, and Inspirations Behind the Dataset

When developing a hybrid model that combines human-like reasoning with neural network precision, the choice of dataset is crucial. The datasets used to train such a model were selected and curated based on specific goals and requirements, drawing inspiration from a variety of contexts. Below is a breakdown of the datasets, their origins, sources, and the inspirations behind selecting them:

Inspiration: Widely recognized for image classification and object detection tasks, these datasets provide a large and varied set of labeled images covering thousands of object categories.
Source: Open datasets maintained by research communities.
Usage: Used for training and testing the vision component of the hybrid model, focusing on object recognition and scene understanding.

MultiWOZ (Multi-Domain Wizard-of-Oz):
Inspiration: A comprehensive dialogue dataset covering multiple domains (e.g., restaurant booking, hotel reservations).
Source: Created by dialogue researchers; it provides annotated conversations mimicking real-world human interactions.
Usage: Used to train the model's language understanding and dialogue generation capabilities.

ConceptNet:
Inspiration: Designed to provide commonsense knowledge, helping models reason beyond factual information by understanding relationships and contexts.
Source: An open-source project that aggregates data from various resources, such as Wikipedia, WordNet, and the crowdsourced Open Mind Common Sense project.
Usage: Integrated into the reasoning module to improve multi-hop and commonsense reasoning.

UCI Machine Learning Repository:
Inspiration: A well-known repository containing diverse datasets for various machine learning tasks, such as loan approval and medical diagnosis.
Source: Academic research and publicly available datasets contributed by the research community.
Usage: Used for structured data tasks, particularly in financial and healthcare analytics.

B. Proprietary and Domain-Specific Datasets

Healthcare Records Dataset:
Inspiration: The growing demand for predictive analytics in healthcare motivated the use of patient records to predict health outcomes.
Source: Anonymized data collected from healthcare providers, including patient demographics, medical history, and diagnostic information.
Usage: Used to train and test the model's ability to handle regression tasks, such as predicting patient recovery rates and health risks.

Financial Transactions and Loan Application Data:
Inspiration: To address risk analytics in financial services, loan application datasets containing applicant profiles, credit scores, and financial history were used.
Source: A collaboration with financial institutions provided access to anonymized loan application data.
Usage: Focused on classification tasks for loan approval prediction and credit scoring.

C. Synthesized Data and Augmented Datasets

Synthetic Dialogue Scenarios:
Inspiration: To test the model's performance on hypothetical scenarios and rare cases not covered in standard datasets.
Source: Generated using rule-based models and simulations to create additional training samples, especially for edge cases in dialogue tasks.
Usage: Improved model robustness by exposing it to challenging and less common dialogue interactions.

3. Inspirations Behind the Dataset Choice
- Diverse Task Requirements: The hybrid model was designed to handle multiple types of tasks (classification, regression, reasoning), which required diverse datasets covering different input formats (images, text, structured data).
- Real-World Relevance: The selected datasets were inspired by real-world use cases in healthcare, finance, and customer service, reflecting common scenarios where such a hybrid model could be applied.
- Challenging Scenarios: To test the model's reasoning capabilities, datasets such as ConceptNet and synthetic scenarios were included, reflecting the need to handle complex logical reasoning and inference tasks.
- Inclusivity and Fairness: Public datasets were chosen to ensure coverage across various demographic groups, reducing bias and improving fairness in predictions.

4. Pre-Processing and Data Preparation
- Standardization and Normalization: Structured data were ...
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The value of loans in Sweden increased 2.60 percent in July of 2025 over the same month in the previous year. This dataset provides the latest reported value for Sweden Household Lending Growth, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Bank Lending Rate in Germany decreased to 4 percent in June from 4.09 percent in May of 2025. This dataset provides the latest reported value for Germany Bank Lending Rate, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
This dataset contains information about credit card balances and can be used for purposes such as credit card balance prediction. The columns are as follows:
Column | Description |
---|---|
Income | Income of the customer |
Limit | Credit limit provided to the customer |
Rating | The customer's credit rating |
Cards | Number of credit cards the customer has |
Age | Age of the customer |
Education | Educational level of the customer |
Gender | Sex of the customer |
Student | Whether the customer is a student |
Married | Whether the customer is married |
Ethnicity | Ethnicity of the customer |
Balance | Credit card balance of the customer |
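Balance prediction could start from an ordinary least squares fit of Balance on a single predictor. The sketch below uses Limit purely as an illustration. The choice of predictor, the function name `fit_line`, and the sample values are assumptions, not results from this dataset.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = math.fsum(xs) / n
    y_bar = math.fsum(ys) / n
    sxy = math.fsum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = math.fsum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


limits = [2000.0, 4000.0, 6000.0, 8000.0]  # made-up Limit values
balances = [100.0, 500.0, 900.0, 1300.0]   # made-up Balance values
intercept, slope = fit_line(limits, balances)
predicted = intercept + slope * 5000.0     # predicted Balance for Limit = 5000
print(intercept, slope, predicted)
```

A real model would likely use several columns (for example Income, Rating, and Student) and be evaluated on held-out data.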