24 datasets found
  1. Loan Approval Classification Dataset

    • kaggle.com
    Updated Oct 29, 2024
    Cite
    Ta-wei Lo (2024). Loan Approval Classification Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data
    Explore at:
    Croissant (a metadata format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ta-wei Lo
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    1. Data Source

    This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on the Financial Risk for Loan Approval data. SMOTENC was used to synthesize new data points and enlarge the number of instances. The dataset contains both categorical and continuous features.
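    The description says SMOTENC was used to generate extra rows but does not give the author's settings. The following is only a minimal, stdlib-only sketch of the SMOTE-NC idea: continuous features are interpolated toward a nearby row of the same class, categorical features take the most frequent value among the k nearest neighbours, and each category mismatch adds the median standard deviation of the continuous features to the distance. The function name and parameters are hypothetical; in practice the `SMOTENC` class in imbalanced-learn is the usual library implementation.

```python
import random
import statistics
from collections import Counter


def smotenc_sample(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows (each row is a list).

    cont_idx / cat_idx list the positions of continuous and categorical features.
    Requires at least two minority rows.
    """
    rng = random.Random(seed)
    # Median of the continuous features' standard deviations; each categorical
    # mismatch adds this value (squared) to the squared distance.
    stds = [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    med = statistics.median(stds) if stds else 1.0

    def dist2(a, b):
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        return d + sum(med ** 2 for j in cat_idx if a[j] != b[j])

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority rows form the neighbourhood of `base`.
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: dist2(base, r))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        new = list(base)
        for j in cont_idx:
            # Continuous: a random point on the segment between base and partner.
            new[j] = base[j] + gap * (partner[j] - base[j])
        for j in cat_idx:
            # Categorical: most frequent value among the k neighbours.
            new[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic
```

    Because each continuous column is interpolated on its own, derived columns such as loan_percent_income are not guaranteed to stay consistent with loan_amnt and person_income in synthetic rows, so they are worth re-checking.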

    2. Metadata

    The dataset contains 45,000 records and 14 variables, each described below:

    | Column | Description | Type |
    |:--|:--|:--|
    | person_age | Age of the person | Float |
    | person_gender | Gender of the person | Categorical |
    | person_education | Highest education level | Categorical |
    | person_income | Annual income | Float |
    | person_emp_exp | Years of employment experience | Integer |
    | person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical |
    | loan_amnt | Loan amount requested | Float |
    | loan_intent | Purpose of the loan | Categorical |
    | loan_int_rate | Loan interest rate | Float |
    | loan_percent_income | Loan amount as a percentage of annual income | Float |
    | cb_person_cred_hist_length | Length of credit history in years | Float |
    | credit_score | Credit score of the person | Integer |
    | previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical |
    | loan_status (target variable) | Loan approval status: 1 = approved; 0 = rejected | Integer |
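    A minimal sketch of reading the 14 columns with the types listed in the table, assuming the data arrive as a CSV whose header matches these column names. Categorical columns are kept as strings; the converter names and the tolerance for integers written as "600.0" are assumptions, not part of the dataset description.

```python
def to_int(text):
    """Parse an integer column, accepting values written as e.g. '600' or '600.0'."""
    value = float(text)
    if not value.is_integer():
        raise ValueError(f"expected an integer, got {text!r}")
    return int(value)


# Column -> converter, following the Type column of the table (Categorical -> str).
SCHEMA = {
    "person_age": float,
    "person_gender": str,
    "person_education": str,
    "person_income": float,
    "person_emp_exp": to_int,
    "person_home_ownership": str,
    "loan_amnt": float,
    "loan_intent": str,
    "loan_int_rate": float,
    "loan_percent_income": float,
    "cb_person_cred_hist_length": float,
    "credit_score": to_int,
    "previous_loan_defaults_on_file": str,
    "loan_status": to_int,
}


def parse_row(raw):
    """Convert one CSV row (a dict of strings) into typed values."""
    missing = set(SCHEMA) - set(raw)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    return {col: convert(raw[col]) for col, convert in SCHEMA.items()}
```

    With the standard library, each row yielded by csv.DictReader over the opened file (for example a hypothetical "loan_data.csv") can be passed to parse_row.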

    3. Data Usage

    The dataset can be used for multiple purposes:

    • Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
    • Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
    • Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.

    Be aware of data-quality issues inherited from the original data, such as records with a person_age above 100 years.
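    A minimal sketch for flagging such records before modelling, assuming each row is a dict keyed by column name (as produced by csv.DictReader). The function name is hypothetical, and the default cutoff of 100 simply mirrors the note above.

```python
def split_implausible_ages(rows, max_age=100):
    """Separate rows whose person_age exceeds max_age from the rest.

    Returns (kept, flagged); rows are dicts keyed by column name.
    """
    kept, flagged = [], []
    for row in rows:
        target = flagged if float(row["person_age"]) > max_age else kept
        target.append(row)
    return kept, flagged
```

    Whether to drop, cap, or impute the flagged rows is a modelling decision the description leaves open; returning both lists keeps that choice explicit.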

    This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.

    Feel free to leave comments in the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

  2. Data from: Loan Approval Prediction Dataset

    • kaggle.com
    Updated Mar 14, 2025
    Cite
    Kamran Ansari (2025). Loan Approval Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/korpionn/loan-approval-prediction-dataset/discussion
    Explore at:
    Croissant
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kamran Ansari
    License

    Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Dataset

    This dataset was created by Kamran Ansari

    Released under Database: Open Database, Contents: Database Contents

    Contents

  3. Bank Loan Application Approvals

    • gomask.ai
    csv, json
    Updated Jul 12, 2025
    Cite
    GoMask.ai (2025). Bank Loan Application Approvals [Dataset]. https://gomask.ai/marketplace/datasets/bank-loan-application-approvals
    Explore at:
    csv (10 MB), json (available download formats)
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    loan_amount, applicant_id, loan_purpose, applicant_dob, decision_date, denial_reason, interest_rate, application_id, residence_city, approved_amount, and 18 more
    Description

    This dataset contains detailed synthetic records of bank loan applications, including applicant demographics, financial background, loan request details, and final approval or denial outcomes. It is ideal for developing and benchmarking predictive models for credit risk assessment, as well as for analyzing approval patterns and fairness in lending decisions.

  4. Data from: Loan Approval Prediction

    • kaggle.com
    Updated May 28, 2022
    Cite
    Siddharth Sharma (2022). Loan Approval Prediction [Dataset]. https://www.kaggle.com/datasets/ssiddharth408/loan-prediction-dataset
    Explore at:
    Croissant
    Dataset updated
    May 28, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Siddharth Sharma
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This is a classification problem. The dataset contains 13 columns, and Loan_Status is the target column to predict.

    Columns

    | Variable | Description |
    |:--|:--|
    | Loan_ID | Unique loan ID |
    | Gender | Male/Female |
    | Married | Applicant married (Y/N) |
    | Dependents | Number of dependents |
    | Education | Applicant education (Graduate/Under Graduate) |
    | Self_Employed | Self-employed (Y/N) |
    | ApplicantIncome | Applicant income |
    | CoapplicantIncome | Co-applicant income |
    | LoanAmount | Loan amount in thousands |
    | Loan_Amount_Term | Term of loan in months |
    | Credit_History | Credit history meets guidelines |
    | Property_Area | Urban/Semi Urban/Rural |
    | Loan_Status | (Target) Loan approved (Y/N) |

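    A minimal preprocessing sketch for one row of this table, based only on the column descriptions: the Y/N target is mapped to 1/0, LoanAmount (stated in thousands) is scaled to absolute units, and applicant and co-applicant income are summed. The derived column names LoanAmountFull and TotalIncome are hypothetical, and treating blank cells as missing is an assumption.

```python
def _number(text):
    """Parse a numeric cell, returning None for blanks (missing values assumed to be empty)."""
    text = str(text).strip()
    return float(text) if text else None


def prepare_record(rec):
    """Derive model-ready fields from one row of the loan prediction table."""
    out = dict(rec)
    # Target: Loan_Status is recorded as Y/N.
    out["Loan_Status"] = {"Y": 1, "N": 0}[rec["Loan_Status"]]
    # LoanAmount is stated in thousands; scale it to absolute units.
    amount = _number(rec["LoanAmount"])
    out["LoanAmountFull"] = None if amount is None else amount * 1000
    # Combined applicant + co-applicant income (an engineered feature, not an original column).
    applicant = _number(rec["ApplicantIncome"])
    coapplicant = _number(rec["CoapplicantIncome"])
    out["TotalIncome"] = None if applicant is None or coapplicant is None else applicant + coapplicant
    return out
```
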
  5. Data from: Optimizing Bank Loan Approval with Cutting-Edge Deep Learning...

    • zenodo.org
    bin
    Updated Oct 25, 2023
    Cite
    Abdalla Mahgoub (2023). Optimizing Bank Loan Approval with Cutting-Edge Deep Learning model [Dataset]. http://doi.org/10.5281/zenodo.10041577
    Explore at:
    bin (available download formats)
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abdalla Mahgoub
    Description

    Abstract

    For any bank or financial institution, managing loans and controlling leverage is one of the most important tasks it must undertake. A bank cannot function efficiently without a well-designed loan-to-deposit business model. As technology has evolved, the mechanisms for handling and granting loans have changed significantly with the introduction of machine learning and data science use cases.

    This data-driven research therefore used advanced machine learning techniques to analyze and manipulate the data, with the aim of predicting the best way to recommend a loan to a client. The predictions are based on modified, distinctive features engineered from data obtained from the client. The dataset was tested with two different methods: a logistic regression model and a neural network. Both achieved high accuracy; however, the neural network outperformed the currently used methods by over 20%, reaching an accuracy of 90%.

    These results were obtained by using a perfectly balanced, unbiased, and cleaned dataset, together with a well-executed combination of activation functions in the neural network model. Performance was assessed with a confusion-matrix evaluation to demonstrate the model's feasibility and performance.
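    The abstract reports accuracy from a confusion-matrix evaluation. As a generic sketch (not the study's code or numbers), this shows how the four cells of a binary confusion matrix are counted and how accuracy, precision, recall and F1 follow from them; treating 1 as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 derived from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}
```

    On a perfectly balanced dataset accuracy is a reasonable headline figure, but precision and recall show whether errors fall mostly on approved or on rejected applicants.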

  6. CPL Prediction

    • kaggle.com
    Updated Jun 26, 2020
    Cite
    Plavak Das (2020). CPL Prediction [Dataset]. https://www.kaggle.com/plavak10/cpl-prediction/activity
    Explore at:
    Croissant
    Dataset updated
    Jun 26, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Plavak Das
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Content

    Data on individuals and their loan approval status.

  7. (Cleaned) Credit Score for Classification Dataset

    • cubig.ai
    Updated Jun 22, 2025
    Cite
    CUBIG (2025). (Cleaned) Credit Score for Classification Dataset [Dataset]. https://cubig.ai/store/products/504/cleaned-credit-score-for-classification-dataset
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    CUBIG
    License

    CUBIG terms of service (https://cubig.ai/store/terms-of-service)

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The (Cleaned) Credit Score for Classification Dataset is a structured dataset designed for training machine learning models to classify individuals into credit score categories based on various credit-related attributes.

    2) Data Utilization
    (1) Characteristics of the (Cleaned) Credit Score for Classification Dataset:
    • The dataset includes key financial variables that influence credit scoring, such as delinquency history, credit limit, credit utilization ratio, and repayment records. The credit score category serves as the multiclass classification label.

    (2) Applications of the (Cleaned) Credit Score for Classification Dataset:
    • Credit score classification model training: The dataset can be used to train machine learning models that predict an individual's credit score category from financial indicators.
    • Financial risk assessment and customer segmentation: By identifying a customer's credit level in advance, it can support tasks such as loan approval decisions, interest rate setting, and personalized financial product recommendations.

  8. Historical Loan Records with Default Status

    • kaggle.com
    Updated Jul 16, 2025
    Cite
    Abhishek Nishad (2025). Historical Loan Records with Default Status [Dataset]. https://www.kaggle.com/datasets/abhisheknishad8988/defaulter-data/versions/1
    Explore at:
    Croissant
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    Kaggle
    Authors
    Abhishek Nishad
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Loan Default Prediction Dataset

    This dataset contains preprocessed loan records from a large-scale financial dataset, designed for loan default prediction modeling. It includes a wide range of features related to borrower profiles, loan terms, and historical repayment behavior.

    This dataset has been cleaned, preprocessed, and structured for use in loan default prediction modeling. Most numerical features are normalized, and a binary target column indicates whether a loan defaulted.

    Want to Improve or Customize It?

    Users are encouraged to:

    • Apply additional feature engineering (e.g., create debt-to-income ratio, rolling averages)
    • Encode categorical variables using different techniques (e.g., one-hot encoding, target encoding)
    • Handle class imbalance with oversampling (SMOTE) or undersampling
    • Perform train/test splitting, cross-validation, or scaling
    • Add derived features based on domain knowledge
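    Two of the suggestions above, categorical encoding and train/test splitting, can be sketched with the standard library alone. This is a minimal illustration assuming rows are dicts and category values are sortable strings; the helper names are hypothetical, and scikit-learn's OneHotEncoder and train_test_split (with its stratify argument) are the usual library equivalents.

```python
import random
from collections import defaultdict


def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns named '<column>=<value>'."""
    categories = sorted({row[column] for row in rows})  # assumes sortable (e.g. string) values
    encoded = []
    for row in rows:
        new = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new[f"{column}={cat}"] = int(row[column] == cat)
        encoded.append(new)
    return encoded


def stratified_split(rows, target, test_frac=0.2, seed=0):
    """Split rows into train/test, keeping each target class's share roughly constant."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

    Stratifying matters here because default datasets are often imbalanced, and a plain random split can leave the test set with very few defaults.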

    Use Cases:

    • Binary classification for loan default prediction
    • Credit risk modeling
    • Financial machine learning
    • Loan approval system development
    • Model benchmarking and testing

  9. bank_loan_data

    • kaggle.com
    Updated Feb 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Uday Malviya (2025). bank_loan_data [Dataset]. http://doi.org/10.34740/kaggle/dsv/10791226
    Explore at:
    Croissant
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Uday Malviya
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Overview

    This dataset contains 45,000 records of loan applicants, with various attributes related to personal demographics, financial status, and loan details. The dataset can be used for predictive modeling, particularly in credit risk assessment and loan default prediction.

    Dataset Content

    The dataset includes 14 columns representing different factors influencing loan approvals and defaults:

    Personal Information

    • person_age: Age of the applicant (in years).
    • person_gender: Gender of the applicant (male, female).
    • person_education: Educational background (High School, Bachelor, Master, etc.).
    • person_income: Annual income of the applicant (in USD).
    • person_emp_exp: Years of employment experience.
    • person_home_ownership: Type of home ownership (RENT, OWN, MORTGAGE).

    Loan Details

    • loan_amnt: Loan amount requested (in USD).
    • loan_intent: Purpose of the loan (PERSONAL, EDUCATION, MEDICAL, etc.).
    • loan_int_rate: Interest rate on the loan (percentage).
    • loan_percent_income: Ratio of loan amount to income.

    Credit & Loan History

    • cb_person_cred_hist_length: Length of the applicant's credit history (in years).
    • credit_score: Credit score of the applicant.
    • previous_loan_defaults_on_file: Whether the applicant has previous loan defaults (Yes or No).

    Target Variable

    • loan_status: 1 if the loan was repaid successfully, 0 if the applicant defaulted.

    Use Cases

    • Loan Default Prediction: Build a classification model to predict loan repayment.
    • Credit Risk Analysis: Analyze the relationship between income, credit score, and loan defaults.
    • Feature Engineering: Extract new insights from employment history, home ownership, and loan amounts.

    Acknowledgments

    This dataset is synthetic and designed for machine learning and financial risk analysis.
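    Since loan_percent_income is defined as the ratio of loan amount to income, it can be recomputed from loan_amnt and person_income as a consistency check. A minimal sketch, assuming rows are dicts of strings; the 0.01 tolerance is an assumed allowance for rounding of the stored ratio, and the helper names are hypothetical.

```python
def loan_percent_income(loan_amnt, person_income):
    """Ratio of the requested loan amount to annual income."""
    if person_income <= 0:
        raise ValueError("person_income must be positive")
    return loan_amnt / person_income


def ratio_mismatches(rows, tol=0.01):
    """Return rows whose stored loan_percent_income differs from the recomputed ratio by more than tol."""
    bad = []
    for row in rows:
        expected = loan_percent_income(float(row["loan_amnt"]), float(row["person_income"]))
        if abs(float(row["loan_percent_income"]) - expected) > tol:
            bad.append(row)
    return bad
```
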

  10. creditrisk Dataset

    • cubig.ai
    Updated Jun 22, 2025
    Cite
    CUBIG (2025). creditrisk Dataset [Dataset]. https://cubig.ai/store/products/506/creditrisk-dataset
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    CUBIG
    License

    CUBIG terms of service (https://cubig.ai/store/terms-of-service)

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The credit_risk Dataset is a structured dataset designed to predict loan default status (default) based on a customer's financial condition, credit history, and loan-related information. Each sample includes various features necessary for assessing the applicant's credit risk.

    2) Data Utilization
    (1) Characteristics of the credit_risk Dataset:
    • The dataset includes key financial indicators such as current account balance, savings balance, loan amount, job type, and number of existing loans. The default column serves as a binary classification label indicating whether the customer failed to repay the loan.

    (2) Applications of the credit_risk Dataset:
    • Loan default prediction model training: The dataset can be used to train machine learning-based binary classification models that estimate a customer's credit risk in advance and support decisions on loan approvals.
    • Credit risk analysis and policy development: By analyzing the relationship between financial status and credit history, the dataset can help in setting credit scoring criteria, adjusting risk-based interest rates, and personalizing financial services.

  11. Comprehensive Loan Information for Credit Risk

    • kaggle.com
    Updated Dec 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sheen (2023). Comprehensive Loan Information for Credit Risk [Dataset]. https://www.kaggle.com/datasets/nezukokamaado/auto-loan-dataset
    Explore at:
    Croissant
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Kaggle
    Authors
    Sheen
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Some of the applications are as follows:

    1) Credit Risk Assessment: Banks and financial institutions can use the dataset to develop models that assess the credit risk of loan applicants, that is, predict the likelihood of loan default from various features.

    2) Loan Portfolio Management: Financial organizations can use the dataset to manage and optimize their loan portfolios, including diversifying risk, setting interest rates, and making informed decisions about loan approval or denial.

    3) Market Trend Analysis: By analyzing the dataset, researchers and analysts can identify trends in borrower behavior, regional variations, and shifts in loan purposes, which can support data-driven market predictions.

    4) Customer Segmentation: Understanding the characteristics of different borrower segments can help banks tailor their services and products. The dataset can be used to cluster customers by attributes such as income, employment length, and loan history.

    5) Regulatory Compliance: Financial institutions can use the dataset to check compliance with regulations, for example by assessing whether loans are offered fairly across demographics and regions.

    6) Machine Learning Model Development: Data scientists can use the dataset to develop and test machine learning models that predict loan outcomes, including classification tasks such as predicting loan approval or denial.

    7) Lending Strategy Optimization: Banks can optimize their lending strategies by analyzing patterns in loan amounts, interest rates, and repayment behavior, for example by adjusting lending criteria to attract desirable borrowers.

    8) Fraud Detection: The dataset may be used to identify patterns indicative of fraudulent loan applications; unusual patterns in borrower information can be flagged for further investigation.

  12. Loans Dataset

    • kaggle.com
    Updated Apr 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zaki Hanfer (2024). Loans Dataset [Dataset]. https://www.kaggle.com/datasets/zakihanfer/loans-dataset
    Explore at:
    Croissant
    Dataset updated
    Apr 5, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Zaki Hanfer
    Description

    Data Dictionary

    The data contains one file:

    • loan.csv: In this file there are 18 columns:
      1. loanId: A unique loan identifier. Use it for joins with the payment.csv file.
      2. anon_ssn: A hash based on the client's SSN (anonymized SSN). You can use it like an SSN to check whether a loan belongs to a previous customer.
      3. payFrequency: This column represents repayment frequency of the loan:
        • B is biweekly payments
        • I is irregular
        • M is monthly
        • S is semi monthly
        • W is weekly
      4. apr: Annual Percentage Rate of the loan (%)
      5. applicationDate: Date of application (start date)
      6. originated: Indicates if the loan has been initiated (underwriting process started).
      7. originatedDate: Date of origination, day the loan was originated
      8. nPaidOff: Number of MoneyLion loans previously paid off by the client.
      9. approved: Indicates if the loan has been approved (final step of underwriting).
      10. isFunded: Whether a loan is ultimately funded. A loan can be voided by a customer shortly after it is approved, so not all approved loans are ultimately funded.
      11. loanStatus: Current loan status (this column is used for prediction). Most statuses are self-explanatory; those that need clarification are listed below:
        • Withdrawn Application: The applicant has withdrawn their loan application before it was approved or funded.
        • Paid Off Loan: The loan has been fully paid off by the borrower according to the repayment terms.
        • Rejected: The loan application was rejected, typically due to failure to meet underwriting criteria.
        • New Loan: A newly approved loan that has not yet been funded.
        • Internal Collection: The loan is being managed and collected internally by MoneyLion due to missed payments or delinquency.
        • CSR Voided New Loan: A new loan application was voided by a customer service representative (CSR) before funding.
        • External Collection: The loan has been transferred to an external collection agency for management and collection.
        • Returned Item: A payment on the loan has been returned due to insufficient funds in the borrower's account.
        • Customer Voided New Loan: The borrower voided a new loan application before funding.
        • Credit Return Void: The loan was voided due to a credit return, typically related to a refunded transaction.
        • Pending Paid Off: The loan is in the process of being paid off, but the process is pending completion.
        • Charged Off Paid Off: The loan has been charged off as a loss by MoneyLion but has also been paid off by the borrower.
        • Settled Bankruptcy: The loan has been settled as part of a bankruptcy proceeding.
        • Settlement Paid Off: The loan has been paid off through a settlement agreement.
        • Charged Off: The loan has been charged off as a loss by MoneyLion due to nonpayment.
        • Pending Rescind: The loan is pending rescission, meaning it may be canceled or reversed.
        • Customver Voided New Loan: Likely a typo in the raw data for "Customer Voided New Loan"; like that status, it indicates that the borrower voided a new loan application before funding.
        • Pending Application: The loan application is pending review and approval.
        • Voided New Loan: The loan application was voided before funding.
        • Pending Application Fee: The loan application is pending because the application fee has not been paid.
        • Settlement Pending Paid Off: The loan is pending being paid off through a settlement agreement.
      12. loanAmount: Principal amount of the loan ('Dollars') (for non-funded loans this will be the principal in the loan application)
      13. originallyScheduledPaymentAmount: The initially scheduled repayment amount ('Dollars') (if a customer pays off all scheduled payments, this is the amount we should receive)
      14. state: State of the client
      15. Lead type: The lead type determines the underwriting rules for a lead.
        • bvMandatory: leads that are bought from the ping tree – required to perform bank verification before loan approval
        • lead: very similar to bvMandatory, except bank verification is optional for loan approval
        • california: similar to lead, but optimized for California lending rules
        • organic: customers that came through the MoneyLion website
        • rc_returning: customers who have at least 1 paid off loan in another loan portfolio. (The first paid off loan is not in this data set).
        • prescreen: preselected customers who have been offered a loan through direct mail campaigns
        • express: promotional “express” loans
        • repeat: promotional loans offered through ...
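    A minimal sketch of working with a few of these columns, assuming each loan.csv row is a dict of strings. The payments-per-year figures are the usual calendar conventions for each schedule (an assumption; the data dictionary does not state them), the encoding of approved/isFunded is assumed, and the layout of payment.csv beyond a loanId column is unknown. All helper names are hypothetical.

```python
# payFrequency codes from the data dictionary; payments per year are assumed
# calendar conventions (irregular has no fixed count).
PAY_FREQUENCY = {
    "B": ("biweekly", 26),
    "I": ("irregular", None),
    "M": ("monthly", 12),
    "S": ("semi monthly", 24),
    "W": ("weekly", 52),
}


def _flag(value):
    """Interpret a boolean-like cell; the exact encoding in loan.csv is assumed."""
    return str(value).strip().lower() in {"1", "true", "t", "yes", "y"}


def approved_not_funded(loans):
    """Loans that passed underwriting (approved) but were never funded."""
    return [loan for loan in loans if _flag(loan["approved"]) and not _flag(loan["isFunded"])]


def join_payments(loans, payments):
    """Attach matching payment rows to each loan via loanId (payment.csv layout is assumed)."""
    by_loan = {}
    for payment in payments:
        by_loan.setdefault(payment["loanId"], []).append(payment)
    return [dict(loan, payments=by_loan.get(loan["loanId"], [])) for loan in loans]
```

    Separating approved from funded loans follows the isFunded note above: an approval that the customer later voids should not be counted as a funded loan.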
  13. United Kingdom Mortgage Approvals

    • tradingeconomics.com
    • ko.tradingeconomics.com
    • and 13 more
    csv, excel, json, xml
    Updated Sep 2, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TRADING ECONOMICS (2025). United Kingdom Mortgage Approvals [Dataset]. https://tradingeconomics.com/united-kingdom/mortgage-approvals
    Explore at:
    csv, excel, json, xml (available download formats)
    Dataset updated
    Sep 2, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Oct 31, 1986 - Jul 31, 2025
    Area covered
    United Kingdom
    Description

    Mortgage approvals in the United Kingdom increased to 65.35 thousand in July 2025 from 64.57 thousand in June 2025. This dataset provides the latest reported value for United Kingdom Mortgage Approvals, plus previous releases, the historical high and low, short-term forecasts and long-term predictions, the economic calendar, survey consensus, and news.

  14. Credit Approval (Mixed Attributes)

    • kaggle.com
    Updated Dec 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2022). Credit Approval (Mixed Attributes) [Dataset]. https://www.kaggle.com/datasets/thedevastator/improving-credit-approval-with-mixed-attributes/data
    Explore at:
    Croissant
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Credit Approval (Mixed Attributes)

    Continuous and Categorical Features

    By UCI [source]

    About this dataset

    This dataset explores credit card application acceptance or rejection. It includes both continuous and categorical attributes, such as the applicant's gender, credit score, and income, as well as details about recent credit card activity, including balance transfers and delinquency. It offers an opportunity to investigate how these attributes interact in determining application status, and careful analysis can yield insights into which factors contribute to a successful application. This could lead to more effective strategies for predicting and improving access to financial credit.

    How to use the dataset

    This dataset is an excellent resource for researching the credit approval process, as it provides a mix of continuous and categorical attributes. This guide offers tips on how to make the most of it:

    • Understand the data: Before working with this dataset, understand what information it contains. Because it mixes continuous and categorical attributes, familiarise yourself with all the columns before proceeding.
    • Exploratory analysis: Explore the data to understand its overall characteristics and distributions. Investigating missing values and correlations between independent variables (IVs) and dependent variables (DVs) will prepare you for meaningful analyses and predictions later on.
    • Data cleaning: Once you are familiar with the data, clean up discrepancies such as missing values or outliers, replacing them appropriately or removing them if necessary.
    • Feature selection/engineering: After cleaning, you may need to select or engineer features if certain columns are redundant or not useful for building models (this usually becomes apparent during exploratory analysis). Be careful when deciding which features to remove so that no potentially important relationships are lost.
    • Model building/analysis: With the data pre-processed, you can move on to developing the desired models or analyses on the transformed dataset.
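    The exploratory and cleaning steps start with locating missing values. In the original UCI crx.data file, missing entries are written as "?"; whether this Kaggle copy keeps that convention is an assumption. The column names listed under Columns (b, 30.83, 0, ...) look like values from a data record, so this sketch treats the file as headerless. The helper names are hypothetical, and mode imputation is only one of several reasonable choices.

```python
import csv
from collections import Counter


def missing_counts(lines, missing_token="?"):
    """Count missing entries per column position in headerless CSV lines."""
    counts = Counter()
    n_rows = 0
    for row in csv.reader(lines):
        n_rows += 1
        for i, value in enumerate(row):
            if value.strip() in ("", missing_token):
                counts[i] += 1
    return n_rows, counts


def impute_mode(rows, col, missing_token="?"):
    """Replace missing values in one column with its most frequent observed value.

    Assumes at least one observed (non-missing) value in the column.
    """
    def is_missing(value):
        return value.strip() in ("", missing_token)

    observed = [r[col] for r in rows if not is_missing(r[col])]
    mode = Counter(observed).most_common(1)[0][0]
    return [r[:col] + [mode] + r[col + 1:] if is_missing(r[col]) else r for r in rows]
```

    For a file on disk, the opened file object can be passed directly as `lines`, since csv.reader accepts any iterable of strings.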

    Research Ideas

    • Developing predictive models to identify customers who are likely to default on their credit card payments.
    • Creating a risk analysis system that can identify customers who pose a higher risk for fraud or misuse of their credit cards.
    • Developing an automated lending decision system that uses the data points in the dataset (e.g., gender, average monthly balance) to decide whether to approve applications for new credit lines and loans.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: crx.data.csv

    | Column name | Description |
    |:--|:--|
    | b | Gender (Categorical) |
    | 30.83 | Average Monthly Balance (Continuous) |
    | 0 | Number of Months Since Applicant's Last Delinquency (Continuous) |
    | w | Number of Months Since Applicant's Last Credit Card Approval (Continuous) |
    | 1.25 | Number of Months Since Applicant's Last Balance Increase (Continuous) ...

  15. China Loan Prime Rate

    • tradingeconomics.com
    • de.tradingeconomics.com
    • and 13 more
    csv, excel, json, xml
    Updated Aug 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TRADING ECONOMICS (2025). China Loan Prime Rate [Dataset]. https://tradingeconomics.com/china/interest-rate
    Explore at:
    xml, csv, excel, json (available download formats)
    Dataset updated
    Aug 20, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Oct 25, 2013 - Aug 20, 2025
    Area covered
    China
    Description

    The benchmark interest rate in China was last recorded at 3 percent. This dataset provides the latest reported value for China Interest Rate, plus previous releases, historical highs and lows, short-term forecasts and long-term predictions, an economic calendar, survey consensus, and news.

  16. Average mortgage interest rates in the UK 2000-2025, by month and type

    • statista.com
    Updated Jun 24, 2025
    Statista (2025). Average mortgage interest rates in the UK 2000-2025, by month and type [Dataset]. https://www.statista.com/statistics/386301/uk-average-mortgage-interest-rates/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2000 - May 2025
    Area covered
    United Kingdom
    Description

    Mortgage rates increased at a record pace in 2022, with the 10-year fixed mortgage rate doubling between March 2022 and December 2022. With inflation rising, the Bank of England introduced several bank rate hikes, resulting in higher mortgage rates. In May 2025, the average 10-year fixed interest rate reached **** percent. As borrowing costs rise, demand for housing is expected to decrease, leading to weaker market sentiment and slower house price growth.

    How have the mortgage hikes affected the market?

    After surging in 2021, the number of residential properties sold declined in 2023, reaching just above *** million. Despite the fall in transactions, this figure was higher than in the period before the COVID-19 pandemic. The falling transaction volume also affected mortgage borrowing: between the first quarter of 2023 and the first quarter of 2024, the value of new mortgage loans fell year-on-year for five consecutive quarters.

    How are higher mortgage rates affecting homebuyers?

    Homeowners with a mortgage usually lock in a fixed-rate deal for two to ten years, meaning that when this period ends, they need to renegotiate the terms of the loan. Many of the outstanding mortgages were taken out during the period of record-low mortgage rates and have since faced notable increases in their monthly repayments. About **** million homeowners are projected to see their deal expire by the end of 2026. About *** million of these loans are projected to experience a monthly payment increase of up to *** British pounds by 2026.
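    Why a higher rate on refixing raises the monthly repayment can be seen from the standard annuity formula for a repayment mortgage, M = P * r * (1 + r)^n / ((1 + r)^n - 1), where P is the principal, r the monthly interest rate, and n the number of monthly payments. The Python sketch below applies it to hypothetical figures (not Statista data). It treats the refixed loan as if it were new, ignoring the principal already repaid, so it only approximates what an existing borrower would see.

```python
def monthly_payment(principal: float, annual_rate_pct: float, years: int) -> float:
    """Monthly repayment on a fully amortizing loan (standard annuity formula)."""
    n = years * 12                  # number of monthly payments
    r = annual_rate_pct / 100 / 12  # monthly interest rate
    if r == 0:
        return principal / n        # interest-free edge case
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


# Hypothetical example: 200,000 GBP over 25 years, refixed from 2% to 5%.
before = monthly_payment(200_000, 2.0, 25)
after = monthly_payment(200_000, 5.0, 25)
print(f"{before:.2f} -> {after:.2f} GBP per month (+{after - before:.2f})")
```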

  17. submission.json

    • kaggle.com
    Updated Sep 22, 2024
    Sharmila Ghosh (2024). submission.json [Dataset]. https://www.kaggle.com/datasets/sharmilaghosh/submission-json/suggestions
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 22, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sharmila Ghosh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context, Sources, and Inspirations Behind the Dataset

    When developing a hybrid model that combines human-like reasoning with neural network precision, the choice of dataset is crucial. The datasets used in training such a model were selected and curated based on specific goals and requirements, drawing inspiration from a variety of contexts. Below is a breakdown of the datasets, their origins, sources, and the inspirations behind selecting them:

    1. Context of the Dataset Selection

    • Objective: To create a model capable of generalizing across diverse tasks, including classification, regression, language understanding, and visual recognition. The model is designed to tackle challenges involving unseen data, complex reasoning, and multi-modal inputs.
    • Approach: A combination of publicly available benchmark datasets and proprietary datasets from specific domains was used. The data sources aimed to provide comprehensive coverage of real-world scenarios and diverse input types to enhance the model's robustness.

    2. Data Sources

    A. Public Benchmark Datasets

    ImageNet and COCO (Common Objects in Context):
    • Inspiration: Widely recognized for image classification and object detection tasks. They provide a large and varied set of labeled images, covering thousands of object categories.
    • Source: Open datasets maintained by research communities.
    • Usage: Used for training and testing the vision component of the hybrid model, focusing on object recognition and scene understanding.

    MultiWOZ (Multi-Domain Wizard-of-Oz):
    • Inspiration: A comprehensive dialogue dataset covering multiple domains (e.g., restaurant booking, hotel reservations).
    • Source: Created by dialogue researchers, it provides annotated conversations mimicking real-world human interactions.
    • Usage: Leveraged for training the language understanding and dialogue generation capabilities of the model.

    ConceptNet:
    • Inspiration: Designed to provide commonsense knowledge, helping models reason beyond factual information by understanding relationships and contexts.
    • Source: An open-source project that aggregates data from various crowdsourced resources like Wikipedia, WordNet, and Open Mind Common Sense.
    • Usage: Integrated into the reasoning module to improve multi-hop and commonsense reasoning.

    UCI Machine Learning Repository:
    • Inspiration: A well-known repository containing diverse datasets for various machine learning tasks, such as loan approval and medical diagnosis.
    • Source: Academic research and publicly available datasets contributed by the research community.
    • Usage: Used for structured data tasks, particularly in financial and healthcare analytics.

    B. Proprietary and Domain-Specific Datasets

    Healthcare Records Dataset:
    • Inspiration: The increasing demand for predictive analytics in healthcare motivated the use of patient records to predict health outcomes.
    • Source: Anonymized data collected from healthcare providers, including patient demographics, medical history, and diagnostic information.
    • Usage: Trained and tested the model's ability to handle regression tasks, such as predicting patient recovery rates and health risks.

    Financial Transactions and Loan Application Data:
    • Inspiration: To address risk analytics in financial services, loan application datasets containing applicant profiles, credit scores, and financial history were used.
    • Source: Collaboration with financial institutions provided access to anonymized loan application data.
    • Usage: Focused on classification tasks for loan approval predictions and credit scoring.

    C. Synthesized Data and Augmented Datasets

    Synthetic Dialogue Scenarios:
    • Inspiration: To test the model's performance on hypothetical scenarios and rare cases not covered in standard datasets.
    • Source: Generated using rule-based models and simulations to create additional training samples, especially for edge cases in dialogue tasks.
    • Usage: Improved model robustness by exposing it to challenging and less common dialogue interactions.

    3. Inspirations Behind the Dataset Choice

    • Diverse Task Requirements: The hybrid model was designed to handle multiple types of tasks (classification, regression, reasoning), necessitating diverse datasets covering different input formats (images, text, structured data).
    • Real-World Relevance: The selected datasets were inspired by real-world use cases in healthcare, finance, and customer service, reflecting common scenarios where such a hybrid model could be applied.
    • Challenging Scenarios: To test the model's reasoning capabilities, datasets like ConceptNet and synthetic scenarios were included, inspired by the need to handle complex logical reasoning and inferencing tasks.
    • Inclusivity and Fairness: Public datasets were chosen to ensure coverage across various demographic groups, reducing bias and improving fairness in predictions.

    4. Pre-Processing and Data Preparation

    Standardization and Normalization: Structured data were ...

  18. Sweden Household Lending Growth

    • tradingeconomics.com
    • id.tradingeconomics.com
    csv, excel, json, xml
    Updated Aug 27, 2025
    TRADING ECONOMICS (2025). Sweden Household Lending Growth [Dataset]. https://tradingeconomics.com/sweden/loan-growth
    Available download formats: excel, csv, xml, json
    Dataset updated
    Aug 27, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1976 - Jul 31, 2025
    Area covered
    Sweden
    Description

    The value of loans in Sweden increased 2.60 percent in July 2025 over the same month in the previous year. This dataset provides the latest reported value for Sweden Household Lending Growth, plus previous releases, historical highs and lows, short-term forecasts and long-term predictions, an economic calendar, survey consensus, and news.
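    A year-over-year figure such as the 2.60 percent above compares a month with the same month one year earlier: growth = (value_t / value_(t-12) - 1) * 100. A minimal Python sketch for a monthly series follows. It assumes a plain list of monthly values in chronological order with no gaps, and it does not reproduce Trading Economics' own methodology, which is not described here.

```python
from typing import List, Optional


def yoy_growth(monthly: List[float]) -> List[Optional[float]]:
    """Percent change of each month versus the same month 12 periods earlier."""
    growth: List[Optional[float]] = []
    for i, value in enumerate(monthly):
        if i < 12 or monthly[i - 12] == 0:
            growth.append(None)  # no usable comparison month
        else:
            growth.append((value / monthly[i - 12] - 1) * 100)
    return growth


# Hypothetical loan stock: 100.0 twelve months ago, 102.6 now.
series = [100.0] + [101.0] * 11 + [102.6]
print(round(yoy_growth(series)[-1], 2))  # 2.6
```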

  19. Germany Bank Lending Rate

    • tradingeconomics.com
    • pl.tradingeconomics.com
    csv, excel, json, xml
    Updated Dec 15, 2024
    TRADING ECONOMICS (2024). Germany Bank Lending Rate [Dataset]. https://tradingeconomics.com/germany/bank-lending-rate
    Available download formats: excel, xml, csv, json
    Dataset updated
    Dec 15, 2024
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 2003 - Jun 30, 2025
    Area covered
    Germany
    Description

    The bank lending rate in Germany decreased to 4 percent in June 2025 from 4.09 percent in May 2025. This dataset provides the latest reported value for Germany Bank Lending Rate, plus previous releases, historical highs and lows, short-term forecasts and long-term predictions, an economic calendar, survey consensus, and news.
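    A move from 4.09 to 4 percent is a fall of 0.09 percentage points, which is not the same as the relative (percent) change in the rate. The short Python sketch below computes both from the two values quoted above; the function names are illustrative.

```python
def change_pp(new_rate: float, old_rate: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return new_rate - old_rate


def change_pct(new_rate: float, old_rate: float) -> float:
    """Relative change between two rates, in percent."""
    return (new_rate / old_rate - 1) * 100


# Germany bank lending rate: 4.09 percent (May 2025) -> 4 percent (June 2025).
print(round(change_pp(4.0, 4.09), 2))   # -0.09
print(round(change_pct(4.0, 4.09), 2))  # -2.2
```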

  20. Credit Card Balance Prediction

    • kaggle.com
    Updated Jul 13, 2022
    Abdalrahman Ali El nashar (2022). Credit Card Balance Prediction [Dataset]. https://www.kaggle.com/datasets/abdalrahmanelnashar/credit-card-balance-prediction
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2022
    Dataset provided by
    Kaggle
    Authors
    Abdalrahman Ali El nashar
    Description

    This dataset contains information about customers' credit card balances and can be used for purposes such as credit card balance prediction. The columns in the dataset are as follows:

    • Income: Income of the customer.
    • Limit: Credit limit provided to the customer.
    • Rating: The customer's credit rating.
    • Cards: The number of credit cards the customer has.
    • Age: Age of the customer.
    • Education: Educational level of the customer.
    • Gender: Sex of the customer.
    • Student: Whether the customer is a student.
    • Married: Whether the customer is married.
    • Ethnicity: Ethnicity of the customer.
    • Balance: Credit card balance of the customer.

    $ Income : num 14.9 106 104.6 148.9 55.9 ...
    $ Limit : int 3606 6645 7075 9504 4897 8047 3388 7114 3300 6819 ...
    $ Rating : int 283 483 514 681 357 569 259 512 266 491 ...
    $ Cards : int 2 3 4 3 2 4 2 2 5 3 ...
    $ Age : int 34 82 71 36 68 77 37 87 66 41 ...
    $ Education: int 11 15 11 11 16 10 12 9 13 19 ...
    $ Gender : Factor w/ 2 levels " Male","Female": 1 2 1 2 1 1 2 1 2 2 ...
    $ Student : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 1 1 2 ...
    $ Married : Factor w/ 2 levels "No","Yes": 2 2 1 1 2 1 1 1 1 2 ...
    $ Ethnicity: Factor w/ 3 levels "African American",..: 3 2 2 2 3 3 1 2 3 1 ...
    $ Balance : int 333 903 580 964 331 1151 203 872 279 1350 ...
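    As a first look at balance prediction, the Python sketch below fits an ordinary least-squares line of Balance on Limit using only the ten Limit and Balance values printed in the str() output above. Ten rows are far too few for a real model, so this only illustrates the calculation. A fuller model would use the other columns as well. Note that the Gender level as printed includes a leading space (" Male"), so string values may need trimming before encoding.

```python
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# First ten Limit and Balance values, copied from the str() output above.
limit = [3606, 6645, 7075, 9504, 4897, 8047, 3388, 7114, 3300, 6819]
balance = [333, 903, 580, 964, 331, 1151, 203, 872, 279, 1350]
intercept, slope = fit_simple_ols(limit, balance)
print(f"Balance ~ {intercept:.1f} + {slope:.4f} * Limit")
```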

Ta-wei Lo (2024). Loan Approval Classification Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data

Loan Approval Classification Dataset

Synthetic Data for binary classification on Loan Approval

32 scholarly articles cite this dataset (Google Scholar)
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ta-wei Lo
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

1. Data Source

This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to simulate new data points to enlarge the instances. The dataset is structured for both categorical and continuous features.
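SMOTENC (SMOTE for Nominal and Continuous features) creates synthetic rows by interpolating the continuous features between a sample and one of its nearest neighbours. Categorical features take the most frequent value among those neighbours. Distances add a penalty, the median standard deviation of the continuous features, for each categorical mismatch. The description does not say which implementation or settings were used. The following is therefore only a simplified, dependency-free Python sketch of the idea, not the author's pipeline. In practice, a library implementation such as imbalanced-learn's `SMOTENC` would be the usual choice.

```python
import random
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

# A row is (continuous feature values, categorical feature values).
Row = Tuple[List[float], List[str]]


def smotenc_distance(a: Row, b: Row, med_std: float) -> float:
    """Euclidean distance on continuous features plus med_std**2 per categorical mismatch."""
    d2 = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    d2 += sum(med_std ** 2 for x, y in zip(a[1], b[1]) if x != y)
    return d2 ** 0.5


def smotenc_sample(minority: Sequence[Row], k: int = 5, n_new: int = 1,
                   seed: int = 0) -> List[Row]:
    """Generate n_new synthetic minority-class rows (simplified SMOTE-NC)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority-class rows")
    rng = random.Random(seed)
    n_cont = len(minority[0][0])
    # Median of the per-feature standard deviations of the continuous features.
    med_std = statistics.median(
        [statistics.pstdev([r[0][j] for r in minority]) for j in range(n_cont)]
    )
    synthetic: List[Row] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        neighbours = sorted(others, key=lambda r: smotenc_distance(base, r, med_std))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # Continuous features: a point on the segment between base and nn.
        cont = [b + gap * (v - b) for b, v in zip(base[0], nn[0])]
        # Categorical features: majority value among the k nearest neighbours.
        cat = [Counter(r[1][j] for r in neighbours).most_common(1)[0][0]
               for j in range(len(base[1]))]
        synthetic.append((cont, cat))
    return synthetic


# Hypothetical minority-class rows: ([age, income], [home_ownership]).
# Features are unscaled here; in practice, scale continuous features first.
rows = [([25.0, 40000.0], ["RENT"]), ([31.0, 52000.0], ["RENT"]),
        ([45.0, 61000.0], ["OWN"])]
print(smotenc_sample(rows, k=2, n_new=2, seed=1))
```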

2. Metadata

The dataset contains 45,000 records and 14 variables, each described below:

| Column | Description | Type |
|:-------|:------------|:-----|
| person_age | Age of the person | Float |
| person_gender | Gender of the person | Categorical |
| person_education | Highest education level | Categorical |
| person_income | Annual income | Float |
| person_emp_exp | Years of employment experience | Integer |
| person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical |
| loan_amnt | Loan amount requested | Float |
| loan_intent | Purpose of the loan | Categorical |
| loan_int_rate | Loan interest rate | Float |
| loan_percent_income | Loan amount as a percentage of annual income | Float |
| cb_person_cred_hist_length | Length of credit history in years | Float |
| credit_score | Credit score of the person | Integer |
| previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical |
| loan_status (target variable) | Loan approval status: 1 = approved; 0 = rejected | Integer |
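The schema in the table can be mirrored as a typed record when loading the data. The Python sketch below assumes a comma-separated file with a header row whose names match the Column entries exactly. The file name is not given here, so the loader takes the text directly. `LoanApplication`, `parse_row`, and `load_applications` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class LoanApplication:
    person_age: float
    person_gender: str
    person_education: str
    person_income: float
    person_emp_exp: int
    person_home_ownership: str
    loan_amnt: float
    loan_intent: str
    loan_int_rate: float
    loan_percent_income: float
    cb_person_cred_hist_length: float
    credit_score: int
    previous_loan_defaults_on_file: str
    loan_status: int  # target: 1 = approved, 0 = rejected


def parse_row(row: Dict[str, str]) -> LoanApplication:
    """Convert one CSV record (all strings) into a typed LoanApplication."""
    values = {}
    for f in fields(LoanApplication):
        raw = row[f.name].strip()
        if f.type is int:
            values[f.name] = int(float(raw))  # accepts "3" as well as "3.0"
        elif f.type is float:
            values[f.name] = float(raw)
        else:
            values[f.name] = raw
    return LoanApplication(**values)


def load_applications(text: str) -> List[LoanApplication]:
    """Parse CSV text with a header row into LoanApplication records."""
    return [parse_row(row) for row in csv.DictReader(io.StringIO(text))]
```

For a file on disk, pass its contents, e.g. `load_applications(open(path, newline="").read())`.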

3. Data Usage

The dataset can be used for multiple purposes:

  • Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
  • Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
  • Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.
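For the classification task, logistic regression is a common baseline. The dependency-free Python sketch below fits one by batch gradient descent on the log-loss. It assumes the features have already been converted to numbers and scaled; categorical columns such as loan_intent would need encoding first. Its names and hyperparameters are illustrative rather than tuned. Evaluate any such model on held-out data, not on the rows it was trained on.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one row is (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float],
            threshold: float = 0.5) -> int:
    """Return 1 (approved) if the predicted probability is >= threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return int(p >= threshold)


# Hypothetical toy data: one scaled feature; labels 0 = rejected, 1 = approved.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 1, 1]
```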

Be aware of data issues inherited from the original data, such as records with an age above 100 years.
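One way to act on this caveat is to set aside records with an implausible person_age before modelling. The Python sketch below assumes that rows are dictionaries of strings, as produced by `csv.DictReader`. It flags ages above 100, the example given above. The cut-off is a parameter, and whether to drop, cap, or impute the flagged rows is an analysis choice; `split_by_age` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def split_by_age(rows: Iterable[Record], max_age: float = 100.0
                 ) -> Tuple[List[Record], List[Record]]:
    """Return (kept, flagged): rows with person_age <= max_age, and the rest."""
    kept: List[Record] = []
    flagged: List[Record] = []
    for row in rows:
        if float(row["person_age"]) > max_age:
            flagged.append(row)
        else:
            kept.append(row)
    return kept, flagged


# Hypothetical records, for illustration only.
kept, flagged = split_by_age([{"person_age": "22"}, {"person_age": "144"}])
print(len(kept), len(flagged))  # 1 1
```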

This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.

Feel free to leave comments in the discussion. I'd appreciate your upvote if you find my dataset useful! 😀
