A) 20160923_global_crisis_data:
https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx
This data was collected over many years by Carmen Reinhart (with her coauthors Ken Rogoff, Christoph Trebesch, and Vincent Reinhart). It covers the banking crises of 70 countries from 1800 to 2016, with a total of 15,190 records and 16 variables. After cleaning and adjustment, the data comes to 8,642 records and 17 variables.
B) Label_Country: This data describes whether each country is Developing or Developed.
1-Case: ID number for the country.
2-Cc3: ID string for the country.
3-Country: Name of the country.
4-Year: The year, from 1800 to 2016.
5-Banking_Crisis: Banking problems can often be traced to a decrease in the value of banks' assets:
A) due to a collapse in real estate prices, or when bank asset values decrease substantially. B) if a government stops paying its obligations, which can trigger a sharp decline in the value of bonds.
6-Systemic_Crisis: When many banks in a country are in serious solvency or liquidity problems at the same time, either:
A) because they are all hit by the same outside shock, or B) because failure in one bank or a group of banks spreads to other banks in the system.
7-Gold_Standard: Whether the country has a crisis in the Gold Standard.
8-Exch_Usd: Exchange rate of the local currency in USD, except for the USD, which is quoted in GBP.
9-Domestic_Debt_In_Default: Whether the country has domestic debt in default.
10-Sovereign_External_Debt_1: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom, or post-1975 defaults on Official External Creditors.
11-Sovereign_External_Debt_2: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom, but includes post-1975 defaults on Official External Creditors.
12-Gdp_Weighted_Default: GDP-weighted default for the country.
13-Inflation: Annual percentage change in average consumer prices.
14-Independence: Independence of the country.
15-Currency_Crises: Whether the country has a currency crisis.
16-Inflation_Crises: Whether the country has an inflation crisis.
17-Level_Country: Whether the country is Developing or Developed.
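As a rough illustration of how these variables might be combined, the sketch below computes the share of country-years with a banking crisis for each Level_Country value. It assumes Banking_Crisis is coded as a 0/1 indicator, which the description above does not confirm, and the rows are invented toy values rather than records from the dataset.

```python
from collections import defaultdict

# Toy rows that reuse the column names listed above.
# The values are invented for illustration; they are NOT real records.
# Assumption: Banking_Crisis is a 0/1 indicator.
rows = [
    {"Country": "Country A", "Year": 1900, "Banking_Crisis": 1, "Level_Country": "Developed"},
    {"Country": "Country A", "Year": 1901, "Banking_Crisis": 0, "Level_Country": "Developed"},
    {"Country": "Country B", "Year": 1900, "Banking_Crisis": 0, "Level_Country": "Developing"},
    {"Country": "Country B", "Year": 1901, "Banking_Crisis": 1, "Level_Country": "Developing"},
    {"Country": "Country B", "Year": 1902, "Banking_Crisis": 0, "Level_Country": "Developing"},
    {"Country": "Country B", "Year": 1903, "Banking_Crisis": 0, "Level_Country": "Developing"},
]


def crisis_share_by_level(records):
    """Return the fraction of country-years in banking crisis per Level_Country."""
    totals = defaultdict(int)
    crises = defaultdict(int)
    for record in records:
        level = record["Level_Country"]
        totals[level] += 1
        crises[level] += record["Banking_Crisis"]
    return {level: crises[level] / totals[level] for level in totals}


shares = crisis_share_by_level(rows)
```

With the real file, the same function could be fed rows read through `csv.DictReader`, after converting Banking_Crisis to an integer.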
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🏦 Synthetic Loan Approval Dataset
A Realistic, High-Quality Dataset for Credit Risk Modelling
🎯 Why This Dataset?
Most loan datasets on Kaggle have unrealistic patterns where:
Unlike most loan datasets available online, this one is built on real banking criteria from US and Canadian financial institutions. Drawing from 3 years of hands-on finance industry experience, the dataset incorporates realistic correlations and business logic that reflect how actual lending decisions are made. This makes it perfect for data scientists looking to build portfolio projects that showcase not just coding ability, but genuine understanding of credit risk modelling.
📊 Dataset Overview
| Metric | Value |
|---|---|
| Total Records | 50,000 |
| Features | 20 (customer_id + 18 predictors + 1 target) |
| Target Distribution | 55% Approved, 45% Rejected |
| Missing Values | 0 (Complete dataset) |
| Product Types | Credit Card, Personal Loan, Line of Credit |
| Market | United States & Canada |
| Use Case | Binary Classification (Approved/Rejected) |
🔑 Key Features
Identifier:
- Customer ID (unique identifier for each application)
Demographics:
- Age, Occupation Status, Years Employed
Financial Profile:
- Annual Income, Credit Score, Credit History Length
- Savings/Assets, Current Debt
Credit Behaviour:
- Defaults on File, Delinquencies, Derogatory Marks
Loan Request:
- Product Type, Loan Intent, Loan Amount, Interest Rate
Calculated Ratios:
- Debt-to-Income, Loan-to-Income, Payment-to-Income
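The description does not say how the three calculated ratios are derived, so the following is only a sketch under common textbook definitions. The function names, the 36-month default term, and the use of a standard amortised monthly payment are all assumptions, not the dataset author's formulas.

```python
def monthly_payment(principal, annual_rate_pct, term_months):
    """Standard amortised loan payment; the term is a hypothetical input."""
    monthly_rate = annual_rate_pct / 100 / 12
    if monthly_rate == 0:
        return principal / term_months
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -term_months)


def calculated_ratios(annual_income, current_debt, loan_amount,
                      annual_rate_pct, term_months=36):
    """Assumed definitions of the three ratios (not confirmed by the source)."""
    payment = monthly_payment(loan_amount, annual_rate_pct, term_months)
    return {
        "debt_to_income": current_debt / annual_income,
        "loan_to_income": loan_amount / annual_income,
        # Monthly payment relative to monthly income.
        "payment_to_income": payment / (annual_income / 12),
    }


example = calculated_ratios(
    annual_income=50_000, current_debt=10_000,
    loan_amount=5_000, annual_rate_pct=0.0, term_months=10,
)
```

Comparing these values against the ratio columns shipped with the dataset is a quick way to check whether the assumed definitions match.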
💡 What Makes This Dataset Special?
1️⃣ Real-World Approval Logic

The dataset implements actual banking criteria:
- DTI ratio > 50% = automatic rejection
- Defaults on file = instant reject
- Credit score bands match real lending thresholds
- Employment verification for loans ≥$20K
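The hard rules listed above could be sketched as a small pre-screening function. The function name, argument names, and return strings are hypothetical, and treating an unverified loan of $20K or more as "needs verification" rather than a rejection is an assumption. Credit score bands are left out because the description gives no cut-offs.

```python
def hard_rule_check(dti, defaults_on_file, loan_amount, employment_verified):
    """Apply the stated hard rules before any scoring step.

    dti is a fraction (0.50 means 50%). Names and outcomes are illustrative.
    """
    if defaults_on_file > 0:
        return "reject: defaults on file"
    if dti > 0.50:
        return "reject: DTI above 50%"
    if loan_amount >= 20_000 and not employment_verified:
        # Assumption: the source only says verification applies here.
        return "needs employment verification"
    return "pass to scoring"
```

For example, `hard_rule_check(0.35, 0, 25_000, False)` returns `"needs employment verification"`, while the same application with one default on file is rejected outright.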
2️⃣ Realistic Correlations
- Higher income → Better credit scores
- Older applicants → Longer credit history
- Students → Lower income, special treatment for small loans
- Loan intent affects approval (Education best, Debt Consolidation worst)

3️⃣ Product-Specific Rules
- Credit Cards: More lenient, higher limits
- Personal Loans: Standard criteria, up to $100K
- Line of Credit: Capped at $50K, manual review for high amounts

4️⃣ Edge Cases Included
- Young applicants (age 18) building first credit
- Students with thin credit files
- Self-employed with variable income
- High debt-to-income ratios
- Multiple delinquencies

🎓 Perfect For
- Machine Learning Practice: Binary classification with real patterns
- Credit Risk Modelling: Learn actual lending criteria
- Portfolio Projects: Build impressive, explainable models
- Feature Engineering: Rich dataset with meaningful relationships
- Business Analytics: Understand financial decision-making
📈 Quick Stats
Approval Rates by Product
- Credit Card: 60.4% (more lenient)
- Personal Loan: 46.9% (standard)
- Line of Credit: 52.6% (moderate)
Loan Intent (Best → Worst Approval Odds)
1. Education (63% approved)
2. Personal (58% approved)
3. Medical/Home (52% approved)
4. Business (48% approved)
5. Debt Consolidation (40% approved)

Credit Score Distribution
- Mean: 644
- Range: 300-850
- Realistic bell curve around 600-700

Income Distribution
- Mean: $50,063
- Median: $41,608
- Range: $15K - $250K
🎯 Expected Model Performance
With proper feature engineering and tuning:
- Accuracy: 75-85%
- ROC-AUC: 0.80-0.90
- F1-Score: 0.75-0.85
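For readers who want to see what these three metrics measure, here is a dependency-free sketch. It assumes labels are coded 1 for Approved and 0 for Rejected, and that the model outputs a score where higher means more likely approved. The ROC-AUC is computed with the pairwise-comparison definition, which is simple but quadratic in the number of rows, so a library implementation is preferable on all 50,000 records.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = Approved, 0 = Rejected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Tiny invented example: four applications with model scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```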
Important: Feature importance should show:
1. Credit Score (most important)
2. Debt-to-Income Ratio
3. Delinquencies
4. Loan Amount
5. Income
If your model shows different patterns, something's wrong!
🏆 Use Cases & Projects
Beginner
- Binary classification with XGBoost/Random Forest
- EDA and visualization practice
- Feature importance analysis

Intermediate
- Custom threshold optimization (profit maximization)
- Cost-sensitive learning (false positive vs false negative)
- Ensemble methods and stacking
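One way to approach the threshold and cost-sensitive items above is to pick the decision threshold that minimises total misclassification cost. The sketch below does this by brute force. The cost values (a false positive costing five times a false negative) are purely hypothetical, and the labels and scores are invented.

```python
def total_cost(y_true, scores, threshold, fp_cost, fn_cost):
    """Cost of predicting 'approve' whenever score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += fp_cost  # approved an application that should be rejected
        elif predicted == 0 and label == 1:
            cost += fn_cost  # rejected an application that should be approved
    return cost


def best_threshold(y_true, scores, fp_cost=5.0, fn_cost=1.0):
    """Try every observed score (plus 'reject everything') as a threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, fp_cost, fn_cost),
    )


# Invented example: 1 = Approved, 0 = Rejected.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
strict = best_threshold(y_true, scores, fp_cost=5.0, fn_cost=1.0)
lenient = best_threshold(y_true, scores, fp_cost=1.0, fn_cost=5.0)
```

When false positives are expensive the chosen threshold moves up, and when false negatives are expensive it moves down, which is the behaviour a profit-driven cut-off is meant to capture.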

Advanced
- Explainable AI (SHAP, LIME)
- Fairness analysis across demographics
- Production-ready API with FastAPI/Flask
- Streamlit deployment with business rules
⚠️ Important Notes
This is SYNTHETIC Data
- Generated based on real banking criteria
- No real customer data was used
- Safe for public sharing and portfolio use

Limitations
- Simplified approval logic (real banks use 100+ factors)
- No temporal component (no time series)
- Single country/currency assumed (USD)
- No external factors (economy, market conditions)

Educational Purpose

This dataset is designed for:
- Learning credit risk modeling
- Portfolio projects
- ML practice
- Understanding lending criteria

NOT for:
- Actual lending decisions
- Financial advice
- Production use without validation
🤝 Contributing
Found an issue? Have suggestions?
- Open an issue on GitHub
- Suggest i...