7 datasets found
  1. SBA Loans Case Data Set

    • kaggle.com
    zip
    Updated Apr 15, 2020
    Cite
    Data-Science Sean (2020). SBA Loans Case Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/sba-loans-case-data-set/data
    Download format: zip (114,603 bytes)
    Authors
    Data-Science Sean
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Should This Loan be Approved or Denied?

    If you like the data set and download it, an upvote would be appreciated.

    The Small Business Administration (SBA) was founded in 1953 to assist small businesses in obtaining loans. Small businesses have been the primary source of employment in the United States. Helping small businesses helps with job creation, which reduces unemployment. Small business growth also promotes economic growth. One of the ways the SBA helps small businesses is by guaranteeing bank loans. This guarantee reduces the risk to banks and encourages them to lend to small businesses. If the loan defaults, the SBA covers the guaranteed amount, and the bank suffers a loss on the remaining balance.
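
    As a rough illustration of the guarantee mechanics described above, the split of a charged-off balance can be computed in a few lines. This is a minimal sketch, assuming the SBA pays the charged-off amount up to the guaranteed amount and the bank absorbs the rest; the function `split_default_loss` and the dollar figures are hypothetical and are not taken from the dataset.

```python
def split_default_loss(charged_off: float, guaranteed: float) -> tuple[float, float]:
    """Split a charged-off balance between the SBA and the lending bank.

    Hypothetical model: the SBA covers the charged-off amount up to the
    guaranteed amount, and the bank bears whatever remains.
    """
    if charged_off < 0 or guaranteed < 0:
        raise ValueError("amounts must be non-negative")
    sba_share = min(charged_off, guaranteed)  # SBA pays at most the guarantee
    bank_share = charged_off - sba_share      # bank absorbs the remainder
    return sba_share, bank_share


# Hypothetical example: $100,000 charged off on a loan with a $75,000 guarantee.
print(split_default_loss(100_000, 75_000))  # (75000, 25000)
```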

    There have been several small business success stories, such as FedEx and Apple. However, the rate of default is very high. Many economists believe the banking market works better without the assistance of the SBA. Supporters claim that the social benefits and job creation outweigh any financial costs to the government from defaulted loans.

    The Data Set

    The original data set is from the U.S. SBA loan database, which includes historical data from 1987 through 2014 (899,164 observations) with 27 variables. The data set includes information on whether the loan was paid off in full or whether the SBA had to charge off any amount, and how much that amount was. The data set used here is a subset of the original set: it contains loans in the Real Estate and Rental and Leasing industry in California. This file has 2,102 observations and 35 variables. The column Default is an integer (1 or 0), and I changed this column to a factor.
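
    The author converted `Default` to a factor, which is R terminology. In Python, a comparable step (a sketch, not the author's code) is to cast the column to a pandas categorical. The small DataFrame below is invented for illustration and stands in for the downloaded file, whose CSV name is not given here.

```python
import pandas as pd

# Hypothetical stand-in for the real file; in practice this frame would come
# from pd.read_csv(...) on the downloaded CSV.
loans = pd.DataFrame({"Default": [0, 1, 0, 0, 1]})

# Cast the 0/1 integer column to a categorical dtype, the closest pandas
# analogue of R's factor(); the categories are fixed explicitly to [0, 1].
loans["Default"] = pd.Categorical(loans["Default"], categories=[0, 1])

print(loans["Default"].dtype)                                # category
print(loans["Default"].value_counts().sort_index().tolist())  # [3, 2]
```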

    For more information on this data set go to https://amstat.tandfonline.com/doi/full/10.1080/10691898.2018.1434342

  2. Bank Debt Data

    • kaggle.com
    zip
    Updated Dec 8, 2020
    Cite
    Abid Ali Awan (2020). Bank Debt Data [Dataset]. https://www.kaggle.com/kingabzpro/bank-debt-data
    Download format: zip (26,058 bytes)
    Authors
    Abid Ali Awan
    License

    GNU General Public License v2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    After a debt has been legally declared "uncollectable" by a bank, the account is considered "charged-off." But that doesn't mean the bank walks away from the debt: it still wants to collect some of the money it is owed. The bank scores the account to assess the expected recovery amount, that is, the amount the bank expects to be able to receive from the customer in the future. This amount is a function of the probability of the customer paying, the total debt, and other factors that affect the customer's ability and willingness to pay.

    The bank has implemented different recovery strategies at different thresholds ($1000, $2000, etc.): the greater the expected recovery amount, the more effort the bank puts into contacting the customer. For low recovery amounts (Level 0), the bank just adds the customer's contact information to its automatic dialer and emailing system. For higher recovery strategies, the bank incurs more costs because it assigns staff to more intensive collection efforts. Each additional level of recovery strategy costs an additional $50 per customer, so customers in Recovery Strategy Level 1 cost the company $50 more than those in Level 0, customers in Level 2 cost $50 more than those in Level 1, and so on.
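
    The level-and-cost rule can be expressed as a small lookup. This is a hedged sketch: the description names only the $1000 and $2000 thresholds, so the later values in `THRESHOLDS` are an assumption, as is the choice that an amount exactly at a threshold moves to the higher level. The helper names are hypothetical.

```python
from bisect import bisect_right

# Assumed thresholds: $1000 and $2000 come from the description; the rest
# assume the pattern continues in $1000 steps.
THRESHOLDS = [1000, 2000, 3000, 4000]
COST_PER_LEVEL = 50  # extra cost per customer per level, per the description


def strategy_level(expected_recovery: float) -> int:
    """Return the recovery strategy level (0 = automatic dialer/email only).

    Assumption: an amount exactly at a threshold moves to the higher level,
    which is what bisect_right gives us.
    """
    return bisect_right(THRESHOLDS, expected_recovery)


def extra_cost(expected_recovery: float) -> int:
    """Extra cost per customer relative to Level 0."""
    return COST_PER_LEVEL * strategy_level(expected_recovery)


for amount in (500, 1000, 2500):
    print(amount, strategy_level(amount), extra_cost(amount))
# 500 0 0
# 1000 1 50
# 2500 2 100
```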

    Content

    The big question: does the extra amount that is recovered at the higher strategy level exceed the extra $50 in costs? In other words, was there a jump (also called a "discontinuity") of more than $50 in the amount recovered at the higher strategy level? We'll find out in this notebook.
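
    One simple way to approach this question is to compare the average actual recovery just below and just above a threshold. The sketch below computes a naive difference in means within a window around $1000. It is not a full regression discontinuity analysis (which would typically fit a regression on each side of the cutoff), and the window width, the record layout, and the sample values are all made up for illustration.

```python
from statistics import mean

# Hypothetical records: (expected_recovery, actual_recovery) in dollars.
records = [
    (920, 290), (950, 300), (980, 310),     # just below the $1000 threshold
    (1020, 400), (1050, 410), (1080, 420),  # just above it
]


def jump_at_threshold(records, threshold=1000, window=100):
    """Difference in mean actual recovery just above vs. just below a threshold.

    Only records whose expected recovery lies within +/- window of the
    threshold are used. This is a naive comparison of means.
    """
    below = [actual for expected, actual in records
             if threshold - window <= expected < threshold]
    above = [actual for expected, actual in records
             if threshold <= expected < threshold + window]
    if not below or not above:
        raise ValueError("need observations on both sides of the threshold")
    return mean(above) - mean(below)


jump = jump_at_threshold(records)
# jump == 110 here, so the extra recovery exceeds the extra $50 cost
print(jump, jump > 50)
```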

  3. lending-loans

    • kaggle.com
    zip
    Updated Oct 13, 2020
    Cite
    Izem Demirci (2020). lending-loans [Dataset]. https://www.kaggle.com/izemdemirci/lendingloans
    Download format: zip (218,632 bytes)
    Authors
    Izem Demirci
    Description

    The data is publicly available from LendingClub.com. The dataset contains 9,578 three-year loans that were funded through the LendingClub.com platform between May 2007 and February 2010.

    The binary dependent variable 'not.full.paid' indicates that the loan was not paid back in full (the borrower either defaulted or the loan was "charged off," meaning the borrower was deemed unlikely to ever pay it back).

    Independent variables

    • credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.
    • purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").
    • int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be riskier are assigned higher interest rates.
    • installment: The monthly installments ($) owed by the borrower if the loan is funded.
    • log.annual.inc: The natural log of the self-reported annual income of the borrower.
    • dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).
    • fico: The FICO credit score of the borrower.
    • days.with.cr.line: The number of days the borrower has had a credit line.
    • revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
    • revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
    • inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.
    • delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
    • pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

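
    Two of the variables are stored in transformed units (`int.rate` as a proportion, `log.annual.inc` as a natural log). The sketch below converts them back for readability. The function name `decode_loan` and the sample row are hypothetical, and it assumes each row is a mapping keyed by the column names listed above (for example, a row from `csv.DictReader`, whose values are strings).

```python
import math


def decode_loan(row: dict) -> dict:
    """Convert selected lending-loans fields into more readable units.

    float()/int() accept both numbers and numeric strings, so this works on
    rows read with csv.DictReader as well as on hand-built dicts.
    """
    return {
        "interest_rate_pct": float(row["int.rate"]) * 100,        # 0.11 -> 11%
        "annual_income": math.exp(float(row["log.annual.inc"])),  # undo natural log
        "fully_paid": int(row["not.full.paid"]) == 0,             # 1 = not paid in full
    }


example = {"int.rate": 0.11, "log.annual.inc": math.log(50_000), "not.full.paid": 1}
decoded = decode_loan(example)
print(round(decoded["interest_rate_pct"], 2), round(decoded["annual_income"]), decoded["fully_paid"])
```
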
  4. Charge-Off Rate on All Loans and Mortgages

    • kaggle.com
    zip
    Updated Dec 24, 2019
    Cite
    Federal Reserve (2019). Charge-Off Rate on All Loans and Mortgages [Dataset]. https://www.kaggle.com/federalreserve/charge-off-rate-on-all-loans-and-mortgages
    Download format: zip (9,851 bytes)
    Dataset provided by
    Federal Reserve System (http://www.federalreserve.gov/)
    Authors
    Federal Reserve
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset from the Federal Reserve, hosted by Federal Reserve Economic Data (FRED). FRED has its own data platform and updates each series at the frequency with which the underlying data are updated. Explore the Federal Reserve using Kaggle and all of the data sources available through the Federal Reserve organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using FRED's API and Kaggle's API.

    Cover photo by David Hellmann on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  5. Loan Data for Dummy Bank

    • kaggle.com
    zip
    Updated Aug 4, 2018
    Cite
    MuhammadNadeemFerozi (2018). Loan Data for Dummy Bank [Dataset]. https://www.kaggle.com/mrferozi/loan-data-for-dummy-bank
    Download format: zip (29,512,416 bytes)
    Authors
    MuhammadNadeemFerozi
    Description

    Company Information:

    The data set is based on the Lending Club data (https://www.kaggle.com/prateikmahendra/loan-data). The Irish Dummy Bank is a fictional peer-to-peer lending bank based in Ireland: the bank provides funds to potential borrowers and earns a profit depending on the risk it takes (the borrower's credit score). The fictional bank provides loans to its loyal customers. The complete data set is borrowed from Lending Club; for basic information about the company, see the Wikipedia article on Lending Club (https://en.wikipedia.org/wiki/Lending_Club). This dataset was copied from Kaggle and cleaned, but it has been changed. Any similarity is purely for learning purposes; I have no intention of plagiarism and simply want to be clear about the source.

    The central idea and code are adapted from Kevin Markham's YouTube video series, Introduction to Machine Learning with scikit-learn. The link is in the resources section.

    Data Description

    Columns: LoanStatNew (field name) and Description.

    • addr_state: The state provided by the borrower in the loan application.
    • annual_inc: The self-reported annual income provided by the borrower during registration.
    • annual_inc_joint: The combined self-reported annual income provided by the co-borrowers during registration.
    • application_type: Indicates whether the loan is an individual application or a joint application with two co-borrowers.
    • collection_recovery_fee: Post-charge-off collection fee.
    • collections_12_mths_ex_med: Number of collections in the past 12 months, excluding medical collections.
    • delinq_2yrs: The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years.
    • desc: Loan description provided by the borrower.
    • dti: A ratio calculated using the borrower's total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower's self-reported monthly income.
    • dti_joint: A ratio calculated using the co-borrowers' total monthly payments on the total debt obligations, excluding mortgages and the requested LC loan, divided by the co-borrowers' combined self-reported monthly income.
    • earliest_cr_line: The month the borrower's earliest reported credit line was opened.
    • emp_length: Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years.
    • emp_title: The job title supplied by the borrower when applying for the loan.
    • fico_range_high: The upper boundary of the range that the borrower's FICO score at loan origination belongs to.
    • fico_range_low: The lower boundary of the range that the borrower's FICO score at loan origination belongs to.
    • funded_amnt: The total amount committed to that loan at that point in time.
    • funded_amnt_inv: The total amount committed by investors for that loan at that point in time.
    • grade: LC-assigned loan grade.
    • home_ownership: The home ownership status provided by the borrower during registration. Possible values are RENT, OWN, MORTGAGE, and OTHER.
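
    The `dti` and `dti_joint` definitions reduce to a single division. This is a minimal sketch, assuming the monthly debt payments passed in already exclude the mortgage and the requested LC loan. The function names are hypothetical, and if the file stores the ratio as a percentage, the result would need to be multiplied by 100 to match.

```python
def dti(monthly_debt_payments: float, annual_income: float) -> float:
    """Debt-to-income ratio following the data dictionary definition.

    monthly_debt_payments: total monthly payments on debt obligations,
        already excluding the mortgage and the requested LC loan.
    annual_income: self-reported annual income (converted to monthly here).
    """
    monthly_income = annual_income / 12
    if monthly_income <= 0:
        raise ValueError("income must be positive")
    return monthly_debt_payments / monthly_income


def dti_joint(monthly_debt_payments: float, annual_inc_joint: float) -> float:
    """Joint version: the same ratio, using the co-borrowers' combined income."""
    return dti(monthly_debt_payments, annual_inc_joint)


print(dti(500, 60_000))         # 0.1
print(dti_joint(500, 120_000))  # 0.05
```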

  6. PPP loans during the Covid-19 pandemic in USA

    • kaggle.com
    zip
    Updated Jan 14, 2022
    Cite
    Vaibhav kumar (2022). PPP loans during the Covid-19 pandemic in USA [Dataset]. https://www.kaggle.com/vaibhav2025/ppp-loans-during-the-covid19-pandemic-in-usa
    Download format: zip (352,961,614 bytes)
    Authors
    Vaibhav kumar
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Area covered
    United States
    Description

    The number of community banks in the United States has fallen from more than 13,000 in the mid-1980s to fewer than 5,000 today. These community-focused banks have consolidated mainly as a result of competitive pressures. Research shows that community banks are essential to maintaining economically fruitful communities, and losing these banks could be a significant blow to local infrastructure.

    One example of the importance of community banks was their role in distributing Paycheck Protection Program (PPP) loans during the Covid-19 pandemic. The PPP was designed to help small businesses keep their workers employed during the pandemic by providing funds through a short-term loan backed by the Small Business Administration (SBA). Preliminary research by the Conference of State Bank Supervisors (CSBS) shows that state-chartered banks were the primary distributors of PPP loans, and that community banks played an outsized role in the distribution of PPP funds.

    CSBS is providing complete loan-level PPP data (a full file of about 300 MB, sample data, and data definitions) that combines the publicly available files published on sba.gov. To allow analysis of depository institutions, CSBS will also add FDIC Certificate (CERT) numbers to this file. When the institution is a bank, the FDIC Certificate number allows users to link the PPP data to the bank's quarterly Call Report of Income and Condition. CSBS is also providing a sample dataset that can be updated and examined in Excel. Questions about the data can be sent to data@csbs.org. The CERT number is based on the originating lender, not the servicing lender.

    Field definitions (field name: description):

    • LoanNumber: Loan Number (unique identifier)
    • DateApproved: Loan Funded Date
    • SBAOfficeCode: SBA Origination Office Code
    • ProcessingMethod: Loan Delivery Method (PPP for first draw; PPS for second draw)
    • BorrowerName: Borrower Name
    • BorrowerAddress: Borrower Street Address
    • BorrowerCity: Borrower City
    • BorrowerState: Borrower State
    • BorrowerZip: Borrower Zip Code
    • LoanStatusDate: Loan Status Date; blank when the loan is disbursed but not Paid in Full or Charged Off
    • LoanStatus: Loan Status Description; replaced by 'Exemption 4' when the loan is disbursed but not Paid in Full or Charged Off
    • Term: Loan Maturity in Months
    • SBAGuarantyPercentage: SBA Guaranty Percentage
    • InitialApprovalAmount: Loan Approval Amount (at origination)
    • CurrentApprovalAmount: Loan Approval Amount (current)
    • UndisbursedAmount: Undisbursed Amount
    • FranchiseName: Franchise Name
    • ServicingLenderLocationID: Lender Location ID (unique identifier)
    • ServicingLenderName: Servicing Lender Name
    • ServicingLenderAddress: Servicing Lender Street Address
    • ServicingLenderCity: Servicing Lender City
    • ServicingLenderState: Servicing Lender State
    • ServicingLenderZip: Servicing Lender Zip Code
    • RuralUrbanIndicator: Rural or Urban Indicator (R/U)
    • HubzoneIndicator: Hubzone Indicator (Y/N)
    • LMIIndicator: LMI Indicator (Y/N)
    • BusinessAgeDescription: Business Age Description
    • ProjectCity: Project City
    • ProjectCountyName: Project County Name
    • ProjectState: Project State
    • ProjectZip: Project Zip Code
    • CD: Project Congressional District
    • JobsReported: Number of Employees
    • NAICSCode: NAICS 6-digit code
    • Race: Borrower Race Description
    • Ethnicity: Borrower Ethnicity Description
    • UTILITIES_PROCEED, PAYROLL_PROCEED, MORTGAGE_INTEREST_PROCEED, RENT_PROCEED, REFINANCE_EIDL_PROCEED, HEALTH_CARE_PROCEED, DEBT_INTEREST_PROCEED: Proceeds fields. Proceeds data is lender-reported at origination; on the PPP application, the proceeds fields were check boxes.
    • BusinessType: Business Type Description
    • OriginatingLenderLocationID: Originating Lender ID (unique identifier)
    • OriginatingLender: Originating Lender Name
    • OriginatingLenderCity: Originating Lender City
    • OriginatingLenderState: Originating Lender State
    • Gender: Gender Indicator
    • Veteran: Veteran Indicator
    • NonProfit: 'Yes' if Business Type = Nonprofit Organization, Nonprofit Childcare Center, or 501(c) Nonprofit
    • ForgivenessAmount: Forgiveness Amount
    • ForgivenessDate: Forgiveness Paid Date
    • CERT; Community Bank Flag; State vs. National Charter

    Source - https://www.csbs.org/
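
    Several field definitions encode status in a specific way (PPP versus PPS for first and second draws, and 'Exemption 4' for loans that are disbursed but neither paid in full nor charged off). The sketch below applies those rules to a tiny invented CSV sample. The `summarize` helper is hypothetical, and treating a blank ForgivenessAmount as zero is an assumption.

```python
import csv
import io

# Tiny made-up sample using a few of the documented field names.
SAMPLE = """LoanNumber,ProcessingMethod,LoanStatus,CurrentApprovalAmount,ForgivenessAmount
1001,PPP,Paid in Full,20000,20000
1002,PPS,Exemption 4,15000,
1003,PPP,Charged Off,10000,
"""


def summarize(rows):
    """Count first/second-draw loans and still-outstanding loans; total forgiveness.

    ProcessingMethod is PPP (first draw) or PPS (second draw); LoanStatus reads
    'Exemption 4' when the loan is disbursed but neither paid in full nor
    charged off. A blank ForgivenessAmount is treated as 0 (an assumption).
    """
    summary = {"first_draw": 0, "second_draw": 0, "outstanding": 0, "forgiven": 0.0}
    for row in rows:
        if row["ProcessingMethod"] == "PPP":
            summary["first_draw"] += 1
        elif row["ProcessingMethod"] == "PPS":
            summary["second_draw"] += 1
        if row["LoanStatus"] == "Exemption 4":
            summary["outstanding"] += 1
        summary["forgiven"] += float(row["ForgivenessAmount"] or 0)
    return summary


print(summarize(csv.DictReader(io.StringIO(SAMPLE))))
# {'first_draw': 2, 'second_draw': 1, 'outstanding': 1, 'forgiven': 20000.0}
```

    Because the full file is about 300 MB, feeding `summarize` a `csv.DictReader` over an open file handle streams the rows one at a time instead of loading everything into memory.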

  7. Station Stats Dataset

    • kaggle.com
    zip
    Updated Jul 7, 2023
    Cite
    K S ABISHEK (2023). Station Stats Dataset [Dataset]. https://www.kaggle.com/datasets/ksabishek/station-stats-dataset
    Download format: zip (286,803 bytes)
    Authors
    K S ABISHEK
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    City B is a small town with a population of just over 100,000.

    City B has 10 stations that branch off to other towns. The stations are named ST01, ST02, and so on up to ST10. You are given 11 Excel files: one containing the combined data of all the other files, and 10 other Excel sheets containing the following details:

    1. Station Name/Code
    2. Date
    3. Influx (count of passengers who visited the station)
    4. Total tickets booked on that particular day/date
    5. Prebooked: the count of tickets booked beforehand, that is, tickets booked at home, at the office, etc.
    6. Onspot: the count of tickets booked at the station, that is, at ticket counters
    7. Total Revenue: charged at a standard price of 5 EUR per ticket

    The Municipal Authority has found an anomaly: some people do not book tickets before boarding the trains, and this is causing losses.

    The authority now wants a report containing the following metrics and insights:

    1. Average tickets booked at every station
    2. Trends in ticket booking across the months of the year, highlighting any upward or downward trends
    3. The number of people who did not book tickets at each station, as both monthly and yearly counts
    4. A ranking of stations based on this non-booking metric
    5. Assuming each ticket costs 5 EUR, the total loss incurred and the average loss

    Based on the influx metric, can you identify the busiest station in the city?
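
    Most of the requested metrics reduce to simple aggregations over the daily records. The sketch below assumes that the number of people who did not book on a given day equals Influx minus Total tickets booked, which the description implies but does not state. The record layout, the sample values, and the `station_report` helper are hypothetical, and the average loss is taken per station.

```python
from collections import defaultdict

TICKET_PRICE_EUR = 5  # standard ticket price given in the description

# Hypothetical daily records: (station, date, influx, total_tickets_booked)
records = [
    ("ST01", "2023-01-15", 1200, 1100),
    ("ST01", "2023-02-15", 1300, 1150),
    ("ST02", "2023-01-15", 900, 880),
    ("ST02", "2023-02-15", 950, 900),
]


def station_report(records):
    """Averages, non-booking counts, losses, ranking, and busiest station.

    Assumption: people who did not book = influx - total tickets booked
    (clamped at zero so over-booked days do not produce negative counts).
    """
    tickets = defaultdict(list)
    influx = defaultdict(int)
    non_booked = defaultdict(int)
    non_booked_monthly = defaultdict(int)
    for station, date, visitors, booked in records:
        missing = max(visitors - booked, 0)
        tickets[station].append(booked)
        influx[station] += visitors
        non_booked[station] += missing
        non_booked_monthly[(station, date[:7])] += missing  # key: (station, "YYYY-MM")

    loss = {s: n * TICKET_PRICE_EUR for s, n in non_booked.items()}
    total_loss = sum(loss.values())
    return {
        "avg_tickets": {s: sum(v) / len(v) for s, v in tickets.items()},
        "non_booked": dict(non_booked),
        "non_booked_monthly": dict(non_booked_monthly),
        "ranking": sorted(non_booked, key=non_booked.get, reverse=True),
        "total_loss": total_loss,
        "avg_loss": total_loss / len(loss),  # average loss per station
        "busiest": max(influx, key=influx.get),
    }


report = station_report(records)
print(report["ranking"], report["total_loss"], report["busiest"])
# ['ST01', 'ST02'] 1600 ST01
```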

    Note: The dataset is fictional and is only intended for budding analysts and students who are looking to work on numbers.
