Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Credit Card Transactions Dataset for Fraud Detection (Used in: A Hybrid Anomaly Detection Framework Combining Supervised and Unsupervised Learning)
Description: This dataset, commonly known as creditcard.csv, contains anonymized credit card transactions made by European cardholders in September 2013. It includes 284,807 transactions, with 492 labeled as fraudulent. Due to confidentiality constraints, features have been transformed using PCA, except for 'Time' and 'Amount'. This dataset was used in the research article titled "A Hybrid Anomaly Detection Framework Combining Supervised and Unsupervised Learning for Credit Card Fraud Detection". The study proposes an ensemble model integrating techniques such as Autoencoders, Isolation Forest, Local Outlier Factor, and supervised classifiers including XGBoost and Random Forest, aiming to improve the detection of rare fraudulent patterns while maintaining efficiency and scalability.
Key Features:
- 30 numerical input features (V1–V28, Time, Amount)
- Class label indicating fraud (1) or normal (0)
- Imbalanced class distribution typical in real-world fraud detection
Use Case: Ideal for benchmarking and evaluating anomaly detection and classification algorithms in highly imbalanced data scenarios.
Source: Originally published by the Machine Learning Group at Université Libre de Bruxelles. https://www.kaggle.com/mlg-ulb/creditcardfraud
License: This dataset is distributed for academic and research purposes only. Please cite the original source when using the dataset.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
As a data contributor, I'm sharing this crucial dataset focused on the detection of fraudulent credit card transactions. Recognizing these illicit activities is paramount for protecting customers and the integrity of financial systems.
About the Dataset:
This dataset encompasses credit card transactions made by European cardholders during a two-day period in September 2013. It presents a real-world scenario with a significant class imbalance, where fraudulent transactions are considerably less frequent than legitimate ones. Out of a total of 284,807 transactions, only 492 are instances of fraud, representing a mere 0.172% of the entire dataset.
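The imbalance figure quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two counts stated in the description; the function names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_rate(frauds, total):
    """Share of all transactions that are labeled as fraud."""
    return frauds / total


def legit_per_fraud(frauds, total):
    """Number of legitimate transactions for every fraudulent one."""
    return (total - frauds) / frauds


rate = fraud_rate(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)
ratio = legit_per_fraud(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)
print(f"fraud share: {rate:.4%}")
print(f"legitimate transactions per fraud: {ratio:.1f}")
```

A classifier that labels every transaction as legitimate would therefore be right on all but about 0.17% of rows, which is why plain accuracy says little on this data.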
Content of the Data:
Due to confidentiality concerns, the majority of the input features in this dataset have undergone a Principal Component Analysis (PCA) transformation. This means the original meaning and context of features V1, V2, ..., V28 are not directly provided. However, these principal components capture the variance in the underlying transaction data.
The only features that have not been transformed by PCA are 'Time' and 'Amount'.
The target variable for this classification task is 'Class', which is 1 for fraud and 0 otherwise.
Important Note on Evaluation:
Given the substantial class imbalance (far more legitimate transactions than fraudulent ones), traditional accuracy metrics based on the confusion matrix can be misleading. It is strongly recommended to evaluate models using the Area Under the Precision-Recall Curve (AUPRC), as this metric is more sensitive to the performance on the minority class (fraudulent transactions).
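To make the recommendation concrete, the sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve, in plain Python. It is an illustration under simplifying assumptions (no tied scores, labels coded as 0/1), and the toy labels and scores are invented for the example. In practice a library routine such as scikit-learn's `average_precision_score` would normally be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes 0/1 labels and no tied scores.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the point where recall increases
    return ap / total_pos


# Toy data: 998 legitimate rows and 2 frauds (invented for illustration).
labels = [0] * 998 + [1, 1]
always_legit = [0] * 1000
print("accuracy of 'always legitimate':", accuracy(labels, always_legit))

# A small ranking example: frauds ranked 1st and 3rd out of 5.
print("average precision:", average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.2, 0.1]))
```

The first number is high even though no fraud is caught, while average precision depends only on how well the frauds are ranked ahead of legitimate transactions.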
How to Use This Dataset:
Acknowledgements and Citation:
This dataset has been collected and analyzed through a research collaboration between Worldline and the Machine Learning Group (MLG) of ULB (Université Libre de Bruxelles).
When using this dataset in your research or projects, please cite the following works as appropriate:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains customer information collected from a consumer credit card portfolio, with the aim of helping analysts predict customer attrition. It includes demographic details such as age, gender, marital status and income category, as well as details of each customer’s relationship with the credit card provider, such as the card type, number of months on book and inactive periods. It also holds data about customers’ spending behavior in the lead-up to their churn decision, such as total revolving balance, credit limit and average open-to-buy rate, along with derived metrics such as the total change in amount from quarter 4 to quarter 1, the average utilization ratio and a Naive Bayes classifier attrition flag (which combines card category with contacts count over a 12-month period, dependent count, education level and months inactive). Together, these variables capture up-to-date information that can indicate long-term account stability or an impending departure, which helps when managing a portfolio or serving individual customers.
This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.
- Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.
- Analyzing the customer’s spending behavior leading up to churning and using this data to better predict the likelihood of a customer churning in the future.
- Creating a classifier that can predict which customers are more susceptible to attrition based on their credit score, credit limit, utilization ratio and other spending behavior metrics over time; this could be used as an early warning system for predicting potential attrition before it happens.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: BankChurners.csv
| Column name | Description |
|:---|:---|
| CLIENTNUM | Unique identifier for each customer. (Integer) |
| Attrition_Flag | Flag indicating whether or not the customer has churned out. (Boolean) |
| Customer_Age | Age of customer. (Integer) |
| Gender | Gender of customer. (String) |
| Dependent_count | Number of dependents that customer has. (Integer) |
| Education_Level ...
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Patyam Satya Lokesh
Released under Database: Open Database, Contents: Database Contents
This dataset was created by Muhammad Waqas
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; it can be used, for example, for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.
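One way to read "example-dependent cost-sensitive learning" is that each error is charged a cost tied to the transaction it concerns. The sketch below assumes one common formulation, which is not specified by the dataset authors: a missed fraud costs the full 'Amount', and every raised alert costs a fixed investigation fee. The `admin_cost` value and the toy rows are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost=5.0):
    """Sum example-dependent costs over a set of predictions.

    Assumed cost model (illustrative only):
      - predicted fraud (alert raised): fixed admin_cost, fraud or not
      - missed fraud: the full transaction amount is lost
      - correctly passed legitimate transaction: no cost
    """
    cost = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            cost += admin_cost
        elif truth == 1:
            cost += amount
    return cost


# Toy rows: Class labels, model predictions and 'Amount' values (invented).
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
amounts = [100.0, 20.0, 250.0, 10.0]
print(total_cost(y_true, y_pred, amounts))
```

Under this cost model, missing one large fraud can outweigh many false alarms, which a count-based metric would not show.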
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML.
This dataset contains information about credit card balance. This data can be used for a lot of purposes such as credit card balance prediction. The columns in the given dataset are as follows:
- Income: Income of the customer.
- Limit: Credit limit provided to the customer.
- Rating: The customer's credit rating.
- Cards: The number of credit cards the customer has.
- Age: Age of the customer.
- Education: Educational level of the customer.
- Gender: Sex of the customer.
- Student: If the customer is a student or not.
- Married: If the customer is married.
- Ethnicity: Ethnicity of the customer.
- Balance: Credit balance of the customer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This record includes three datasets collected by the Consumer Financial Protection Bureau:
- Marketing Agreements and Data
- Student Banking Reports to Congress
- Deposit Product Marketing Agreements and Data
College credit card marketing agreements and data
As required by the Credit CARD Act of 2009, the Consumer Financial Protection Bureau (CFPB) collects information annually from credit card issuers who have marketing agreements with universities, colleges, or affiliated organizations such as alumni associations, sororities, fraternities, and foundations. The CFPB intends to continue updating the CSV file each year as it collects new data from college credit card issuers. The CFPB intends to ensure that the publicly available dataset is as accurate and complete as possible. This means that the dataset (as well as some of the charts and figures in this report) may not be completely consistent with past iterations of this report because submitting entities sometimes make corrections to earlier submissions. In all cases, the CFPB intends for the public dataset to be the CFPB’s definitive account of the data and it will be updated each year as new data becomes available.
Student banking reports to Congress
The Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 instructs the Bureau to monitor for risks to consumers in the offering or provision of consumer financial products or services, particularly when those products pose a disproportionate risk to traditionally underserved populations.
College deposit product marketing agreements and data
This page presents information about banking products provided to college students pursuant to agreements between institutions of higher education and financial service providers and governed in part by the Department of Education's cash management regulations. The agreements and related information presented here are a sample of the data used in the CFPB's annual report to Congress and should not be considered comprehensive. The scope of the CFPB's observations was limited to the agreements and other public disclosures that were published by institutions related to each award year (interested parties should note that any information in place at the time of publication but absent from the institutional disclosures as of June of each award year may not have been evaluated). Nevertheless, review of publicly available information is helpful in providing an overview of significant market dynamics at a point in time. The CFPB intends to ensure that the publicly available dataset is as accurate and complete as possible. This means that the dataset may not be completely consistent with past iterations of this report because the CFPB sometimes makes corrections to the dataset. In all cases, the CFPB intends for the public dataset to be the CFPB’s definitive account of the data.
In 2019, around **** percent of internet users in Thailand were familiar with the use of credit card numbers and CVV authentication for online payments. CVV stands for card verification value which is used for verifying that the customer has a physical credit or debit card.
Data
We provide you with a data set in CSV format. The data set contains more than 2 lakh (200,000) training records and 56 thousand test records. There are 31 input features, labeled V1 to V28 and Amount.
The target variable is labeled Class.
Task - Create a Classification model to predict the target variable Class.
How to evaluate the model:
1. Use the F1 score as the metric.
2. Use any other evaluation measure that you believe is appropriate, other than accuracy.
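For reference, the F1 score is the harmonic mean of precision and recall on the positive class. The sketch below is a minimal plain-Python version that assumes labels coded as 0/1; the toy labels and predictions are invented for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); returns 0.0 if nothing is caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 positives, of which the model finds 2, plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(f1_score(y_true, y_pred))

# A model that never predicts the positive class scores 0 on F1,
# even though its accuracy on this toy data would be 5/8.
print(f1_score(y_true, [0] * len(y_true)))
```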
The provided JSON file, derived from the project available at the specified Kaggle link, has been transformed into a CSV format for ease of analysis. This dataset likely encompasses credit card fraud-related information. It is structured as a tabular collection of data, with rows representing individual instances and columns containing various attributes. This dataset may include details such as transaction timestamps, transaction amounts, merchant information, and features related to fraud detection. Researchers and analysts can utilize this CSV dataset to investigate patterns, trends, and anomalies related to credit card fraud. The transformation to CSV simplifies data manipulation and exploration, facilitating data-driven insights and potentially aiding in the development of fraud detection algorithms and strategies. SOURCE https://www.kaggle.com/datasets/joebeachcapital/credit-card-fraud
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Credit Risk Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/praveengovi/credit-risk-classification-dataset on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This is customer transaction and demographic data. It identifies risky and not-risky customers for specific banking products.
The dataset is small. It helps budding data scientists and data analysts experiment with machine learning and statistical modelling concepts.
payment_data.csv: customer’s card payment history.
- id: customer id
- OVD_t1: number of times overdue type 1
- OVD_t2: number of times overdue type 2
- OVD_t3: number of times overdue type 3
- OVD_sum: total overdue days
- pay_normal: number of times normal payment
- prod_code: credit product code
- prod_limit: credit limit of product
- update_date: account update date
- new_balance: current balance of product
- highest_balance: highest balance in history
- report_date: date of recent payment
Customer’s demographic data and category attributes, which have been encoded. Category features are fea_1, fea_3, fea_5, fea_6, fea_7, fea_9. If label is 1, the customer is in high credit risk; if label is 0, the customer is in low credit risk.
Thanks to Google Datasets search
This dataset helps to find out whether a customer is credit risky or credit worthy from a banking perspective.
Q1 - What factors contribute to a customer being credit risky?
Q2 - How do credit worthy customers behave?
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Institutions of higher education play a critical role in supporting and promoting students’ overall financial health and well-being. A growing body of evidence suggests that relatively small financial shocks may cause acute financial hardship for students, potentially derailing their academic pursuits. The Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 instructs the Bureau to monitor for risks to consumers in the offering or provision of consumer financial products or services, particularly when those products pose a disproportionate risk to traditionally underserved populations.
Student banking reports to Congress
These reports monitor the growth and impacts of financial products offered by or in conjunction with colleges, specifically focusing on marketing agreements for college-sponsored deposit and prepaid accounts and college-sponsored credit cards.
College credit card marketing agreements and data
As required by the Credit CARD Act of 2009, we collect information annually from credit card issuers who have marketing agreements with universities, colleges, or affiliated organizations such as alumni associations, sororities, fraternities, and foundations. We maintain publicly accessible files of the agreements.
http://reference.data.gov.uk/id/open-government-licence
CSV file detailing all individual transactions made by Surrey County Council using purchase cards (corporate credit card) during the period Jan-Mar 2016 (quarter 4). Purchase cards are used mainly by frontline services to support the work we do for residents. Specific data schema details can be found on the Local Government Association's (LGA) website http://schemas.opendata.esd.org.uk/Spend.
The same information is available to download as 5 star Linked Data.
This data is published as part of Surrey's obligations for transparency, as set out in the Local Government Transparency Code 2014.
Update frequency: Quarterly
Review date: No later than end of the month after the quarter end
Temporal coverage: Q4 - Jan - Mar
Geographical coverage: pan-Surrey (though no spatial data published)
Data lineage: Data extracted from SAP, processed to remove irrelevant fields, personal data redacted and re-formatted according to LGA data schema (see above link)
Maintainer contact: Payments Team, Orbis Business Services
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides access to data about general purpose credit cards, which are open-end loans used by consumers to pay for day-to-day expenses, finance purchases, or provide cash advances.
https://creativecommons.org/publicdomain/zero/1.0/
Don't ask me where this data comes from, the answer is I don't know!
Credit score cards are a common risk control method in the financial industry. They use personal information and data submitted by credit card applicants to predict the probability of future defaults and credit card borrowings. The bank can then decide whether to issue a credit card to the applicant. Credit scores can objectively quantify the magnitude of risk.
Generally speaking, credit score cards are based on historical data. When large economic fluctuations occur, past models may lose their original predictive power. The logistic model is a common method for credit scoring, because logistic regression is suitable for binary classification tasks and can calculate the coefficients of each feature. To facilitate understanding and operation, the score card multiplies the logistic regression coefficient by a certain value (such as 100) and rounds it.
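The scaling step described above can be sketched in a few lines. The coefficient values and feature names below are hypothetical placeholders rather than fitted values from this dataset, and the factor of 100 follows the example in the text.

```python
def to_points(coefficients, factor=100):
    """Turn logistic regression coefficients into integer score-card points."""
    return {name: round(coef * factor) for name, coef in coefficients.items()}


def score(applicant, points):
    """Total score: points for each feature times the applicant's value."""
    return sum(points[name] * applicant.get(name, 0) for name in points)


# Hypothetical coefficients from an already-fitted logistic model.
coefficients = {"owns_car": 0.31, "num_children": -0.127, "owns_realty": 0.052}
points = to_points(coefficients)

# Hypothetical applicant: owns a car, has two children, owns no property.
applicant = {"owns_car": 1, "num_children": 2, "owns_realty": 0}
print(points)
print(score(applicant, points))
```

Because the points are small integers, a reviewer can add them up by hand and see which features pushed a score up or down.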
At present, with the development of machine learning algorithms, more predictive methods such as Boosting, Random Forest, and Support Vector Machines have been introduced into credit card scoring. However, these methods often lack transparency, so it may be difficult to provide customers and regulators with a reason for rejection or acceptance.
Build a machine learning model to predict whether an applicant is a 'good' or 'bad' client. Unlike in other tasks, the definition of 'good' or 'bad' is not given. You should use some technique, such as vintage analysis, to construct your label. Class imbalance is also a big problem in this task.
There are two tables that can be merged by ID:
application_record.csv
| Feature name | Explanation | Remarks |
|---|---|---|
| ID | Client number | |
| CODE_GENDER | Gender | |
| FLAG_OWN_CAR | Is there a car | |
| FLAG_OWN_REALTY | Is there a property | |
| CNT_CHILDREN | Number of children | |
| AMT_INCOME_TOTAL | Annual income | |
| NAME_INCOME_TYPE | Income category | |
| NAME_EDUCATION_TYPE | Education level | |
| NAME_FAMILY_STATUS | Marit... |
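Merging the two tables on ID could look like the sketch below. Only the ID and CODE_GENDER columns come from the table above; the second table is not shown in this description, so its column names (`month`, `status`) and all row values are hypothetical. The example uses plain Python dictionaries so that it runs without extra libraries; a dataframe library would offer an equivalent merge operation.

```python
from collections import defaultdict


def merge_by_id(applications, records):
    """Inner-join two lists of row dicts on the shared "ID" key.

    One application row may match several record rows; each match
    produces one merged row.
    """
    records_by_id = defaultdict(list)
    for rec in records:
        records_by_id[rec["ID"]].append(rec)

    merged = []
    for app in applications:
        for rec in records_by_id.get(app["ID"], []):
            merged.append({**app, **rec})
    return merged


# Rows from the application table (values invented for illustration).
applications = [
    {"ID": 1, "CODE_GENDER": "F"},
    {"ID": 2, "CODE_GENDER": "M"},
    {"ID": 3, "CODE_GENDER": "F"},
]
# Rows from the second table; column names here are hypothetical.
records = [
    {"ID": 1, "month": 0, "status": "ok"},
    {"ID": 1, "month": -1, "status": "late"},
    {"ID": 2, "month": 0, "status": "ok"},
    {"ID": 9, "month": 0, "status": "ok"},
]

for row in merge_by_id(applications, records):
    print(row)
```

Applicants without any matching record (ID 3 here) and records without an application (ID 9) drop out of an inner join, which is worth checking before constructing labels.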
A retail bank would like to hire you to build a credit default model for their credit card portfolio. The bank expects the model to identify the consumers who are likely to default on their credit card payments over the next 12 months. This model will be used to reduce the bank’s future losses. The bank is willing to provide you with some sample data that they can currently extract from their systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.
Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.
Credit card default risk is the chance that companies or individuals will not be able to return the money lent on time.
This dataset contains the following files:
- train.csv: 45528 x 19
- test.csv: 11383 x 18
- sample_submission.csv: 5 x 2
The dataset belongs to American Express. It's shared here only for educational purposes.
Find out which customer might default.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:
Context:
Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.
Inspiration:
The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.
Dataset Information:
The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:
Use Cases:
Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.