Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.
Key Features:
This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Many projects require datasets about bank transactions to test their systems. Unfortunately, it is hard to find a dataset that includes transaction product categorization, which is important for many analytical projects.
There are four datasets here. Clients: basic information about bank users. Categories: standard transaction categories used by many banks worldwide. Transactions: the core of the dataset, with basic information about each transaction such as the counterparty account, category, and amount. Subscriptions: information about subscriptions, that is, transactions that are made automatically.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.
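The imbalance and the role of 'Amount' described above can be made concrete with a small sketch. The two counts come from the description; the inverse-frequency weighting and the `example_cost` function (including its `admin_cost` parameter) are illustrative assumptions, not the dataset authors' method.

```python
# Counts quoted in the dataset description.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492

# Share of the positive class (frauds) among all transactions.
fraud_rate = N_FRAUDS / N_TRANSACTIONS
print(f"fraud share: {fraud_rate:.4%}")

# Inverse-frequency class weights: one common heuristic for unbalanced
# data (an assumption here, not something the description prescribes).
n_legit = N_TRANSACTIONS - N_FRAUDS
class_weight = {
    0: N_TRANSACTIONS / (2 * n_legit),
    1: N_TRANSACTIONS / (2 * N_FRAUDS),
}


def example_cost(y_true, y_pred, amount, admin_cost=1.0):
    """Hypothetical example-dependent cost of one prediction.

    Raising an alert costs a fixed admin_cost (assumed value);
    a missed fraud costs the transaction Amount; otherwise zero.
    """
    if y_pred == 1:
        return admin_cost
    return amount if y_true == 1 else 0.0
```

With a cost of this shape, a missed fraud on a large transaction is penalised more than one on a small transaction, which is the idea behind using 'Amount' in cost-sensitive learning.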
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML.
This dataset was created by Süfyan Taşkın
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Miguel Esteban Gómez
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime where attackers pose as known or trusted entities and contact individuals through email, text or telephone and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.
This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.
Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1 Source of the Dataset.
--- Original source retains full ownership of the source dataset ---
Item | Description |
---|---|
BALANCE | Outstanding balance on the credit card account |
BALANCE_FREQUENCY | How often the balance is updated |
PURCHASES | Total amount of purchases made on the credit card |
ONEOFF_PURCHASES | Total amount of one-time purchases made on the credit card |
INSTALLMENTS_PURCHASES | Total amount of purchases made on the credit card that were paid back in installments |
CASH_ADVANCE | Amount of cash withdrawn from the credit card account as a cash advance |
PURCHASES_FREQUENCY | How often purchases are made on the credit card |
ONEOFF_PURCHASES_FREQUENCY | How often one-time purchases are made on the credit card |
PURCHASES_INSTALLMENTS_FREQUENCY | How often purchases that are paid back in installments are made on the credit card |
CASH_ADVANCE_FREQUENCY | How often cash advances are taken out on the credit card |
CASH_ADVANCE_TRX | Number of cash advance transactions made on the credit card account |
PURCHASES_TRX | Number of purchase transactions made on the credit card account |
CREDIT_LIMIT | Maximum amount of credit the customer is allowed to use on the credit card |
PAYMENTS | Total amount of payments made on the credit card account |
MINIMUM_PAYMENTS | Minimum amount of payments required on the credit card account |
PRC_FULL_PAYMENT | Percentage of the balance that is paid in full by the customer each month |
TENURE | Number of years the customer has been using the credit card account |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘UPI apps Transactions in 2021’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ramjasmaurya/upi-apps-transactions-in-2021 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Unified Payments Interface (UPI) is an instant real-time payment system developed by the National Payments Corporation of India (NPCI), facilitating inter-bank peer-to-peer (P2P) and person-to-merchant (P2M) transactions. NPCI is the umbrella organisation for all digital payments. The interface is regulated by the Reserve Bank of India (RBI) and works by instantly transferring funds between two bank accounts on a mobile platform. As of November 2021, there are 274 banks available on UPI with a monthly volume of 4.18 billion transactions and a value of ₹7.1 trillion (US$94 billion). UPI witnessed 68 billion transactions till November 2021. The mobile-only payment system helped transact a total of ₹34.95 lakh crore (US$460 billion) during the 67 months of operation starting from 2016. As of May 2021, the platform has 150 million monthly active users in India with plans to achieve 500 million by 2025. IIT Madras is also working to integrate a voice command feature that can support English and Indian vernacular languages in the future. The proportion of UPI transactions in the total volume of digital transactions grew from 23% in 2018-19 to 55% in 2020-21, with an average value of ₹1,849 per transaction.
--- Original source retains full ownership of the source dataset ---
This data set is the transaction data leaked from the Mt. Gox exchange.
First, we combine the buy and sell transaction fields of the same transaction, and then de-duplicate the records by transaction time, transaction account, etc. to ensure that each transaction record is unique. This transaction data is very useful for analyzing user behavior in the bitcoin market.
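The two preparation steps described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the field names (`trade_id`, `side`, `account`, `time`, `amount`) and the choice of de-duplication key are assumptions made for the example.

```python
def merge_sides(rows):
    """Combine the buy and sell legs that share a trade_id into one record."""
    merged = {}
    for row in rows:
        rec = merged.setdefault(
            row["trade_id"],
            {"trade_id": row["trade_id"], "time": row["time"], "amount": row["amount"]},
        )
        # Store the account under buy_account or sell_account.
        rec[f"{row['side']}_account"] = row["account"]
    return list(merged.values())


def deduplicate(records, key_fields=("time", "buy_account", "sell_account", "amount")):
    """Keep the first record for each combination of key_fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Invented rows for illustration only; trade 2 repeats trade 1 under another id.
rows = [
    {"trade_id": 1, "side": "buy", "account": "A", "time": 100, "amount": 1.5},
    {"trade_id": 1, "side": "sell", "account": "B", "time": 100, "amount": 1.5},
    {"trade_id": 2, "side": "buy", "account": "A", "time": 100, "amount": 1.5},
    {"trade_id": 2, "side": "sell", "account": "B", "time": 100, "amount": 1.5},
    {"trade_id": 3, "side": "buy", "account": "C", "time": 101, "amount": 0.2},
    {"trade_id": 3, "side": "sell", "account": "A", "time": 101, "amount": 0.2},
]
merged = merge_sides(rows)
unique = deduplicate(merged)
```

Keying on time, both accounts, and amount treats two records as the same trade even when their identifiers differ, which is one plausible reading of "transaction time, transaction account, etc."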
We have done a market manipulation study using this data set.
For more details about blockchain dataset, please click here.
In the Target column, Normal (No Attack) = 0, Neptune Attack = 1.
Description of the columns present in the Dataset:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is about credit card defaulters. It contains 5 columns: 1) ID - the customer's ID; 2) Default - whether the person is a loan defaulter; 3) Student - whether the person is a student; 4) Balance - the balance in the person's account; 5) Income - the person's income.
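As a rough sketch of how the five columns might be read and summarised, the example below parses a few rows with Python's standard `csv` module. The sample rows are invented for illustration, and the "Yes"/"No" encoding of Default and Student is an assumption; the real file may encode them differently.

```python
import csv
import io

# Invented sample rows; the Yes/No encoding is an assumption.
SAMPLE = """ID,Default,Student,Balance,Income
1,No,No,700.00,44000.00
2,No,Yes,800.00,12000.00
3,Yes,Yes,2200.00,14000.00
4,No,No,500.00,35000.00
"""


def load_rows(text):
    """Parse CSV text into dicts with typed values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": int(raw["ID"]),
            "default": raw["Default"] == "Yes",
            "student": raw["Student"] == "Yes",
            "balance": float(raw["Balance"]),
            "income": float(raw["Income"]),
        })
    return rows


def default_rate(rows):
    """Fraction of rows flagged as defaulters (0.0 for an empty list)."""
    return sum(r["default"] for r in rows) / len(rows) if rows else 0.0


rows = load_rows(SAMPLE)
students = [r for r in rows if r["student"]]
print(default_rate(rows), default_rate(students))
```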
https://creativecommons.org/publicdomain/zero/1.0/
About the Dataset
This dataset contains financial transaction records and risk management data for accounting systems. It includes a variety of transactional data, such as transaction IDs, amounts, categories, and payment methods, alongside associated risk incidents like fraud, errors, and misstatements. The dataset also captures system metadata, such as user activity, transaction processing time, login frequency, and geographical region of the IP. The data is designed to simulate real-world accounting system operations and risk events, enabling the development and testing of AI-driven risk prediction models. The dataset can be used for research in real-time financial risk management, fraud detection, and improving decision-making processes in accounting systems using artificial intelligence.
Context
Collection of Facebook spam-legit profile and content-based data. It can be used for classification tasks.
Content
The dataset can be used for building machine learning models. The data was collected from public profiles using the Facebook API and Facebook Graph API. There are 500 legit profiles and 100 spam profiles. The features are listed below, together with a Label (0 = legit, 1 = spam).
1. Number of friends
2. Number of followings
3. Number of communities
4. The age of the user account (in days)
5. Total number of posts shared
6. Total number of URLs shared
7. Total number of photos/videos shared
8. Fraction of the posts containing URLs
9. Fraction of the posts containing photos/videos
10. Average number of comments per post
11. Average number of likes per post
12. Average number of tags in a post (rate of tagging)
13. Average number of hashtags present in a post
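Features 8 and 9 are ratios, and the class counts imply a fixed spam share. The sketch below shows one way such values could be computed; the `ProfileCounts` fields and function names are hypothetical and not part of the dataset, which already ships the fractions precomputed.

```python
from dataclasses import dataclass


@dataclass
class ProfileCounts:
    """Hypothetical raw counts for one profile."""
    posts: int             # total number of posts shared
    posts_with_urls: int   # posts containing at least one URL
    posts_with_media: int  # posts containing photos/videos


def fraction(part, total):
    """Safe ratio that returns 0.0 for profiles with no posts."""
    return part / total if total else 0.0


def ratio_features(profile):
    """Compute the two fraction-style features (features 8 and 9)."""
    return {
        "url_post_fraction": fraction(profile.posts_with_urls, profile.posts),
        "media_post_fraction": fraction(profile.posts_with_media, profile.posts),
    }


# Class balance from the description: 500 legit and 100 spam profiles.
N_LEGIT, N_SPAM = 500, 100
spam_share = N_SPAM / (N_LEGIT + N_SPAM)

print(ratio_features(ProfileCounts(posts=40, posts_with_urls=10, posts_with_media=20)))
```

Spam profiles make up one sixth of the data, so a classifier that always predicts "legit" would already be right five times out of six; accuracy alone is a weak yardstick here.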
Inspiration
The dataset helps the community understand which features help distinguish legitimate Facebook users from spam users.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a comprehensive collection of historical price records for the top 1000 cryptocurrencies. The data in this dataset is updated daily, providing a reliable and up-to-date source of information for cryptocurrency traders, researchers, and enthusiasts.
Each file in the dataset includes the following columns: date, open price, high price, low price, closing price, adjusted closing price, and trading volume. These columns provide a detailed picture of the daily price movements and trading activity of each cryptocurrency in the dataset.
The "date" column indicates the day on which the price data was recorded, while the "open" column provides the opening price of the cryptocurrency for that day. The "high" and "low" columns indicate the highest and lowest prices of the cryptocurrency on that day, respectively. The "close" column represents the closing price of the cryptocurrency on that day, while the "adjusted close" column takes into account any dividends or other corporate actions that may have affected the price. Finally, the "volume" column shows the trading volume of the cryptocurrency on that day.
With this dataset, users can analyze and visualize the performance of individual cryptocurrencies, compare them to one another, and track trends over time. The data is ideal for use in machine learning models, predictive analytics, and other data-driven applications.
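As a minimal sketch of working with the column layout described above, the functions below check that a daily row is internally consistent and compute close-to-close returns. The dictionary keys and the sample values are assumptions for illustration; the actual files may use different header names.

```python
def is_consistent(row):
    """True when low <= open, close <= high and volume is non-negative."""
    return (
        row["low"] <= min(row["open"], row["close"])
        and max(row["open"], row["close"]) <= row["high"]
        and row["volume"] >= 0
    )


def daily_returns(rows):
    """Simple close-to-close returns, with rows ordered by ISO date string."""
    closes = [r["close"] for r in sorted(rows, key=lambda r: r["date"])]
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


# Invented sample rows (not real prices), deliberately out of date order.
rows = [
    {"date": "2023-01-02", "open": 100.0, "high": 112.0, "low": 99.0, "close": 110.0, "volume": 1500},
    {"date": "2023-01-01", "open": 95.0, "high": 101.0, "low": 94.0, "close": 100.0, "volume": 1200},
    {"date": "2023-01-03", "open": 110.0, "high": 111.0, "low": 98.0, "close": 99.0, "volume": 1800},
]
returns = daily_returns(rows)
```

Sorting on the date string works only because ISO 8601 dates order lexicographically; other date formats would need parsing first.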
This data-set contains >300,000 anonymized transactions. The variables are anonymized to protect consumers' information, but they represent fields such as how long the consumer has had the account. Each row represents one user transaction. This data-set was built so that the Class label can be used to train a model that predicts, from the anonymized variables, which transactions are potentially fraudulent.
The data-set contains a fraud rate of ~0.1% and thus is highly unbalanced.
The variables are as follows: Time, anonymized variables (30 variables), $ Amount, Class (Fraud Classifier)
The data are detailed series underlying the Financial Accounts, ABS Cat No. 5232.0. They cover special purpose vehicles registered or incorporated in Australia to securitise selected assets, and whose issues are independently rated by a recognised rating agency. See "Changes to Tables" in the December 1996 issue of the Bulletin for a further discussion of securitisation vehicles. Some data prior to June 1993 are partly estimated.
"Mortgages" include both residential and non-residential mortgages.
"Other loans and placements" include operating lease and lease finance receivables, secured loans to originators and loans secured by other types of assets.
Holdings of "Asset-backed bonds" refers to individual securitisation vehicles' holdings of asset-backed bonds issued by other securitisation vehicles.
"All other assets" include cash and deposits with Australian banks and corporations registered under the Financial Sector (Collection of Data) Act 2001 and all other claims not already included.
"Other liabilities" include loans and advances from Australian banks, corporations registered under the Financial Sector (Collection of Data) Act 2001 and other financial institutions, along with all other liabilities not already included.
Customer ID: Unique identifier for each credit card holder.
Balance: Remaining balance in the customer's account.
Balance Frequency: How often the balance is updated (score between 0 and 1, with 0 indicating infrequent updates and 1 signifying frequent updates).
Purchases: Total amount of purchases made from the account.
One-Off Purchases: Maximum purchase amount made in a single transaction.
Installment Purchases: Amount of purchases made in installments.
Cash Advance: Amount of cash advanced using the credit card.
Purchases Frequency: How often purchases are made (score between 0 and 1, similar to balance frequency).
One-Off Purchases Frequency: How often customers make one-time purchases.
Installment Purchases Frequency: How often customers make installment purchases.
Cash Advance Frequency: How often customers take cash advances.
Cash Advance Transactions: Number of cash advance transactions.
Purchases Transactions: Number of purchase transactions.
Credit Limit: Maximum credit limit for the specific user.
Payments: Total amount of payments made by the user.
Minimum Payment: Minimum payment amount required by the user.
Percentage of Full Payment: Percentage of the total balance paid by the user (0 indicates no payment, 100 indicates full payment).
Tenure: Length of time the customer has been a credit card user.
https://creativecommons.org/publicdomain/zero/1.0/
A Recipient-executed Grant is a Trust Fund Grant that is provided to a third party under a grant agreement, and for which the Bank plays an operational role - i.e., the Bank normally appraises and supervises activities financed by these funds. This dataset provides data on the amount of grant funds committed in the course of a fiscal year and payments made out of a Trust Fund account to eligible recipients, in accordance with the legal agreements. In fulfilling its responsibilities, the World Bank as Trustee complies with all sanctions applicable to World Bank transactions. All definitions should be regarded at present as provisional and not final, and are subject to revision at any time. Data is provided at the individual Trust Fund level and is updated as of 04/02/2015. No further updates are planned for this particular dataset, please visit the Global Partnership and Trust Fund Operations website for more details: http://go.worldbank.org/GABMG2YEI0
This is a dataset hosted by the World Bank. The organization has an open data platform and updates its information according to the amount of data that is brought in. Explore the World Bank's Financial Data using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
This dataset is distributed under a Creative Commons Attribution 3.0 IGO license.
Cover photo by Joseph Gonzalez on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
https://creativecommons.org/publicdomain/zero/1.0/
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are: