License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.
| Column Name | Description |
|---|---|
| Transaction_ID | Unique identifier for each transaction |
| User_ID | Unique identifier for the user |
| Transaction_Amount | Amount of money involved in the transaction |
| Transaction_Type | Type of transaction (Online, In-Store, ATM, etc.) |
| Timestamp | Date and time of the transaction |
| Account_Balance | User's current account balance before the transaction |
| Device_Type | Type of device used (Mobile, Desktop, etc.) |
| Location | Geographical location of the transaction |
| Merchant_Category | Type of merchant (Retail, Food, Travel, etc.) |
| IP_Address_Flag | Whether the IP address was flagged as suspicious (0 or 1) |
| Previous_Fraudulent_Activity | Number of past fraudulent activities by the user |
| Daily_Transaction_Count | Number of transactions made by the user that day |
| Avg_Transaction_Amount_7d | User's average transaction amount in the past 7 days |
| Failed_Transaction_Count_7d | Count of failed transactions in the past 7 days |
| Card_Type | Type of payment card used (Credit, Debit, Prepaid, etc.) |
| Card_Age | Age of the card in months |
| Transaction_Distance | Distance between the user's usual location and transaction location |
| Authentication_Method | How the user authenticated (PIN, Biometric, etc.) |
| Risk_Score | Fraud risk score computed for the transaction |
| Is_Weekend | Whether the transaction occurred on a weekend (0 or 1) |
| Fraud_Label | Target variable (0 = Not Fraud, 1 = Fraud) |
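As a minimal sketch of how the columns in the table might be used, the snippet below parses a few rows and derives an amount-to-7-day-average ratio. Only the column names come from the table; the two sample rows are invented for illustration, and `amount_ratio` is a hypothetical helper, not part of the dataset.

```python
import csv
import io

# Two made-up rows using a subset of the documented columns (illustrative only).
SAMPLE_CSV = """Transaction_ID,Transaction_Amount,Avg_Transaction_Amount_7d,Is_Weekend,Fraud_Label
TXN_1,250.00,100.00,0,1
TXN_2,40.00,80.00,1,0
"""


def amount_ratio(row):
    """Hypothetical feature: current amount relative to the user's 7-day average."""
    average = float(row["Avg_Transaction_Amount_7d"])
    if average == 0:
        return None  # no recent spending to compare against
    return float(row["Transaction_Amount"]) / average


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
ratios = {row["Transaction_ID"]: amount_ratio(row) for row in rows}
# Share of rows whose target variable marks them as fraud.
fraud_rate = sum(int(row["Fraud_Label"]) for row in rows) / len(rows)
print(ratios, fraud_rate)
```

A ratio well above 1 means the transaction is large relative to the user's recent spending, which is one plausible input for a classifier alongside the provided Risk_Score.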
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:
- Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)
- Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score
- Metadata: location, device used, payment channel, IP address, device hash
- Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)
The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:
- Binary and multiclass classification models
- Fraud detection systems
- Time-series anomaly detection
- Feature engineering and model explainability
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Some of these records were flagged false by existing algorithms.
Additional feature engineering could strengthen fraud detection algorithms and help reveal where the existing algorithm falls short.
CASH-IN: the process of increasing an account's balance by paying cash to a merchant.
CASH-OUT: the opposite of CASH-IN; cash is withdrawn from a merchant, which decreases the balance of the account.
DEBIT: a process similar to CASH-OUT that involves sending money from the mobile money service to a bank account.
PAYMENT: the process of paying merchants for goods or services, which decreases the balance of the account and increases the balance of the receiver.
TRANSFER: the process of sending money to another user of the service through the mobile money platform.
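The effect of each transaction type on the originating account can be sketched as a small lookup table. This is an illustration under stated assumptions, not the simulator's logic: the direction for TRANSFER is inferred from "sending money", and `apply_transaction` is a hypothetical helper.

```python
# Direction of the originator's balance change per transaction type,
# following the descriptions above (+1 increases, -1 decreases).
# TRANSFER is assumed to decrease the sender's balance.
BALANCE_DIRECTION = {
    "CASH-IN": +1,
    "CASH-OUT": -1,
    "DEBIT": -1,
    "PAYMENT": -1,
    "TRANSFER": -1,
}


def apply_transaction(balance, transaction_type, amount):
    """Return the originator's balance after a transaction of the given type."""
    return balance + BALANCE_DIRECTION[transaction_type] * amount


print(apply_transaction(100.0, "CASH-IN", 50.0))
print(apply_transaction(100.0, "PAYMENT", 30.0))
```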
License: https://creativecommons.org/publicdomain/zero/1.0/
Project Objectives
Provider fraud is one of the biggest problems facing Medicare. According to the government, total Medicare spending has increased exponentially due to fraud in Medicare claims. Healthcare fraud is an organized crime that involves peers of providers, physicians, and beneficiaries acting together to make fraudulent claims.
Rigorous analysis of Medicare data has identified many physicians who engage in fraud. They use ambiguous diagnosis codes to justify the costliest procedures and drugs. Insurance companies are the institutions most affected by these practices. As a result, they have increased their premiums, and healthcare is becoming more expensive by the day.
Healthcare fraud and abuse take many forms. Some of the most common types of provider fraud are:
a) Billing for services that were not provided.
b) Duplicate submission of a claim for the same service.
c) Misrepresenting the service provided.
d) Charging for a more complex or expensive service than was actually provided.
e) Billing for a covered service when the service actually provided was not covered.
Problem Statement
The goal of this project is to predict potentially fraudulent providers based on the claims they file. Along the way, we will also identify the variables that are important for detecting the behaviour of potentially fraudulent providers. Further, we will study fraudulent patterns in providers' claims to understand their future behaviour.
Introduction to the Dataset
For this project, we consider the inpatient claims, outpatient claims, and beneficiary details of each provider. Their details are as follows:
A) Inpatient Data
This data provides insights about the claims filed for patients who were admitted to hospital. It also provides additional details such as admission and discharge dates and the admit diagnosis code.
B) Outpatient Data
This data provides details about the claims filed for patients who visited a hospital but were not admitted.
C) Beneficiary Details Data
This data contains beneficiary KYC details such as health conditions and the region they belong to.
License: https://creativecommons.org/publicdomain/zero/1.0/
Digital payments are evolving, but so are cyber criminals.
According to the Data Breach Index, more than 5 million records are stolen every day, a concerning statistic showing that fraud is still very common for both card-present and card-not-present payments.
In today's digital world, where an enormous volume of card transactions happens every day, detecting fraud is challenging.
This dataset was sourced from an unnamed institute.
Feature Explanation:
distance_from_home - the distance from home to where the transaction happened.
distance_from_last_transaction - the distance from where the last transaction happened.
ratio_to_median_purchase_price - ratio of the transaction's purchase price to the median purchase price.
repeat_retailer - whether the transaction happened at the same retailer.
used_chip - whether the transaction was made using the chip (credit card).
used_pin_number - whether the transaction was made using a PIN number.
online_order - whether the transaction is an online order.
fraud - whether the transaction is fraudulent.
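To make ratio_to_median_purchase_price concrete, here is a small worked example. It assumes the median is taken over a customer's earlier purchase prices, which the description does not state; the prices and the `ratio_to_median` helper are invented for illustration.

```python
from statistics import median


def ratio_to_median(purchase_price, past_prices):
    """Purchase price divided by the median of earlier purchase prices."""
    return purchase_price / median(past_prices)


# Made-up history: the median of these three prices is 100.0.
history = [50.0, 100.0, 150.0]
ratio = ratio_to_median(300.0, history)
print(ratio)
```

A purchase of 300.0 against a median of 100.0 gives a ratio of 3.0, meaning the purchase is three times the typical price.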
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There is a lack of publicly available datasets on financial services, especially in the emerging mobile money transactions domain. Financial datasets are important to many researchers, and in particular to us, as we perform research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which means that no datasets are made publicly available.
We present a synthetic dataset generated using a simulator called PaySim as an approach to this problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions, and injects malicious behaviour so that the performance of fraud detection methods can later be evaluated.
PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that provides the mobile financial service, which currently runs in more than 14 countries around the world.
This synthetic dataset is scaled down to 1/4 of the original dataset and was created just for Kaggle.
Here is a sample row, followed by an explanation of each column:
1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0
step - maps to a unit of time in the real world. Here, 1 step is 1 hour. There are 744 steps in total (described as a 30-day simulation; 744 hours is 31 days).
type - CASH-IN, CASH-OUT, DEBIT, PAYMENT, or TRANSFER.
amount - amount of the transaction in local currency.
nameOrig - customer who started the transaction.
oldbalanceOrg - initial balance before the transaction.
newbalanceOrig - new balance after the transaction.
nameDest - customer who is the recipient of the transaction.
oldbalanceDest - recipient's initial balance before the transaction. Note that there is no information for recipients whose names start with M (merchants).
newbalanceDest - recipient's new balance after the transaction. Note that there is no information for recipients whose names start with M (merchants).
isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent agents aim to profit by taking control of customer accounts and emptying the funds by transferring them to another account and then cashing out of the system.
isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
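The column explanations can be checked against the sample row with a short sketch. It makes three assumptions: the CSV column order matches the order of the explanations, steps are numbered from 1 (as in the sample row), and the flagging threshold written in the description is read as two hundred thousand. `parse_row`, `step_to_day_and_hour`, and `exceeds_flag_rule` are hypothetical helpers, not part of PaySim.

```python
import csv
import io

# Column order assumed to match the order of the explanations above.
HEADERS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]
FLOAT_COLUMNS = ("amount", "oldbalanceOrg", "newbalanceOrig",
                 "oldbalanceDest", "newbalanceDest")
INT_COLUMNS = ("step", "isFraud", "isFlaggedFraud")

SAMPLE_ROW = "1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0"
FLAG_THRESHOLD = 200_000  # assumption: the documented limit is two hundred thousand


def parse_row(line):
    """Turn one CSV line into a dict with numeric columns converted."""
    values = next(csv.reader(io.StringIO(line)))
    row = dict(zip(HEADERS, values))
    for name in FLOAT_COLUMNS:
        row[name] = float(row[name])
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    return row


def step_to_day_and_hour(step):
    """Map a 1-based step to (day, hour of day), with 1 step = 1 hour."""
    day_index, hour = divmod(step - 1, 24)
    return day_index + 1, hour


def exceeds_flag_rule(row):
    """Hypothetical restatement of the isFlaggedFraud rule described above."""
    return row["type"] == "TRANSFER" and row["amount"] > FLAG_THRESHOLD


row = parse_row(SAMPLE_ROW)
# For this PAYMENT, the originator's balance drops by the amount (up to rounding).
balance_consistent = abs(
    row["oldbalanceOrg"] - row["amount"] - row["newbalanceOrig"]
) < 0.005
print(row["type"], balance_consistent, step_to_day_and_hour(row["step"]))
```

In the sample row, 1089.0 minus 1060.31 equals the new balance of 28.69, and the destination balances are 0.0 because the recipient is a merchant (name starting with M).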
There are 5 similar files that contain the runs of 5 different scenarios. These files are explained in more detail in chapter 7 of my PhD thesis (available here: http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).
We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an Intel i7 processor with 16GB of RAM. The final result of a run contains approximately 24 million financial records divided into the 5 categories: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
This work is part of the research project "Scalable resource-efficient systems for big data analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden.
Please refer to this dataset using the following citations:
PaySim first paper of the simulator:
E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016

At LOL Bank Pvt. Ltd., ensuring the safety and integrity of financial transactions is a top priority. With the growing number of online transactions and digital banking activities, fraudulent transactions have become a significant threat to both the bank and its customers. Fraudulent activities, such as unauthorized account access, identity theft, and suspicious transaction patterns, result in financial losses and damage to customer trust.
To address this growing concern, LOL Bank Pvt. Ltd. is seeking a solution to detect and prevent fraudulent transactions in real time. This involves analyzing historical transaction data, including account information, transaction amounts, merchant data, and timestamps, to identify patterns indicative of fraudulent behavior. The goal is to build a robust fraud detection system that can distinguish between legitimate transactions and potentially fraudulent ones, with minimal false positives.
The solution must incorporate machine learning algorithms that learn from transaction history, allowing the system to identify emerging fraud techniques and adapt to evolving threats. The system must be able to flag suspicious transactions in real time, providing bank staff with actionable insights so they can take prompt action. By improving its fraud detection capabilities, LOL Bank Pvt. Ltd. aims to protect customer assets, reduce financial losses, and maintain its reputation as a secure and trustworthy financial institution.
The column details are as follows:
These column descriptions give a clear understanding of the data that will be used for fraud detection analysis.
At LOL Bank Pvt. Ltd., ensuring the safety of customer financial transactions is paramount. With the rise of digital banking, the growth in transaction volume has opened up more opportunities for fraudulent activities, which could significantly affect the bank's reputation and lead to substantial financial losses. The challenge is to accurately detect and prevent fraud while preserving a seamless banking experience for customers. The key aspects of this problem are as follows:
Nature of the Problem:
- Fraudulent transactions include unauthorized account access, money laundering, identity theft, and unusual transaction patterns.
- Traditional fraud detection methods are often reactive, leading to delayed identification of fraud.
- Fraudsters continuously evolve their tactics, making it harder to detect new forms of fraud using conventional methods.
Data Available:
- The dataset includes historical transaction data, with transaction details such as:
- Transaction ID, ...
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for click fraud detection in Cost-Per-Action (CPA) online advertising. It contains 5,000 click records, with features related to user behavior, device information, and interaction patterns. The dataset includes both legitimate and fraudulent clicks, allowing researchers and data scientists to develop and evaluate AI-based fraud detection models.
Key Features
- Click Behavior: Click duration, scroll depth, mouse movements, keystrokes detected
- User & Device Info: Device type, browser, operating system, IP reputation
- Network Security: VPN usage, proxy usage, IP address
- Fraud Labels: is_fraudulent (1 = Fraudulent Click, 0 = Legitimate Click)
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Nature of Data: This dataset contains fictitious data designed for educational and testing purposes in fraud detection algorithms. It does not represent real-world financial transactions or individuals.
Purpose of Creation: The dataset was generated to provide a realistic example for developing and evaluating fraud detection models without relying on sensitive real-world data. It's intended for students, researchers, and practitioners to practice data analysis and machine learning techniques in a safe environment.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset presents a synthetic representation of mobile money transactions, meticulously crafted to mirror the complexities of real-world financial activities while integrating fraudulent behaviors for research purposes. Derived from a simulator named PaySim, which utilizes aggregated data from actual financial logs of a mobile money service in an African country, this dataset aims to fill the gap in publicly available financial datasets for fraud detection studies. It encompasses a variety of transaction types including CASH-IN, CASH-OUT, DEBIT, PAYMENT, and TRANSFER over a simulated period of 30 days, providing a comprehensive environment for evaluating fraud detection methodologies. By addressing the intrinsic privacy concerns associated with financial transactions, this dataset offers a unique resource for researchers and analysts in the field of financial security and fraud detection, scaled to 1/4 of the original dataset size for efficient use within the Kaggle platform. Please note that transactions marked as fraudulent have been nullified, emphasizing the importance of non-balance columns for fraud analysis. This dataset is a contribution to the field from the "Scalable resource-efficient systems for big data analytics" project, funded by the Knowledge Foundation in Sweden.
PaySim synthesizes mobile money transactions using data derived from a month's worth of financial logs from a mobile money service operating in an African country. These logs were provided by a multinational company that offers this financial service across more than 14 countries globally.
This synthetic dataset has been scaled to one-quarter the size of the original dataset and is specifically tailored for Kaggle.
Important Note: Transactions identified as fraudulent are annulled. Hence, for fraud detection analysis, the following columns should not be utilized: oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest.
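One way to follow this note is to strip the four balance columns before building features. The sketch below assumes each record is a plain dict; the example row and the non-balance column names in it (`type`, `amount`, `isFraud`) are assumptions for illustration, and `drop_balance_columns` is a hypothetical helper.

```python
# The four columns the note above says not to use for fraud detection.
BALANCE_COLUMNS = frozenset(
    ["oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"]
)


def drop_balance_columns(row):
    """Return a copy of the record without the balance columns."""
    return {name: value for name, value in row.items() if name not in BALANCE_COLUMNS}


# Made-up record for illustration only.
example = {
    "type": "TRANSFER",
    "amount": 181.0,
    "oldbalanceOrg": 181.0,
    "newbalanceOrig": 0.0,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 0.0,
    "isFraud": 1,
}
features = drop_balance_columns(example)
print(features)
```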
This dataset has been generated through multiple runs of the PaySim simulator, each simulating a month of real-time transactions over 744 steps. Each run produced approximately 24 million financial records across the five transaction categories.
This project is part of the "Scalable resource-efficient systems for big data analytics" research, supported by the Knowledge Foundation (grant: 20140032) in Sweden.
For citations and further references, please use:
E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016
License: https://creativecommons.org/publicdomain/zero/1.0/
Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injuries following an accident. Some common examples include staged accidents, where fraudsters deliberately "arrange" for accidents to occur; phantom passengers, where people who were not even at the scene of the accident claim to have suffered grievous injury; and false personal injury claims, where injuries are grossly exaggerated.
This dataset contains vehicle data (attributes, model, accident details, etc.) along with policy details (policy type, tenure, etc.). The task is to detect whether a claim application is fraudulent; the target column is FraudFound_P.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.
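The degree of imbalance follows directly from the two counts given above. The sketch below only does that arithmetic; the variable names are illustrative, and using the negative-to-positive ratio as a class weight (for example as XGBoost's `scale_pos_weight`) is one common choice rather than something this dataset prescribes.

```python
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492

legitimate = TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Ratio of legitimate to fraudulent rows, often used as a positive-class weight.
negative_to_positive = legitimate / FRAUD_TRANSACTIONS

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = legitimate / TOTAL_TRANSACTIONS

print(f"fraud rate: {fraud_rate:.4%}")
print(f"legitimate per fraud: {negative_to_positive:.1f}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Because a model that never predicts fraud is already more than 99.8% accurate, metrics such as precision, recall, or area under the precision-recall curve are usually more informative than accuracy here.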
The dataset has been split into training and testing sets and saved in the following files:
- X_train.csv: Feature data for the training set
- X_test.csv: Feature data for the testing set
- y_train.csv: Labels for the training set (fraudulent or legitimate)
- y_test.csv: Labels for the testing set
This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This is a fabricated dataset made by merging two datasets, Dataset1.csv and Dataset2.csv.
The final dataset, merged_dataset.csv, is a synthetic dataset in which missing values were handled using probabilistic imputation.
Balancing the Dataset: The dataset, which was initially imbalanced, was balanced using the ROSE (Random Over-Sampling Examples) package to ensure equal representation of fraudulent and non-fraudulent transactions.
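To illustrate what balancing to equal class counts means, here is a sketch of plain random oversampling with replacement. This is not ROSE itself, which is an R package that generates new synthetic examples; it is a simpler stand-in technique. The `is_fraud` label key, the toy rows, and `oversample_minority` are all assumptions for illustration.

```python
import random


def oversample_minority(rows, label_key="is_fraud", seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal."""
    rng = random.Random(seed)
    positives = [row for row in rows if row[label_key] == 1]
    negatives = [row for row in rows if row[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    if not minority:
        return list(rows)  # nothing to sample from
    shortfall = len(majority) - len(minority)
    extra = [rng.choice(minority) for _ in range(shortfall)]
    return majority + minority + extra


# Toy data: 8 legitimate rows and 2 fraudulent rows.
rows = [{"id": i, "is_fraud": 0} for i in range(8)]
rows += [{"id": i, "is_fraud": 1} for i in range(8, 10)]
balanced = oversample_minority(rows)
print(len(balanced), sum(row["is_fraud"] for row in balanced))
```

Oversampling should be applied to the training split only; duplicating rows before splitting would leak copies of the same record into the test set.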
This dataset was used for my group's school project report. The code for the project is available at https://github.com/slothislazy/DM_AOL
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates real-world banking transactions, including both legitimate and fraudulent activity. It includes detailed features such as transaction amount, time, type, location, device type, and historical user behavior. Designed for binary classification, this dataset is ideal for training and evaluating machine learning models for fraud detection. This dataset contains simulated financial transactions labeled as fraudulent or legitimate. It includes the following features:
transaction_id: Unique identifier for each transaction
customer_id: Anonymized customer ID
transaction_amount: Value of the transaction in currency units
transaction_type: Type of transaction (e.g., payment, transfer)
transaction_time: Timestamp of when the transaction occurred
transaction_location: Region where the transaction was initiated
device_type: Device used (e.g., mobile, POS, desktop)
previous_transactions_count: Number of recent transactions by the same customer
is_fraud: Target label indicating fraud (1) or not (0)
This dataset is ideal for binary classification tasks such as fraud detection using machine learning.

The dataset consists of one training dataset, one unlabeled testing (unseen) dataset, and a clickstream dataset, all interconnected through a common identifier named "SESSION_ID". This identifier makes it possible to link user actions across the datasets. A session involves client online banking activities such as signing in, updating passwords, viewing products, or adding items to the cart.
The majority of fraud cases involve adding a new shipping address or changing the password. Visualizing the data can give more insight into the nature of the fraud.
I also added 2 datasets named "train/test_dataset_combined" which are the merged version of the train and test datasets based on the "SESSION_ID" column. For more information, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/combine-datasets-in-pandas
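Linking records through SESSION_ID can be sketched without any libraries. The snippet below is not the method used in the linked notebook; it is a minimal illustration in which every column other than SESSION_ID (`IS_FRAUD`, `ACTION`) and all rows are invented, and `attach_clickstream` is a hypothetical helper.

```python
from collections import defaultdict


def attach_clickstream(sessions, click_events):
    """Left-join click events onto session rows using the shared SESSION_ID."""
    actions_by_session = defaultdict(list)
    for event in click_events:
        actions_by_session[event["SESSION_ID"]].append(event["ACTION"])
    return [
        {**session, "ACTIONS": actions_by_session.get(session["SESSION_ID"], [])}
        for session in sessions
    ]


# Made-up rows for illustration only.
train_rows = [
    {"SESSION_ID": "s1", "IS_FRAUD": 1},
    {"SESSION_ID": "s2", "IS_FRAUD": 0},
]
clicks = [
    {"SESSION_ID": "s1", "ACTION": "sign_in"},
    {"SESSION_ID": "s1", "ACTION": "change_password"},
]
combined = attach_clickstream(train_rows, clicks)
print(combined)
```

Sessions without any click events keep an empty action list, which mirrors a left join and avoids silently dropping rows.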
In addition, I added the cleaned dataset after doing EDA. For more information about the EDA process, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/a-deep-dive-into-fraud-detection-through-eda
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Creating a comprehensive dataset for fraud prevention and prescription in a bank involves collecting and generating various data points. In practice, data collection and generation can be a complex and time-consuming process. This dataset will be much simpler than a real-world fraud detection dataset but can serve as a starting point for a notebook. You can then expand and refine it as needed. This Kaggle dataset contains synthetic sales data designed for data analytics practice and hackathons. The dataset is entirely computer generated and does not contain any real-world information, ensuring privacy and data protection.
Key Features:
- Transaction_Id
- Customer_Id
- Merchant_Id
- Amount
- Transaction time
- Card_type
- Location
- Purchase_category
- Customer_Age
- Is_fraudulent
Use Cases:
- Practice data cleaning and preprocessing techniques.
- Explore time series analysis and forecasting.
- Develop customer segmentation models.
- Investigate product performance and inventory management.
- Experiment with recommendation systems and personalized marketing.
Note: This dataset is entirely synthetic and does not represent any real-world sales data. It is intended for educational and practice purposes only.
Attribution: If you use this dataset in your work, please attribute it to the original creator.
Creator: Ishita Biswas
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Rupeswara Babu Sangoju
Released under Apache 2.0

This dataset was created by Youssef Dessouky
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Credit Card Transactions Dataset provides detailed records of credit card transactions, including information about transaction times, amounts, and associated personal and merchant details. This dataset has over 1.85M rows.
How This Dataset Can Be Used:
- Fraud Detection: Use machine learning models to identify fraudulent transactions by examining patterns in transaction amounts, locations, and user profiles. Enhancing fraud detection systems becomes feasible by analyzing behavioral patterns.
- Customer Segmentation: Segment customers based on spending patterns, location, and demographics. Tailor marketing strategies and personalized offers to these different customer segments for better engagement.
- Transaction Classification: Classify transactions into categories such as grocery or entertainment to understand spending behaviors. This helps in improving recommendation systems by identifying transaction categories and preferences.
- Geospatial Analysis: Analyze transaction data geographically to map spending patterns and detect regional trends or anomalies based on latitude and longitude.
- Predictive Modeling: Build models to forecast future spending behavior using historical transaction data. Predict potential fraudulent activities and financial trends.
- Behavioral Analysis: Examine how factors like transaction amount, merchant type, and time influence spending behavior. Study the relationships between user demographics and transaction patterns.
- Anomaly Detection: Identify unusual transaction patterns that deviate from normal behavior to detect potential fraud early. Employ anomaly detection techniques to spot outliers and suspicious activities.
License: http://opendatacommons.org/licenses/dbcl/1.0/
Healthcare fraud is considered a challenge for many societies. Healthcare funding that could be spent on medicine, care for the elderly, or emergency room visits is instead lost to fraudulent activities by materialistic practitioners or patients. Healthcare fraud is a major contributor to rising healthcare costs.
Try out various unsupervised techniques to find the anomalies in the data.
Detailed Data File:
The following variables are included in the detailed Physician and Other Supplier data file (see Appendix A for a condensed version of variables included).
npi – National Provider Identifier (NPI) for the performing provider on the claim. The provider NPI is the numeric identifier registered in NPPES.
nppes_provider_last_org_name – When the provider is registered in NPPES as an individual (entity type code = 'I'), this is the provider's last name. When the provider is registered as an organization (entity type code = 'O'), this is the organization's name.
nppes_provider_first_name – When the provider is registered in NPPES as an individual (entity type code = 'I'), this is the provider's first name. When the provider is registered as an organization (entity type code = 'O'), this will be blank.
nppes_provider_mi – When the provider is registered in NPPES as an individual (entity type code = 'I'), this is the provider's middle initial. When the provider is registered as an organization (entity type code = 'O'), this will be blank.
nppes_credentials – When the provider is registered in NPPES as an individual (entity type code = 'I'), these are the provider's credentials. When the provider is registered as an organization (entity type code = 'O'), this will be blank.
nppes_provider_gender – When the provider is registered in NPPES as an individual (entity type code = 'I'), this is the provider's gender. When the provider is registered as an organization (entity type code = 'O'), this will be blank.
nppes_entity_code – Type of entity reported in NPPES. An entity code of 'I' identifies providers registered as individuals and an entity type code of 'O' identifies providers registered as organizations.
nppes_provider_street1 – The first line of the provider’s street address, as reported in NPPES.
nppes_provider_street2 – The second line of the provider's street address, as reported in NPPES.
nppes_provider_city – The city where the provider is located, as reported in NPPES.
nppes_provider_zip – The provider’s zip code, as reported in NPPES.
nppes_provider_state – The state where the provider is located, as reported in NPPES. The fifty U.S. states and the District of Columbia are reported by the state postal abbreviation. The following values are used for all other areas:
'XX' = 'Unknown'; 'AA' = 'Armed Forces Central/South America'; 'AE' = 'Armed Forces Europe'; 'AP' = 'Armed Forces Pacific'; 'AS' = 'American Samoa'; 'GU' = 'Guam'; 'MP' = 'North Mariana Islands'; 'PR' = 'Puerto Rico'; 'VI' = 'Virgin Islands'; 'ZZ' = 'Foreign Country'
nppes_provider_country – The country where the provider is located, as reported in NPPES. The country code will be 'US' for any state or U.S. possession. For foreign countries (i.e., state values of 'ZZ'), the provider country values include the following: AE=United Arab Emirates, AG=Antigua, AR=Argentina, AU=Australia, BO=Bolivia, BR=Brazil, CA=Canada, CH=Switzerland, CN=China, CO=Colombia, DE=Germany, ES=Spain, FR=France, GB=Great Britain, GR=Greece, HU=Hungary, IL=Israel, IN=India, IS=Iceland, IT=Italy, JO=Jordan, JP=Japan, KR=Korea, KW=Kuwait, KY=Cayman Islands, LB=Lebanon, MX=Mexico, NL=Netherlands, NO=Norway, NZ=New Zealand, PA=Panama, PK=Pakistan, RW=Rwanda, SA=Saudi Arabia, SY=Syria, TH=Thailand, TR=Turkey, VE=Venezuela
provider_type – Derived from the provider specialty code reported on the claim.
medicare_participation_indicator – Identifies whether the provider participates in Medicare and/or accepts assignment of Medicare allowed amounts.
place_of_service – Identifies whether the place of service submitted on the claims is a facility (value of ‘F’) or non-facility (value of ‘O’). Non-facility is generally an office setting; however other entities are included in non-facility.
hcpcs_code – HCPCS code used to identify the specific medical service furnished by the provider.
hcpcs_description – Description of the HCPCS code for the specific medical service furnished by the provider.
hcpcs_drug_indicator – Identifies whether the HCPCS code for the specific service furnished by the provider is an HCPCS listed on the Medicare Part B Drug Average Sales Price (ASP) File.
line_srvc_cnt – Number of services provided; note that the metrics used to count the number provided can vary from service to service.
bene_unique_cnt – Number of distinct Medicare beneficiaries rec...
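The entity-code rules for the NPPES name fields can be expressed as a small helper. This is a sketch that assumes each record is a dict keyed by the variable names above and that blank fields arrive as empty strings; the sample records are invented, and `provider_display_name` is hypothetical.

```python
def provider_display_name(record):
    """Build a readable name following the entity-code rules described above."""
    last_or_org = record["nppes_provider_last_org_name"]
    if record["nppes_entity_code"] == "O":
        # Organizations keep their name in the last/org field; other name fields are blank.
        return last_or_org
    parts = [
        record.get("nppes_provider_first_name", ""),
        record.get("nppes_provider_mi", ""),
        last_or_org,
    ]
    return " ".join(part for part in parts if part)


# Made-up records for illustration only.
individual = {
    "nppes_entity_code": "I",
    "nppes_provider_last_org_name": "Doe",
    "nppes_provider_first_name": "Jane",
    "nppes_provider_mi": "Q",
}
organization = {
    "nppes_entity_code": "O",
    "nppes_provider_last_org_name": "Example Clinic",
    "nppes_provider_first_name": "",
    "nppes_provider_mi": "",
}
print(provider_display_name(individual))
print(provider_display_name(organization))
```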