13 datasets found
  1. Bank Transaction Dataset for Fraud Detection: Detailed Analysis of Transactional Behavior and Anomaly Detection

    • kaggle.com
    Updated Nov 4, 2024
    vala khorasani (2024). Bank Transaction Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: vala khorasani
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.

    Key Features:

    • TransactionID: Unique alphanumeric identifier for each transaction.
    • AccountID: Unique identifier for each account, with multiple transactions per account.
    • TransactionAmount: Monetary value of each transaction, ranging from small everyday expenses to larger purchases.
    • TransactionDate: Timestamp of each transaction, capturing date and time.
    • TransactionType: Categorical field indicating 'Credit' or 'Debit' transactions.
    • Location: Geographic location of the transaction, represented by U.S. city names.
    • DeviceID: Alphanumeric identifier for devices used to perform the transaction.
    • IP Address: IPv4 address associated with the transaction, with occasional changes for some accounts.
    • MerchantID: Unique identifier for merchants, showing preferred and outlier merchants for each account.
    • AccountBalance: Balance in the account post-transaction, with logical correlations based on transaction type and amount.
    • PreviousTransactionDate: Timestamp of the last transaction for the account, aiding in calculating transaction frequency.
    • Channel: Channel through which the transaction was performed (e.g., Online, ATM, Branch).
    • CustomerAge: Age of the account holder, with logical groupings based on occupation.
    • CustomerOccupation: Occupation of the account holder (e.g., Doctor, Engineer, Student, Retired), reflecting income patterns.
    • TransactionDuration: Duration of the transaction in seconds, varying by transaction type.
    • LoginAttempts: Number of login attempts before the transaction, with higher values indicating potential anomalies.
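    The PreviousTransactionDate and LoginAttempts fields lend themselves to simple derived features. The sketch below is a minimal, hypothetical example. It assumes rows read with csv.DictReader (so every value is a string), a timestamp format of %Y-%m-%d %H:%M:%S, and a login-attempt threshold of 3. The dataset documents none of these.

```python
from datetime import datetime

# Assumed timestamp format; check it against the actual CSV before use.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
# Hypothetical threshold: more than this many login attempts is flagged.
MAX_NORMAL_LOGINS = 3

def derive_features(row: dict) -> dict:
    """Return a copy of one transaction row with two derived features."""
    current = datetime.strptime(row["TransactionDate"], TS_FORMAT)
    previous = datetime.strptime(row["PreviousTransactionDate"], TS_FORMAT)
    # Hours between this transaction and the account's previous one.
    # A negative value means the two timestamps are inconsistent.
    gap_hours = (current - previous).total_seconds() / 3600
    return {
        **row,
        "hours_since_prev": gap_hours,
        # csv.DictReader yields strings, hence the int() conversion.
        "many_logins": int(row["LoginAttempts"]) > MAX_NORMAL_LOGINS,
    }

# Made-up example row, not taken from the dataset.
example = {
    "TransactionID": "TX000001",
    "TransactionDate": "2023-04-11 16:29:14",
    "PreviousTransactionDate": "2023-04-10 16:29:14",
    "LoginAttempts": "5",
}
features = derive_features(example)
print(features["hours_since_prev"], features["many_logins"])
```

    Check whether any hours_since_prev values come out negative before you use the gap as a frequency feature.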

    This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.

  2. Data from: Online Payment Fraud Detection

    • kaggle.com
    Updated Jun 6, 2025
    AnjaliGupta (2025). Online Payment Fraud Detection [Dataset]. https://www.kaggle.com/datasets/anjigupta05/online-payment-fraud-detection/code
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: AnjaliGupta
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by AnjaliGupta

    Released under CC0: Public Domain

  3. Data from: Online Payment Fraud Detection

    • kaggle.com
    Updated Oct 26, 2022
    Jainil Shah (2022). Online Payment Fraud Detection [Dataset]. https://www.kaggle.com/datasets/jainilcoder/online-payment-fraud-detection/discussion
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Jainil Shah
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    To identify online payment fraud with machine learning, we need to train a machine learning model for classifying fraudulent and non-fraudulent payments. For this, we need a dataset containing information about online payment fraud, so that we can understand what type of transactions lead to fraud. For this task, I collected a dataset from Kaggle, which contains historical information about fraudulent transactions which can be used to detect fraud in online payments. Below are all the columns from the dataset I’m using here:

    • step: represents a unit of time, where 1 step equals 1 hour
    • type: type of online transaction
    • amount: the amount of the transaction
    • nameOrig: customer starting the transaction
    • oldbalanceOrg: balance before the transaction
    • newbalanceOrig: balance after the transaction
    • nameDest: recipient of the transaction
    • oldbalanceDest: initial balance of the recipient before the transaction
    • newbalanceDest: new balance of the recipient after the transaction
    • isFraud: fraud transaction
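    The columns record balances both before and after each transaction. A common sanity check is whether the origin balance moved by exactly the transaction amount. The sketch below is minimal and makes several assumptions: the columns are numeric, and every row is a debit from the origin account. The description does not say how each transaction type affects balances. The rows and the type strings are made-up placeholders.

```python
def balance_error_orig(row: dict) -> float:
    """Expected minus recorded new origin balance.

    Assumes a debit: newbalanceOrig should equal oldbalanceOrg - amount,
    so any nonzero result flags an inconsistent row.
    """
    return row["oldbalanceOrg"] - row["amount"] - row["newbalanceOrig"]

def hour_of_day(step: int) -> int:
    """Map step (1 step = 1 hour) to an hour-of-day bucket 0..23.

    Assumes step 1 is the first hour of a day; the real starting hour
    is not documented.
    """
    return (step - 1) % 24

# Made-up rows, not taken from the dataset.
rows = [
    {"step": 1, "type": "PAYMENT", "amount": 100.0,
     "oldbalanceOrg": 500.0, "newbalanceOrig": 400.0, "isFraud": 0},
    {"step": 30, "type": "TRANSFER", "amount": 250.0,
     "oldbalanceOrg": 250.0, "newbalanceOrig": 100.0, "isFraud": 1},
]
for r in rows:
    print(r["type"], hour_of_day(r["step"]), balance_error_orig(r))
```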

    I hope you now know about the data I am using for the online payment fraud detection task. Now in the section below, I’ll explain how we can use machine learning to detect online payment fraud using Python.

  4. ieee-data-preprocessing

    • kaggle.com
    Updated Mar 19, 2020
    Mohamed NIANG (2020). ieee-data-preprocessing [Dataset]. https://www.kaggle.com/datasets/niangmohamed/ieeedatapreprocessing
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Mohamed NIANG
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Mohamed NIANG

    Released under CC0: Public Domain

  5. Data from: ieeecis-fraud-detection

    • kaggle.com
    Updated Mar 3, 2020
    Mohamed NIANG (2020). ieeecis-fraud-detection [Dataset]. https://www.kaggle.com/niangmohamed/ieeecis-fraud-detection/discussion
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Mohamed NIANG
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Mohamed NIANG

    Released under CC0: Public Domain

  6. Online Review Authenticity Dataset

    • opendatabay.com
    Updated Jul 2, 2025
    Datasimple (2025). Online Review Authenticity Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/d7a6f4c7-c99a-4d8e-b082-914e014129f1
    Dataset authored and provided by: Datasimple
    Area covered: Reviews & Ratings

    Description

    This dataset is designed to support the creation and detection of fake reviews for online products. It comprises a collection of 40,000 product reviews, equally split between 20,000 authentic, human-generated reviews and 20,000 computer-generated fake reviews. The dataset includes information on review content, categorisation, and associated ratings, making it a valuable resource for developing and testing review integrity solutions within e-commerce and other online platforms.

    Columns

    • review dateaset: Likely indicates the type or source of the review within the dataset.
    • category: Specifies the product category the review belongs to, such as 'Kindle_Store_5' or 'Books_5'.
    • rating: The numerical rating given in the review.
    • label: A classification label, possibly indicating if a review is original (OR) or computer-generated (CG).
    • text_: The actual textual content of the product review.
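    A text classifier needs a numeric encoding of the label column. This minimal sketch assumes the label values are exactly OR and CG, as described above. It treats CG as the positive class, which is an arbitrary choice. The rows are made up.

```python
from collections import Counter

# Assumed encoding: "CG" (computer-generated) is the positive class.
LABEL_TO_INT = {"OR": 0, "CG": 1}

def encode_labels(rows):
    """Split dict rows into review texts and integer targets."""
    texts = [r["text_"] for r in rows]
    targets = [LABEL_TO_INT[r["label"]] for r in rows]
    return texts, targets

# Made-up rows in the assumed column layout.
rows = [
    {"category": "Books_5", "rating": 5.0, "label": "OR",
     "text_": "Loved the characters."},
    {"category": "Books_5", "rating": 5.0, "label": "CG",
     "text_": "Great book, great story, great read."},
    {"category": "Kindle_Store_5", "rating": 4.0, "label": "OR",
     "text_": "Short but worth it."},
]
texts, targets = encode_labels(rows)
print(Counter(targets))  # quick class-balance check
```

    Any unexpected label value raises KeyError. That exposes encoding surprises early instead of silently mislabeling rows.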

    Distribution

    The dataset contains a total of 40,432 unique entries (the sum of the rating counts below), with a roughly balanced split of about 20,000 fake and 20,000 real product reviews. Data is typically provided in a CSV file format.

    The distribution of ratings is as follows:

    • 1.00 - 1.20: 2,155 entries
    • 2.00 - 2.20: 1,967 entries
    • 3.00 - 3.20: 3,786 entries
    • 4.00 - 4.20: 7,965 entries
    • 4.80 - 5.00: 24,559 entries

    The dataset categorisation includes:

    • Kindle_Store_5: 12%
    • Books_5: 11%
    • Other: 77% (31,332 entries)

    Usage

    This dataset is ideal for training machine learning models to identify and flag fraudulent or computer-generated product reviews. It can be utilised for:

    • Developing Natural Language Processing (NLP) models for sentiment analysis and text classification.
    • Building AI & Machine Learning solutions for fraud detection in online marketplaces.
    • Researching the characteristics and patterns of authentic versus fabricated consumer feedback.
    • Enhancing the trustworthiness and reliability of online review systems.

    Coverage

    The dataset has global coverage, making it applicable for systems and research worldwide. While specific time ranges for the reviews themselves are not explicitly detailed, the data's utility is broad across various product categories and review contexts within e-commerce.

    License

    CC-BY

    Who Can Use It

    This dataset is suitable for:

    • Data Scientists and Machine Learning Engineers: To develop and fine-tune models for fake review detection and NLP tasks.
    • Researchers: Studying consumer behaviour, online trust, and adversarial attacks in digital platforms.
    • E-commerce Businesses: To implement internal systems for maintaining review authenticity and improving customer trust.
    • Academics and Students: For educational purposes, projects, and academic studies in AI, NLP, and data science.

    Dataset Name Suggestions

    • Fake Product Reviews Dataset
    • Online Review Authenticity Dataset
    • E-commerce Review Integrity Data
    • AI Review Detection Dataset
    • Customer Review Verification Set

    Attributes

    Original Data Source: 🚨 Fake Reviews Dataset

  7. Fabricated Fraud Detection

    • kaggle.com
    Updated Dec 2, 2019
    Gilad (2019). Fabricated Fraud Detection [Dataset]. https://www.kaggle.com/giladmanor/fraud-detection/code
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Gilad

    Description

    Demonstration of Synthetic data usability for Fraud Detection

    This demonstration used a fraud detection dataset and kernel, referenced below, to showcase the accuracy and safety of using the products of the kymera fabrication machine.

    The original dataset used is Synthetic Financial Datasets For Fraud Detection. This file accurately mimics the original dataset's features while generating the entire dataset from scratch.

  8. Enron Fraud Email Dataset

    • kaggle.com
    Updated Dec 28, 2023
    Advaith S Rao (2023). Enron Fraud Email Dataset [Dataset]. https://www.kaggle.com/datasets/advaithsrao/enron-fraud-email-dataset/versions/1
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Advaith S Rao
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. The data has been made public and contains a diverse set of emails, ranging from internal and marketing emails to spam and fraud attempts.

    In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, although it contained scam emails, it also had several integrity problems. The dataset was later updated, but preserving the privacy of the people in it remains important when it is used to train a deep neural network model.

    Although the Enron Email Dataset contains over 500K emails, one of its problems is the scarcity of labeled fraud examples. Label annotation is needed to detect the whole umbrella of fraud emails accurately. Because fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics are needed to label all types of fraudulent emails effectively.

    To tackle this problem, heuristics have been used to label the Enron data corpus using email signals, and automated labeling has been performed using simple ML models on other smaller email datasets available online. These fraud annotation techniques are discussed in detail below.

    To perform fraud annotation on the Enron dataset and to provide more fraud examples for modeling, two more fraud data sources have been used:

    • Phishing Email Dataset: https://www.kaggle.com/dsv/6090437
    • Social Engineering Dataset: http://aclweb.org/aclwiki

    Label Annotation

    To label the Enron email dataset, two kinds of signals are used to filter suspicious emails and label them into fraud and non-fraud classes: automated ML labeling and email signals.

    Automated ML Labeling

    The following models, trained on the other two data sources, are used to annotate labels for the Enron email data:

    Phishing Model Annotation: A high-precision SVM model trained on the Phishing mails dataset, which is used to annotate the Phishing Label on the Enron Dataset.

    Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering mails dataset, which is used to annotate the Social Engineering Label on the Enron Dataset.

    The two ML Annotator models use Term Frequency Inverse Document Frequency (TF-IDF) to embed the input text and make use of SVM models with Gaussian Kernel.
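    The annotators embed text with TF-IDF and compare documents through a Gaussian (RBF) kernel, k(u, v) = exp(-γ‖u - v‖²). The sketch below covers only those two pieces and does not train an SVM. The description does not specify the tokenization, the TF-IDF variant, or γ. This sketch therefore assumes whitespace tokenization, raw-count-times-log(N/df) weighting, and γ = 0.5. In practice, scikit-learn's TfidfVectorizer and SVC(kernel="rbf") are common choices, though the source does not say which library was used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Weight = raw term count * log(N / document frequency). Whitespace
    tokenization and this idf variant are assumptions for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

def gaussian_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2) on sparse vectors."""
    keys = set(u) | set(v)
    sq_dist = sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-gamma * sq_dist)

# Made-up emails, not taken from the Enron corpus.
docs = [
    "verify your account password now",
    "verify your bank account now",
    "meeting notes for the gas trading desk",
]
vectors = tfidf_vectors(docs)
k01 = gaussian_kernel(vectors[0], vectors[1])
k02 = gaussian_kernel(vectors[0], vectors[2])
print(f"{k01:.3f} {k02:.3f}")
```

    The two phishing-style messages share weighted terms, so their kernel value is higher than the value for the unrelated business email. An RBF SVM builds its decision function from this similarity.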

    If either model predicted that an email was fraud, the email's metadata was checked against several email signals. If these signals met the requirements for a high-probability fraud email, the email was labeled as fraud.

    Email Signals

    Heuristics based on email signals are used specifically to filter and target suspicious emails for fraud labeling. The signals used were:

    Person Of Interest: There is a publicly available list of email addresses of employees who were persons of interest in the Enron fraud case. These user mailboxes have a higher chance of containing quality fraud emails.

    Suspicious Folders: The Enron data is split into several folders for every employee, such as inbox, deleted_items, junk, and calendar. A subset of folders, such as Deleted Items and Junk, was treated as having a higher chance of containing fraud emails.

    Sender Type: The sender type was categorized as ‘Internal’ and ‘External’ based on their email address.

    Low Communication: A threshold of 4 emails was used to define low communication. A user qualifies as a low-comm sender if they sent fewer emails than this threshold. Emails from low-comm senders were assigned a high probability of being fraud.

    Contains Replies and Forwards: If an email contains forwards or replies, it was assigned a low probability of being fraud.
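    The description does not spell out how the signals are combined. The following hypothetical sketch implements a two-stage rule: an email is labeled fraud only if an ML annotator flagged it and the email signals point the same way. The fewer-than-4 low-communication threshold and the suspicious folder names come from the text. The +1/-1 weights, the min_score of 2, and treating external senders as riskier are all assumptions.

```python
from dataclasses import dataclass

# From the description: senders with fewer than 4 emails are "low-comm".
LOW_COMM_THRESHOLD = 4
# Folder names follow the examples given in the description.
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}

@dataclass
class Email:
    ml_flagged: bool            # either SVM annotator predicted fraud
    mailbox_is_poi: bool        # email sits in a person-of-interest mailbox
    folder: str
    sender_external: bool       # assumption: external senders are riskier
    sender_email_count: int
    has_reply_or_forward: bool

def signal_score(e: Email) -> int:
    """Hypothetical additive score; the +1/-1 weights are illustrative."""
    score = 0
    if e.mailbox_is_poi:
        score += 1
    if e.folder in SUSPICIOUS_FOLDERS:
        score += 1
    if e.sender_external:
        score += 1
    if e.sender_email_count < LOW_COMM_THRESHOLD:
        score += 1
    if e.has_reply_or_forward:
        score -= 1  # replies/forwards lower the fraud probability
    return score

def label_fraud(e: Email, min_score: int = 2) -> bool:
    """Label as fraud only if an ML annotator fired and the signals agree."""
    return e.ml_flagged and signal_score(e) >= min_score

suspicious = Email(True, True, "deleted_items", True, 1, False)
routine = Email(True, False, "inbox", False, 50, True)
print(label_fraud(suspicious), label_fraud(routine))
```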

    Manual Inspection

    To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.

    Dataset Breakdown

    Fraud: 2,327
    Non-Fraud: 445,090

    Citations

    • Enron Email Dataset. Leslie Kaelbling, William W. Cohen. MIT, CMU, 2015. https://www.cs.cmu.edu/~enron/
    • Phishing Email Detection. Subhadeep Chakraborty. Kaggle, 2023. DOI: 10.34740/KAGGLE/DSV/6090437. https://www.kaggle.com/dsv/6090437
    • CLAIR collection of fraud email. D. Radev, 2008. http://aclweb.org/aclwiki

  9. Users IDs

    • kaggle.com
    Updated Oct 6, 2019
    Arturo Garcia (2019). Users IDs [Dataset]. https://www.kaggle.com/artmatician/users-ids/notebooks
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Arturo Garcia

    Description

    Dataset

    This dataset was created by Arturo Garcia

  10. Credit Card Fraud Dataset

    • kaggle.com
    Updated Sep 17, 2021
    Cyber Cop (2021). Credit Card Fraud Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/2624805
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Cyber Cop
    License: GNU General Public License v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)

    Description

    Context

    The goal of this dataset is to identify fraudulent credit card activity by analyzing its features. Detection can be done with machine learning (ML) or deep learning (DL).

    Acknowledgements

    The data was collected from the Weka repository: https://weka.8497.n7.nabble.com/file/n23121/credit_fruad.arff

  11. Ecommerce Counterfeit Products Dataset

    • kaggle.com
    Updated Jul 4, 2025
    aimlVeera (2025). Ecommerce Counterfeit Products Dataset [Dataset]. https://www.kaggle.com/datasets/aimlveera/counterfeit-product-detection-dataset
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: aimlVeera
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/); license information was derived automatically

    Description

    Overview

    This synthetic dataset was specifically designed to support machine learning research and development in counterfeit product detection and anti-fraud systems. The dataset mimics real-world patterns found in e-commerce platforms while containing no actual sensitive or proprietary information, making it ideal for educational purposes, algorithm development, and public research.

    Key Features and Data Points

    Product-Level Features

    Basic Product Information:

    • Product ID, category, brand name, and pricing
    • Six main categories: Electronics, Fashion, Cosmetics, Pharmaceuticals, Luxury Goods, and Automotive Parts
    • Realistic brand variations including subtle misspellings common in counterfeit products

    Seller Characteristics:

    • Seller ratings (1.0-5.0 scale) with counterfeits typically showing lower ratings
    • Review counts ranging from 0 to 10,000, with legitimate sellers having more reviews
    • Geographic information including seller country and shipping origin

    Quality Indicators:

    • Number of product images (counterfeits typically have fewer)
    • Product description length (counterfeits often have shorter, less detailed descriptions)
    • Spelling errors count in product listings
    • Certification badges and warranty information
    • Domain age of seller websites

    Operational Metrics:

    • Shipping timeframes (counterfeits often have longer delivery times)
    • Payment method variety (legitimate sellers offer more options)
    • Return policy clarity and contact information completeness
    • Product views, purchases, and wishlist additions

    Transaction-Level Features

    Transaction Details:

    • Unique transaction and customer identifiers
    • Transaction dates spanning one year of activity
    • Customer demographics and purchase history
    • Quantity and pricing information with realistic market ranges

    Payment and Shipping:

    • Payment methods including credit cards, PayPal, cryptocurrency, and wire transfers
    • Shipping speeds and costs
    • Discount patterns and promotional activity

    Risk Indicators:

    • Transaction velocity flags for unusual purchasing patterns
    • Geolocation mismatches between customer and payment information
    • Device fingerprint analysis for new vs. returning customers
    • Bulk order patterns and refund request frequencies

    Key Applications

    • Training classification models for counterfeit product detection
    • Developing fraud detection algorithms for e-commerce platforms
    • Academic research in consumer protection and marketplace security
    • Building risk assessment systems for online marketplaces
    • Educational projects in data science and machine learning
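    The dataset ships its transaction velocity flags precomputed, and it does not document the rule behind them. The sketch below is a hypothetical illustration of one such rule. It flags a transaction when the same customer has already made three or more transactions in the preceding hour. The window, the limit, and the input shape are all assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def velocity_flags(transactions, window=timedelta(hours=1), max_in_window=3):
    """Return one bool per transaction: True when the same customer already
    made at least max_in_window transactions within the preceding window.

    transactions: iterable of (customer_id, datetime), sorted by time.
    """
    recent = defaultdict(deque)  # customer_id -> timestamps still in window
    flags = []
    for customer_id, ts in transactions:
        history = recent[customer_id]
        # Drop timestamps that fall outside the window for this transaction.
        while history and ts - history[0] > window:
            history.popleft()
        flags.append(len(history) >= max_in_window)
        history.append(ts)
    return flags

# Made-up transactions, not taken from the dataset.
t0 = datetime(2024, 1, 1, 10, 0)
txns = [
    ("C1", t0),
    ("C1", t0 + timedelta(minutes=5)),
    ("C1", t0 + timedelta(minutes=10)),
    ("C1", t0 + timedelta(minutes=15)),  # 4th within an hour -> flagged
    ("C2", t0 + timedelta(minutes=20)),
    ("C1", t0 + timedelta(hours=2)),     # earlier C1 activity has aged out
]
print(velocity_flags(txns))
```

    A per-customer deque keeps each check proportional to the number of recent transactions rather than the full history.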
  12. Duplicate Analysis

    • kaggle.com
    Updated Jun 2, 2025
    Alinaswe Simfukwe (2025). Duplicate Analysis [Dataset]. https://www.kaggle.com/datasets/alinaswesimfukwe/duplicate-analysis/data
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Alinaswe Simfukwe
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    Dataset Overview:

    • Total Records: 749
    • Original Records: 700
    • Duplicate Records: 49 (7% of total)
    • File Name: synthetic_claims_with_duplicates.csv

    Key Features:

    Claim Information:

    • Unique claim IDs (CLAIM000001 to CLAIM000700)
    • Employee IDs (EMP0001 to EMP0700)
    • Realistic employee names

    Financial Data:

    • Amounts range: 100.00 to 20,000.00
    • Service codes: SVC001, SVC002, SVC003, SVC004
    • Departments: Finance, HR, IT, Marketing, Operations

    Transaction Details:

    • Dates within the last 2 years
    • Timestamps for submission
    • Statuses: Submitted, Approved, Paid
    • Random UUIDs for submitter IDs

    Fraud Detection:

    • 49 exact duplicates (7%)
    • Random distribution throughout the dataset
    • Boolean is_duplicate flag for identification

    Purpose: The dataset is designed to test fraud detection systems, particularly for identifying duplicate transactions. It simulates real-world scenarios where duplicate entries might occur due to fraud or data entry errors.

    Usage:

    • Testing duplicate transaction detection
    • Training fraud detection models
    • Data validation and cleaning
    • Algorithm benchmarking

    The dataset is now ready for analysis in your fraud detection system.
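    The 49 duplicates are exact copies, so detecting them reduces to hashing each row and checking whether an identical row appeared earlier. This is a minimal sketch with hypothetical column names. It assumes is_duplicate marks the later copy rather than the original, which is worth verifying against the file.

```python
def find_exact_duplicates(rows, ignore=("is_duplicate",)):
    """Return one bool per row: True if the row repeats an earlier row on
    every field except those listed in `ignore`."""
    seen = set()
    marks = []
    for row in rows:
        # Sorting by field name makes the key independent of dict order.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignore))
        marks.append(key in seen)
        seen.add(key)
    return marks

# Made-up claims; the column names are hypothetical.
rows = [
    {"claim_id": "CLAIM000001", "employee_id": "EMP0001", "amount": 150.00, "is_duplicate": False},
    {"claim_id": "CLAIM000002", "employee_id": "EMP0002", "amount": 980.50, "is_duplicate": False},
    {"claim_id": "CLAIM000001", "employee_id": "EMP0001", "amount": 150.00, "is_duplicate": True},
]
marks = find_exact_duplicates(rows)
agreement = sum(m == r["is_duplicate"] for m, r in zip(marks, rows)) / len(rows)
print(marks, agreement)
```

    The flag column must be excluded from the comparison through the ignore parameter. Otherwise a copy and its original would differ on that field alone.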

  13. Feature Engineering Data

    • kaggle.com
    Updated Jul 23, 2019
    Mat Leonard (2019). Feature Engineering Data [Dataset]. https://www.kaggle.com/matleonard/feature-engineering-data/metadata
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Mat Leonard

    Description

    This dataset is a sample from the TalkingData AdTracking competition. I kept all the positive examples (where is_attributed == 1), while discarding 99% of the negative samples. The sample has roughly 20% positive examples.

    For this competition, your objective was to predict whether a user will download an app after clicking a mobile app advertisement.
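    Keeping every positive and about 1% of negatives raises the positive share to p / (p + 0.01(1 - p)), where p is the original positive share. Setting that to the stated 20% gives p ≈ 0.25%. This figure is derived from the description's numbers; the description does not report it. The sketch below implements the described sampling on made-up rows. The seed and helper names are illustrative.

```python
import random

def downsample_negatives(rows, keep_fraction=0.01, seed=0):
    """Keep every positive row (is_attributed == 1) and a random
    keep_fraction of the negative rows."""
    rng = random.Random(seed)
    return [
        row for row in rows
        if row["is_attributed"] == 1 or rng.random() < keep_fraction
    ]

def positive_share_after(p, keep_fraction):
    """Positive share after keeping all positives and keep_fraction of
    negatives, given an original positive share p."""
    return p / (p + keep_fraction * (1 - p))

# Made-up data: 10 positives and 10,000 negatives.
rows = [{"is_attributed": 1}] * 10 + [{"is_attributed": 0}] * 10_000
sample = downsample_negatives(rows)
print(len(sample), sum(r["is_attributed"] for r in sample))
print(round(positive_share_after(0.0025, 0.01), 3))
```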

    File descriptions

    train_sample.csv - Sampled data

    Data fields

    Each row of the training data contains a click record, with the following features.

    • ip: ip address of click.
    • app: app id for marketing.
    • device: device type id of user mobile phone (e.g., iphone 6 plus, iphone 7, huawei mate 7, etc.)
    • os: os version id of user mobile phone
    • channel: channel id of mobile ad publisher
    • click_time: timestamp of click (UTC)
    • attributed_time: if the user downloaded the app after clicking an ad, this is the time of the app download
    • is_attributed: the target that is to be predicted, indicating the app was downloaded

    Note that ip, app, device, os, and channel are encoded.

    I'm also including Parquet files with various features for use within the course.
