44 datasets found
  1. Bank Transaction Dataset for Fraud Detection

    • kaggle.com
    Updated Nov 4, 2024
    Cite
    vala khorasani (2024). Bank Transaction Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    vala khorasani
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed view of transactional behavior and financial activity patterns, suited to exploring fraud detection and anomaly identification. It contains 2,512 transaction records covering transaction attributes, customer demographics, and usage patterns, supporting analysis for financial security and fraud detection applications.

    Key Features:

    • TransactionID: Unique alphanumeric identifier for each transaction.
    • AccountID: Unique identifier for each account, with multiple transactions per account.
    • TransactionAmount: Monetary value of each transaction, ranging from small everyday expenses to larger purchases.
    • TransactionDate: Timestamp of each transaction, capturing date and time.
    • TransactionType: Categorical field indicating 'Credit' or 'Debit' transactions.
    • Location: Geographic location of the transaction, represented by U.S. city names.
    • DeviceID: Alphanumeric identifier for devices used to perform the transaction.
    • IP Address: IPv4 address associated with the transaction, with occasional changes for some accounts.
    • MerchantID: Unique identifier for merchants, showing preferred and outlier merchants for each account.
    • AccountBalance: Balance in the account post-transaction, with logical correlations based on transaction type and amount.
    • PreviousTransactionDate: Timestamp of the last transaction for the account, aiding in calculating transaction frequency.
    • Channel: Channel through which the transaction was performed (e.g., Online, ATM, Branch).
    • CustomerAge: Age of the account holder, with logical groupings based on occupation.
    • CustomerOccupation: Occupation of the account holder (e.g., Doctor, Engineer, Student, Retired), reflecting income patterns.
    • TransactionDuration: Duration of the transaction in seconds, varying by transaction type.
    • LoginAttempts: Number of login attempts before the transaction, with higher values indicating potential anomalies.

    This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.
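
    The PreviousTransactionDate and LoginAttempts fields lend themselves to simple derived features. The sketch below is a minimal illustration using only the Python standard library. It assumes rows have already been read into dictionaries keyed by the column names listed above (for example with `csv.DictReader`) and that timestamps use the `YYYY-MM-DD HH:MM:SS` layout, which should be checked against the actual file. The derived field names `SecondsSincePrevious` and `HighLoginAttempts`, and the threshold of one login attempt, are hypothetical choices and not part of the dataset.

```python
from datetime import datetime

# Assumed timestamp layout; verify against the real CSV before relying on it.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_derived_features(rows, login_threshold=1):
    """Return copies of `rows` with two hypothetical derived columns.

    SecondsSincePrevious: TransactionDate minus PreviousTransactionDate, in seconds.
        A negative value means the two columns are not in chronological order
        for that row, which is worth checking before using the feature.
    HighLoginAttempts: True when LoginAttempts exceeds `login_threshold`.
    """
    enriched_rows = []
    for row in rows:
        current = datetime.strptime(row["TransactionDate"], TS_FORMAT)
        previous = datetime.strptime(row["PreviousTransactionDate"], TS_FORMAT)
        enriched = dict(row)  # leave the caller's dictionaries untouched
        enriched["SecondsSincePrevious"] = (current - previous).total_seconds()
        enriched["HighLoginAttempts"] = int(row["LoginAttempts"]) > login_threshold
        enriched_rows.append(enriched)
    return enriched_rows


# Illustrative rows (made-up values, not taken from the dataset).
sample = [
    {"TransactionDate": "2023-04-11 16:29:14",
     "PreviousTransactionDate": "2023-04-11 16:00:00", "LoginAttempts": "1"},
    {"TransactionDate": "2023-06-27 16:44:19",
     "PreviousTransactionDate": "2023-06-27 16:40:00", "LoginAttempts": "4"},
]
for r in add_derived_features(sample):
    print(r["SecondsSincePrevious"], r["HighLoginAttempts"])
```

    Copying each row keeps the raw records intact, so the same input can be re-processed with a different threshold.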

  2. Synthetic Bank Transactions

    • kaggle.com
    zip
    Updated Mar 20, 2021
    Cite
    John Harris (2021). Synthetic Bank Transactions [Dataset]. https://www.kaggle.com/radistaleks/synthetic-bank-transactions
    Available download formats: zip (13,820,207 bytes)
    Dataset updated
    Mar 20, 2021
    Authors
    John Harris
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Inspiration

    Many projects require bank transaction datasets to test their systems. Unfortunately, it is hard to find one that includes transaction product categorization, which is important for many analytical projects.

    Content

    There are 4 datasets:

    • Clients: basic information about bank users.
    • Categories: standard transaction categories used by many banks worldwide.
    • Transactions: the core of the dataset, with basic information about each transaction, such as the other account involved, the category, and the amount.
    • Subscriptions: information about subscriptions, i.e. transactions that are made automatically.
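
    Because categories live in their own table, analyses usually join Transactions to Categories. A minimal Python sketch, assuming hypothetical column names `category_id` and `amount` (the actual file headers are not listed in this description), sums spending per category and groups unknown category IDs as "Uncategorized":

```python
from collections import defaultdict


def spend_by_category(transactions, categories):
    """Sum transaction amounts per category name.

    transactions: iterable of dicts with hypothetical keys 'category_id' and 'amount'.
    categories:   dict mapping category_id -> category name (from the Categories table).
    Transactions whose category_id is not in `categories` are grouped as 'Uncategorized'.
    """
    totals = defaultdict(float)
    for tx in transactions:
        name = categories.get(tx["category_id"], "Uncategorized")
        totals[name] += float(tx["amount"])
    return dict(totals)


categories = {1: "Groceries", 2: "Transport"}
transactions = [
    {"category_id": 1, "amount": "10.5"},
    {"category_id": 1, "amount": "4.5"},
    {"category_id": 3, "amount": "2"},
]
print(spend_by_category(transactions, categories))
```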

  3. Credit card fraud detection Date 25th of June 2015

    • kaggle.com
    Updated Oct 29, 2023
    Cite
    Zohair ahmed (2023). Credit card fraud detection Date 25th of June 2015 [Dataset]. https://www.kaggle.com/datasets/qnqfbqfqo/credit-card-fraud-detection-date-25th-of-june-2015
    Dataset updated
    Oct 29, 2023
    Dataset provided by
    Kaggle
    Authors
    Zohair ahmed
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
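
    With a positive rate this low, a common first step is to reweight the classes. The sketch below computes per-class weights as n_samples / (n_classes × class_count), the same heuristic scikit-learn uses for `class_weight="balanced"`. It is one option among several (resampling is another) and is not something the dataset authors prescribe; the counts are the ones quoted above.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * count_of_that_class)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count) for label, count in class_counts.items()}


total = 284_807
frauds = 492
counts = {0: total - frauds, 1: frauds}

print(f"fraud rate: {frauds / total:.4%}")
weights = balanced_class_weights(counts)
print(weights)  # fraud rows receive a far larger weight than legitimate ones
```

    With these weights, each class contributes the same total weight (n_samples / 2) to the training loss.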

    It contains only numerical input variables, which are the result of a PCA transformation. Due to confidentiality issues, the original features and further background information about the data cannot be provided. Features V1, V2, ..., V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.
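
    Example-dependent cost-sensitive learning scores a model by what each error costs rather than by how many errors it makes. A minimal sketch, assuming a simple cost model that is not part of the dataset: a missed fraud costs its own 'Amount', and every flagged transaction (true or false alarm) costs a fixed, hypothetical investigation fee `admin_cost`.

```python
def example_dependent_cost(y_true, y_pred, amounts, admin_cost=2.0):
    """Total cost of a set of predictions under an illustrative cost model.

    - Flagged transaction (prediction 1): pay `admin_cost` to investigate it.
    - Missed fraud (truth 1, prediction 0): lose the transaction's Amount.
    - Correctly passed legitimate transaction: no cost.
    """
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if pred == 1:
            total += admin_cost
        elif truth == 1:
            total += amount
    return total


y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
amounts = [100.0, 50.0, 30.0, 10.0]
print(example_dependent_cost(y_true, y_pred, amounts))
```

    Under this model, missing a large fraud is penalised far more than missing a small one, which a plain error count cannot express.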

    The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on http://mlg.ulb.ac.be/BruFence and http://mlg.ulb.ac.be/ARTML.

  4. Brazil Bank Account Spending Dataset

    • kaggle.com
    Updated Jul 20, 2020
    Cite
    Süfyan Taşkın (2020). Brazil Bank Account Spending Dataset [Dataset]. https://www.kaggle.com/sufyant/brazil-bank-account-spending-dataset/tasks
    Dataset updated
    Jul 20, 2020
    Dataset provided by
    Kaggle
    Authors
    Süfyan Taşkın
    Description

    Dataset

    This dataset was created by Süfyan Taşkın


  5. BitClout 50K Profiles Dump

    • kaggle.com
    Updated Mar 28, 2021
    Cite
    Miguel Esteban Gómez (2021). BitClout 50K Profiles Dump [Dataset]. https://www.kaggle.com/michaelstevan/bitclout-50000-profiles-dump/activity
    Dataset updated
    Mar 28, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Miguel Esteban Gómez
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Miguel Esteban Gómez

    Released under CC0: Public Domain


  6. ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 5, 2019
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘Phishing Dataset for Machine Learning’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-phishing-dataset-for-machine-learning-2690/f1656d17/?iid=043-921&v=presentation
    Dataset updated
    Nov 5, 2019
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Phishing Dataset for Machine Learning’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/phishing-dataset-for-machine-learning on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Anti-phishing refers to efforts to block phishing attacks. Phishing is a kind of cybercrime where attackers pose as known or trusted entities and contact individuals through email, text or telephone and ask them to share sensitive information. Typically, in a phishing email attack, the message will suggest that there is a problem with an invoice, that there has been suspicious activity on an account, or that the user must log in to verify an account or password. Users may also be prompted to enter credit card information or bank account details as well as other sensitive data. Once this information is collected, attackers may use it to access accounts, steal data and identities, and download malware onto the user’s computer.

    Content

    This dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages, which were downloaded from January to May 2015 and from May to June 2017. An improved feature extraction technique is employed by leveraging the browser automation framework (i.e., Selenium WebDriver), which is more precise and robust compared to the parsing approach based on regular expressions.

    Anti-phishing researchers and experts may find this dataset useful for phishing features analysis, conducting rapid proof of concept experiments or benchmarking phishing classification models.
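
    The dataset's 48 features were extracted with Selenium WebDriver, and their exact definitions are not listed in this description. As a rough illustration of the kind of URL-level signals phishing classifiers use, the sketch below computes a few simple features from a URL string with Python's `urllib.parse`. The feature names and the naive subdomain heuristic are hypothetical and do not reproduce the dataset's extraction pipeline.

```python
from urllib.parse import urlparse


def url_features(url):
    """Compute a few simple, illustrative URL features (not the dataset's exact 48)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_dash_in_host": host.count("-"),
        # Naive: treats the last two labels as the registered domain,
        # which is wrong for suffixes such as .co.uk.
        "subdomain_level": max(host.count(".") - 1, 0),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
    }


url = "http://secure-login.example.com/verify?id=1&x=2"
features = url_features(url)
print(features)
```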

    Acknowledgements

    Tan, Choon Lin (2018), “Phishing Dataset for Machine Learning: Feature Evaluation”, Mendeley Data, V1, doi: 10.17632/h3cgnj8hft.1 Source of the Dataset.

    --- Original source retains full ownership of the source dataset ---

  7. Customer_Data

    • kaggle.com
    Updated Mar 12, 2023
    Cite
    Alireza Rastegar (2023). Customer_Data [Dataset]. https://www.kaggle.com/datasets/alirezaai/customer-data/discussion
    Dataset updated
    Mar 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alireza Rastegar
    Description
    • BALANCE: Outstanding balance on the credit card account.
    • BALANCE_FREQUENCY: How often the balance is updated.
    • PURCHASES: Total amount of purchases made on the credit card.
    • ONEOFF_PURCHASES: Total amount of one-time purchases made on the credit card.
    • INSTALLMENTS_PURCHASES: Total amount of purchases made on the credit card that were paid back in installments.
    • CASH_ADVANCE: Amount of cash withdrawn from the credit card account as a cash advance.
    • PURCHASES_FREQUENCY: How often purchases are made on the credit card.
    • ONEOFF_PURCHASES_FREQUENCY: How often one-time purchases are made on the credit card.
    • PURCHASES_INSTALLMENTS_FREQUENCY: How often purchases that are paid back in installments are made on the credit card.
    • CASH_ADVANCE_FREQUENCY: How often cash advances are taken out on the credit card.
    • CASH_ADVANCE_TRX: Number of cash advance transactions made on the credit card account.
    • PURCHASES_TRX: Number of purchase transactions made on the credit card account.
    • CREDIT_LIMIT: Maximum amount of credit the customer is allowed to use on the credit card.
    • PAYMENTS: Total amount of payments made on the credit card account.
    • MINIMUM_PAYMENTS: Minimum amount of payments required on the credit card account.
    • PRC_FULL_PAYMENT: Percentage of the balance that is paid in full by the customer each month.
    • TENURE: Number of years the customer has been using the credit card account.
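
    Several of these columns combine naturally into ratios. A minimal sketch, assuming each record is a dict keyed by the column names above: it derives a credit utilization ratio (BALANCE / CREDIT_LIMIT) and a payment-to-minimum ratio (PAYMENTS / MINIMUM_PAYMENTS). Both derived names are hypothetical, and the code returns None when a denominator is missing or zero rather than guessing a value.

```python
def safe_ratio(numerator, denominator):
    """numerator / denominator, or None if the denominator is missing or zero."""
    if denominator in (None, "") or float(denominator) == 0.0:
        return None
    return float(numerator) / float(denominator)


def derived_ratios(record):
    """Return two illustrative ratios built from the documented columns."""
    return {
        "UTILIZATION": safe_ratio(record["BALANCE"], record["CREDIT_LIMIT"]),
        "PAYMENT_TO_MINIMUM": safe_ratio(record["PAYMENTS"], record["MINIMUM_PAYMENTS"]),
    }


customer = {"BALANCE": 1500.0, "CREDIT_LIMIT": 6000.0,
            "PAYMENTS": 900.0, "MINIMUM_PAYMENTS": ""}
print(derived_ratios(customer))
```
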
  8. ‘UPI apps Transactions in 2021’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 31, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘UPI apps Transactions in 2021’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-upi-apps-transactions-in-2021-c503/a356228b/?iid=002-537&v=presentation
    Dataset updated
    Dec 31, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘UPI apps Transactions in 2021’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ramjasmaurya/upi-apps-transactions-in-2021 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---


    Unified Payments Interface (UPI) is an instant real-time payment system developed by the National Payments Corporation of India (NPCI), facilitating inter-bank peer-to-peer (P2P) and person-to-merchant (P2M) transactions. NPCI is the umbrella organisation for all digital payments. The interface is regulated by the Reserve Bank of India (RBI) and works by instantly transferring funds between two bank accounts on a mobile platform. As of November 2021, there are 274 banks available on UPI, with a monthly volume of 4.18 billion transactions and a value of ₹7.1 trillion (US$94 billion). UPI witnessed 68 billion transactions up to November 2021. The mobile-only payment system helped transact a total of ₹34.95 lakh crore (US$460 billion) during the 67 months of operation starting from 2016. As of May 2021, the platform had 150 million monthly active users in India, with plans to reach 500 million by 2025. IIT Madras is also working to integrate a voice command feature that can support English and Indian vernacular languages in the future. The proportion of UPI transactions in the total volume of digital transactions grew from 23% in 2018-19 to 55% in 2020-21, with an average value of ₹1,849 per transaction.

    --- Original source retains full ownership of the source dataset ---

  9. Mt.Gox Leaked Transaction

    • kaggle.com
    Updated Mar 23, 2020
    Cite
    XBlock (2020). Mt.Gox Leaked Transaction [Dataset]. https://www.kaggle.com/xblock/mtgox-leaked-transaction/activity
    Dataset updated
    Mar 23, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    XBlock
    Description

    This dataset contains the transaction data leaked from the Mt.Gox exchange.

    First, we combine the buy and sell fields of the same transaction, and then deduplicate the records by transaction time, transaction account, etc., to ensure that each transaction record is unique. This transaction data is very useful for analyzing user behavior in the bitcoin market.
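
    The two preprocessing steps described above can be sketched in Python. This is a minimal illustration only: the field names (`trade_id`, `side`, `account`, `time`, `amount`) and the choice of deduplication key are hypothetical, since the leaked files' actual schema is not described here.

```python
def combine_legs(legs):
    """Merge buy and sell legs that share a hypothetical 'trade_id' into one record."""
    trades = {}
    for leg in legs:
        trade = trades.setdefault(
            leg["trade_id"],
            {"trade_id": leg["trade_id"], "time": leg["time"], "amount": leg["amount"]},
        )
        # leg["side"] is assumed to be 'buy' or 'sell'.
        trade[leg["side"] + "_account"] = leg["account"]
    return list(trades.values())


def deduplicate(trades, key_fields=("time", "buy_account", "sell_account", "amount")):
    """Keep the first trade for each key; later trades with the same key are dropped."""
    seen = set()
    unique = []
    for trade in trades:
        key = tuple(trade.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(trade)
    return unique


legs = [
    {"trade_id": 1, "side": "buy", "account": "A", "time": "2013-01-01 00:00:00", "amount": 0.5},
    {"trade_id": 1, "side": "sell", "account": "B", "time": "2013-01-01 00:00:00", "amount": 0.5},
    {"trade_id": 2, "side": "buy", "account": "A", "time": "2013-01-01 00:00:00", "amount": 0.5},
    {"trade_id": 2, "side": "sell", "account": "B", "time": "2013-01-01 00:00:00", "amount": 0.5},
    {"trade_id": 3, "side": "buy", "account": "C", "time": "2013-01-02 09:30:00", "amount": 1.2},
]
trades = deduplicate(combine_legs(legs))
print(trades)
```

    Trade 2 repeats trade 1 under a different ID, so it is dropped; which fields belong in the key is the main design decision and should follow whatever uniquely identifies a trade in the real data.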

    We have done a market manipulation study using this data set.

    For more details about the blockchain dataset, please click here.

  10. Data from: Network Activity Anomaly Detection

    • kaggle.com
    Updated Aug 3, 2024
    Cite
    Presha Monga (2024). Network Activity Anomaly Detection [Dataset]. https://www.kaggle.com/datasets/preshamonga/network-activity-anomaly-detection/versions/1
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Presha Monga
    Description

    In the Target column, Normal (no attack) = 0 and Neptune attack = 1.

    Description of the columns present in the Dataset:

    1. duration: Length (in seconds) of the connection.
    2. protocoltype: The protocol used in the connection (e.g., TCP, UDP, ICMP).
    3. service: The network service on the destination (e.g., HTTP, FTP, SMTP).
    4. flag: Status flag of the connection (e.g., SF for normal termination).
    5. srcbytes: Number of data bytes sent from the source to the destination.
    6. dstbytes: Number of data bytes sent from the destination to the source.
    7. land: A binary flag indicating if the connection is to the same host (source IP equals destination IP).
    8. wrongfragment: Number of wrong fragments in the connection.
    9. urgent: Number of urgent packets in the connection.
    10. hot: Number of "hot" indicators (e.g., access to sensitive files).
    11. numfailedlogins: Number of failed login attempts.
    12. loggedin: A binary flag indicating if the user logged in successfully (1 if yes, 0 if no).
    13. numcompromised: Number of compromised conditions.
    14. rootshell: A binary flag indicating if a root shell was obtained.
    15. suattempted: A binary flag indicating if the "su" command was attempted (used for switching user privileges).
    16. numroot: Number of "root" accesses.
    17. numfilecreations: Number of file creation operations.
    18. numshells: Number of shell prompts invoked.
    19. numaccessfiles: Number of accesses to control files.
    20. numoutboundcmds: Number of outbound commands in an FTP session.
    21. ishostlogin: A binary flag indicating if the login belongs to a "host" user.
    22. isguestlogin: A binary flag indicating if the login belongs to a "guest" user.
    23. count: Number of connections to the same host as the current connection in the past two seconds.
    24. srvcount: Number of connections to the same service as the current connection in the past two seconds.
    25. serrorrate: Percentage of connections that have "SYN" errors.
    26. srvserrorrate: Percentage of connections to the same service that have "SYN" errors.
    27. rerrorrate: Percentage of connections that have "REJ" errors.
    28. srvrerrorrate: Percentage of connections to the same service that have "REJ" errors.
    29. samesrvrate: Percentage of connections to the same service.
    30. diffsrvrate: Percentage of connections to different services.
    31. srvdiffhostrate: Percentage of connections to different hosts in the same service.
    32. dsthostcount: Number of connections to the same destination host.
    33. dsthostsrvcount: Number of connections to the same service at the destination host.
    34. dsthostsamesrvrate: Percentage of connections to the same service at the destination host.
    35. dsthostdiffsrvrate: Percentage of connections to different services at the destination host.
    36. dsthostsamesrcportrate: Percentage of connections from the same source port to the destination host.
    37. dsthostsrvdiffhostrate: Percentage of connections to different destination hosts in the same service.
    38. dsthostserrorrate: Percentage of connections to the destination host that have "SYN" errors.
    39. dsthostsrvserrorrate: Percentage of connections to the same service at the destination host that have "SYN" errors.
    40. dsthostrerrorrate: Percentage of connections to the destination host that have "REJ" errors.
    41. dsthostsrvrerrorrate: Percentage of connections to the same service at the destination host that have "REJ" errors.
    42. lastflag: The status of the last connection in this session.
    43. attack: The target label, indicating if the connection is normal (0) or a Neptune attack (1).
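
    Features 23 and 24 (count and srvcount) are sliding-window statistics. The sketch below shows one way to compute them from a time-ordered list of connections, assuming each connection is a hypothetical `(timestamp_seconds, destination_host, service)` tuple. Whether the current connection is included in its own count is an assumption made here (it is), and the original feature extraction may differ.

```python
from collections import deque


def window_counts(connections, window=2.0):
    """Compute (count, srvcount)-style features for time-ordered connections.

    connections: iterable of (timestamp_seconds, destination_host, service) tuples,
                 sorted by timestamp.
    Returns one (count, srv_count) pair per connection: how many connections within
    the last `window` seconds (current one included) share the current connection's
    destination host, and how many share its service.
    """
    recent = deque()
    results = []
    for ts, host, service in connections:
        recent.append((ts, host, service))
        # Drop connections older than the window.
        while ts - recent[0][0] > window:
            recent.popleft()
        count = sum(1 for _, h, _ in recent if h == host)
        srv_count = sum(1 for _, _, s in recent if s == service)
        results.append((count, srv_count))
    return results


connections = [
    (0.0, "10.0.0.1", "http"),
    (0.5, "10.0.0.1", "ftp"),
    (1.0, "10.0.0.2", "http"),
    (3.0, "10.0.0.1", "http"),
]
print(window_counts(connections))
```
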
  11. Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, competition within the banking industry has intensified. At the same time, with the rapid development of information technology and the Internet, customers' choice of financial products has become increasingly diversified, their dependence on and loyalty to banking institutions have declined, and customer churn in commercial banks has become an increasingly prominent problem. Predicting customer behavior and retaining existing customers has therefore become a major challenge for banks. This study takes a bank's business data from the Kaggle platform as the research object, compares multiple sampling methods for balancing the data, constructs a bank customer churn prediction model using GA-XGBoost, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions for preventing customer churn in the banking industry. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of the banking data. (2) The F1 and AUC values of the XGBoost model optimized with a genetic algorithm reach 90% and 99%, respectively, which are the best compared with six other machine learning models; the GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results and analyze the features with a high impact on the model's predictions, such as the total number of transactions in the past year, the transaction amount in the past year, the number of products owned by customers, and the total sales balance.

    The contribution of this paper lies mainly in two aspects: (1) based on the accurate identification of churned customers, this study extracts useful information from the black-box model, which can serve as a reference for commercial banks to improve their service quality and retain customers; (2) it can serve as a reference for customer churn early-warning models in other related industries, helping the banking industry maintain customer stability and market position and reduce corporate losses.
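
    The balancing step compares SMOTE, ADASYN, and SMOTEENN. As background, the sketch below shows only the core SMOTE idea in plain Python: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It omits the Edited Nearest Neighbours cleaning that SMOTEENN adds and is not the study's implementation; in practice a library such as imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` new minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats), at least two of them.
    k:        number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (Euclidean distance), excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=5, k=2)
print(new_points)
```

    Because every new point lies between two existing minority samples, SMOTE fills in the minority region rather than duplicating rows, which is why it is often preferred over plain random oversampling.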

  12. Credit Card Defaulter

    • kaggle.com
    Updated Jun 10, 2021
    Cite
    Arsh Anwar (2021). Credit Card Defaulter [Dataset]. https://www.kaggle.com/d4rklucif3r/defaulter/activity
    Dataset updated
    Jun 10, 2021
    Dataset provided by
    Kaggle
    Authors
    Arsh Anwar
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset is about credit card defaulters. It contains 5 columns:

    1. ID: ID of the customer.
    2. Default: whether the person is a loan defaulter.
    3. Student: whether the person is a student.
    4. Balance: balance in the person's account.
    5. Income: the person's income.

  13. Financial Transaction and Risk Management Dataset

    • kaggle.com
    Updated Jan 8, 2025
    Cite
    Ziya (2025). Financial Transaction and Risk Management Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/financial-transaction-and-risk-management-dataset
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ziya
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About the Dataset

    This dataset contains financial transaction records and risk management data for accounting systems. It includes a variety of transactional data, such as transaction IDs, amounts, categories, and payment methods, alongside associated risk incidents such as fraud, errors, and misstatements. The dataset also captures system metadata, such as user activity, transaction processing time, login frequency, and the geographical region of the IP address. The data is designed to simulate real-world accounting system operations and risk events, enabling the development and testing of AI-driven risk prediction models. It can be used for research in real-time financial risk management, fraud detection, and improving decision-making processes in accounting systems using artificial intelligence.

  14. Facebook Spam Dataset

    • kaggle.com
    Updated Apr 11, 2021
    Cite
    Khaja Hussain SK (2021). Facebook Spam Dataset [Dataset]. https://www.kaggle.com/khajahussainsk/facebook-spam-dataset/activity
    Dataset updated
    Apr 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Khaja Hussain SK
    Description

    Context

    A collection of profile-based and content-based data on spam and legitimate Facebook profiles, suitable for classification tasks.

    Content

    The dataset can be used for building machine learning models. It was collected from public profiles using the Facebook API and Facebook Graph API. There are 500 legitimate profiles and 100 spam profiles. The features are listed below, with Label (0 = legit, 1 = spam):

    1. Number of friends
    2. Number of followings
    3. Number of communities
    4. Age of the user account (in days)
    5. Total number of posts shared
    6. Total number of URLs shared
    7. Total number of photos/videos shared
    8. Fraction of posts containing URLs
    9. Fraction of posts containing photos/videos
    10. Average number of comments per post
    11. Average number of likes per post
    12. Average number of tags in a post (rate of tagging)
    13. Average number of hashtags present in a post

    Inspiration

    This dataset helps the community understand how features can distinguish legitimate Facebook users from spam users.
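
    Features 8 to 13 are per-post aggregates. A minimal sketch of how they could be computed from a profile's posts, assuming each post is a dict of per-post counts under hypothetical keys (`urls`, `media`, `comments`, `likes`, `tags`, `hashtags`); the original collection code is not described here.

```python
def content_features(posts):
    """Aggregate per-post counts into profile-level features 8-13."""
    names = ("frac_posts_with_urls", "frac_posts_with_media", "avg_comments",
             "avg_likes", "avg_tags", "avg_hashtags")
    n = len(posts)
    if n == 0:
        return dict.fromkeys(names, 0.0)  # no posts: report zeros rather than divide by zero

    def mean(key):
        return sum(post[key] for post in posts) / n

    return {
        "frac_posts_with_urls": sum(1 for post in posts if post["urls"] > 0) / n,
        "frac_posts_with_media": sum(1 for post in posts if post["media"] > 0) / n,
        "avg_comments": mean("comments"),
        "avg_likes": mean("likes"),
        "avg_tags": mean("tags"),
        "avg_hashtags": mean("hashtags"),
    }


posts = [
    {"urls": 2, "media": 0, "comments": 4, "likes": 10, "tags": 1, "hashtags": 0},
    {"urls": 0, "media": 1, "comments": 0, "likes": 2, "tags": 0, "hashtags": 3},
]
print(content_features(posts))
```

    Note that a fraction of posts containing URLs (feature 8) differs from URLs per post: a single post with several links still counts once.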

  15. Cryptocurrency Historical Prices [Updated Daily]

    • kaggle.com
    Updated May 25, 2023
    Cite
    Usama Buttar (2023). Cryptocurrency Historical Prices [Updated Daily] [Dataset]. https://www.kaggle.com/datasets/usamabuttar/cryptocurrency-historical-prices-updated-daily
    Dataset updated
    May 25, 2023
    Dataset provided by
    Kaggle
    Authors
    Usama Buttar
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains a comprehensive collection of historical price records for the top 1000 cryptocurrencies. The data in this dataset is updated daily, providing a reliable and up-to-date source of information for cryptocurrency traders, researchers, and enthusiasts.

    Each file in the dataset includes the following columns: date, open price, high price, low price, closing price, adjusted closing price, and trading volume. These columns provide a detailed picture of the daily price movements and trading activity of each cryptocurrency in the dataset.

    The "date" column indicates the day on which the price data was recorded, while the "open" column provides the opening price of the cryptocurrency for that day. The "high" and "low" columns indicate the highest and lowest prices of the cryptocurrency on that day, respectively. The "close" column represents the closing price of the cryptocurrency on that day, while the "adjusted close" column takes into account any dividends or other corporate actions that may have affected the price. Finally, the "volume" column shows the trading volume of the cryptocurrency on that day.

    With this dataset, users can analyze and visualize the performance of individual cryptocurrencies, compare them to one another, and track trends over time. The data is ideal for use in machine learning models, predictive analytics, and other data-driven applications.
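
    A common first step in comparing cryptocurrencies and tracking trends is to convert closing prices into daily returns and smooth prices with a moving average. A minimal plain-Python sketch, assuming a list of closing prices already sorted by date (the values below are illustrative, not from the dataset):

```python
def daily_returns(closes):
    """Simple daily returns: close[t] / close[t-1] - 1, one fewer value than the input."""
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]


def moving_average(values, window):
    """Trailing simple moving average; starts once `window` values are available."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


closes = [100.0, 110.0, 99.0, 104.0]
print(daily_returns(closes))
print(moving_average(closes, window=2))
```

    Returns put coins with very different price levels on a common scale, which makes cross-asset comparison meaningful.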

  16. CC_Fraud

    • kaggle.com
    zip
    Updated Mar 2, 2021
    Cite
    Sam Kowitt (2021). CC_Fraud [Dataset]. https://www.kaggle.com/samkowitt/cc-fraud
    Available download formats: zip (69,155,672 bytes)
    Dataset updated
    Mar 2, 2021
    Authors
    Sam Kowitt
    Description

    Context

    This dataset contains more than 300,000 anonymized transactions. The variables are anonymized to protect consumers' information, but they represent fields such as how long the consumer has had the account. Each row represents a user's transaction. The dataset was built so that a classifier can be trained on the anonymized variables to predict which transactions are potentially fraudulent.

    Content

    The dataset has a fraud rate of ~0.1% and is therefore highly unbalanced.

    The variables are as follows: Time, anonymized variables (30 variables), Amount ($), and Class (fraud label).
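
    At a ~0.1% fraud rate, accuracy is misleading: a model that never predicts fraud is right about 99.9% of the time yet catches nothing. Precision and recall on the fraud class expose this. A minimal plain-Python sketch, using a made-up label vector with one fraud in 1,000 rows:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1] + [0] * 999          # one fraud in 1,000 transactions
always_legit = [0] * 1000         # a "model" that never flags fraud
accuracy = sum(t == p for t, p in zip(y_true, always_legit)) / len(y_true)
print(accuracy, precision_recall(y_true, always_legit))
```
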


  17. Securitisation Vehicles

    • kaggle.com
    zip
    Updated Sep 12, 2020
    Cite
    Praveen Kumar (2020). Securitisation Vehicles [Dataset]. https://www.kaggle.com/penchalaiah123/securitisation-vehicles
    Available download formats: zip (16,518 bytes)
    Dataset updated
    Sep 12, 2020
    Authors
    Praveen Kumar
    Description

    Context

    The data are detailed series underlying the Financial Accounts, ABS Cat No. 5232.0. They cover special purpose vehicles registered or incorporated in Australia to securitise selected assets, and whose issues are independently rated by a recognised rating agency. See 'Changes to Tables' in the December 1996 issue of the Bulletin for a further discussion of securitisation vehicles. Some data prior to June 1993 are partly estimated.

    Content

    'Mortgages' include both residential and non-residential mortgages.

    'Other loans and placements' include operating lease and lease finance receivables, secured loans to originators and loans secured by other types of assets.

    Holdings of 'Asset-backed bonds' refers to individual securitisation vehicles' holdings of asset-backed bonds issued by other securitisation vehicles.

    'All other assets' include cash and deposits with Australian banks and corporations registered under the Financial Sector (Collection of Data) Act 2001 and all other claims not already included.

    'Other liabilities' include loans and advances from Australian banks, corporations registered under the Financial Sector (Collection of Data) Act 2001 and other financial institutions, along with all other liabilities not already included.

  18. DS4 Work - Marketing Dataset

    • kaggle.com
    Updated May 1, 2024
    Cite
    Beytullah Soylev (2024). DS4 Work - Marketing Dataset [Dataset]. https://www.kaggle.com/datasets/soylevbeytullah/customer-dataset
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Beytullah Soylev
    Description

    The dataset includes various features about the bank's customers:

    • Customer ID: Unique identifier for each credit card holder.
    • Balance: Remaining balance in the customer's account.
    • Balance Frequency: How often the balance is updated (a score between 0 and 1, where 0 indicates infrequent updates and 1 indicates frequent updates).
    • Purchases: Total amount of purchases made from the account.
    • One-Off Purchases: Maximum purchase amount made in a single transaction.
    • Installment Purchases: Amount of purchases made in installments.
    • Cash Advance: Amount of cash advanced using the credit card.
    • Purchases Frequency: How often purchases are made (a score between 0 and 1, like Balance Frequency).
    • One-Off Purchases Frequency: How often the customer makes one-off purchases.
    • Installment Purchases Frequency: How often the customer makes installment purchases.
    • Cash Advance Frequency: How often the customer takes cash advances.
    • Cash Advance Transactions: Number of cash advance transactions.
    • Purchases Transactions: Number of purchase transactions.
    • Credit Limit: Maximum credit limit for the user.
    • Payments: Total amount of payments made by the user.
    • Minimum Payment: Minimum payment amount required of the user.
    • Percentage of Full Payment: Percentage of the total balance paid by the user (0 indicates no payment, 100 indicates full payment).
    • Tenure: Length of time the customer has held the credit card.
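
    Several of these features come with documented ranges, so a quick sanity check before clustering or modelling can catch scaling or unit mismatches. This minimal sketch checks only the ranges the description states explicitly (the Balance Frequency and Purchases Frequency scores, and the 0 to 100 Percentage of Full Payment). The snake_case field names are hypothetical and should be mapped to the real column headers.

```python
# Ranges stated in the feature descriptions; the keys are hypothetical names.
DOCUMENTED_RANGES = {
    "balance_frequency": (0.0, 1.0),
    "purchases_frequency": (0.0, 1.0),
    "percentage_full_payment": (0.0, 100.0),
}


def out_of_range_fields(record):
    """Return the fields of `record` whose values fall outside DOCUMENTED_RANGES.

    Missing or None values are skipped rather than reported.
    """
    bad = []
    for field, (low, high) in DOCUMENTED_RANGES.items():
        value = record.get(field)
        if value is None:
            continue
        if not low <= value <= high:
            bad.append(field)
    return bad


if __name__ == "__main__":
    # Hypothetical record used only to demonstrate the call.
    sample = {
        "balance_frequency": 0.8,
        "purchases_frequency": 1.3,
        "percentage_full_payment": 40.0,
    }
    print(out_of_range_fields(sample))  # ['purchases_frequency']
```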

  19. Paid In Contributions to IBRD/IDA/IFC Trust Funds

    • kaggle.com
    zip
    Updated Nov 27, 2019
    Cite
    World Bank (2019). Paid In Contributions to IBRD/IDA/IFC Trust Funds [Dataset]. https://www.kaggle.com/theworldbank/paid-in-contributions-to-ibrd-ida-ifc-trust-funds
    Available download formats: zip (229,396 bytes)
    Dataset updated
    Nov 27, 2019
    Dataset authored and provided by
    World Bank
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Content

    A Recipient-executed Grant is a Trust Fund Grant that is provided to a third party under a grant agreement, and for which the Bank plays an operational role, i.e., the Bank normally appraises and supervises activities financed by these funds. This dataset provides data on the amount of grant funds committed in the course of a fiscal year and payments made out of a Trust Fund account to eligible recipients, in accordance with the legal agreements. In fulfilling its responsibilities, the World Bank as Trustee complies with all sanctions applicable to World Bank transactions. All definitions should be regarded at present as provisional and not final, and are subject to revision at any time. Data is provided at the individual Trust Fund level and is updated as of 04/02/2015. No further updates are planned for this particular dataset; for more details, please visit the Global Partnership and Trust Fund Operations website: http://go.worldbank.org/GABMG2YEI0
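
    Because the data are reported per Trust Fund and per fiscal year, a typical first step is to roll dated amounts up to that grain. The sketch below assumes each row can be reduced to a hypothetical (trust fund ID, date, amount) tuple; if the file already carries a fiscal-year column, use it directly instead. `world_bank_fiscal_year` encodes the World Bank's July-to-June fiscal year, named for the calendar year in which it ends.

```python
from collections import defaultdict
from datetime import date


def world_bank_fiscal_year(day):
    """Fiscal year running July 1 to June 30, named for the year it ends."""
    return day.year + 1 if day.month >= 7 else day.year


def totals_by_fund_and_year(rows):
    """Sum amounts per (trust fund ID, fiscal year).

    `rows` is an iterable of hypothetical (trust_fund_id, date, amount) tuples.
    """
    totals = defaultdict(float)
    for fund_id, day, amount in rows:
        totals[(fund_id, world_bank_fiscal_year(day))] += amount
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows for illustration; "TF001" is a hypothetical fund ID.
    rows = [
        ("TF001", date(2014, 7, 1), 100.0),
        ("TF001", date(2015, 6, 30), 50.0),
        ("TF001", date(2015, 7, 1), 25.0),
    ]
    print(totals_by_fund_and_year(rows))
    # {('TF001', 2015): 150.0, ('TF001', 2016): 25.0}
```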

    Context

    This is a dataset hosted by the World Bank. The organization has an open data platform and updates its information according to the amount of data that is brought in. Explore the World Bank's financial data using Kaggle and all of the data sources available through the World Bank organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    This dataset is distributed under a Creative Commons Attribution 3.0 IGO license.

    Cover photo by Joseph Gonzalez on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  20. Iris Species

    • kaggle.com
    zip
    Updated Sep 27, 2016
    Cite
    UCI Machine Learning (2016). Iris Species [Dataset]. https://www.kaggle.com/datasets/uciml/iris
    Available download formats: zip (3,687 bytes)
    Dataset updated
    Sep 27, 2016
    Dataset authored and provided by
    UCI Machine Learning
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

    It includes three iris species with 50 samples each, along with measurements of each flower. One species is linearly separable from the other two, but the other two are not linearly separable from each other.

    The columns in this dataset are:

    • Id
    • SepalLengthCm
    • SepalWidthCm
    • PetalLengthCm
    • PetalWidthCm
    • Species
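
    The separability claim above can be probed directly from these columns. The sketch below reads the CSV with the standard library, counts samples per species, and tests whether a single threshold on one measurement isolates a species. A single-feature cut is a sufficient but not necessary condition for linear separability, so a False result does not rule separability out. The file name `Iris.csv` and the species label strings are assumptions about the download.

```python
import csv
from collections import Counter

MEASUREMENT_COLUMNS = ("SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm")


def load_iris(path):
    """Read the CSV into dicts, converting the four measurement columns to float."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        for col in MEASUREMENT_COLUMNS:
            row[col] = float(row[col])
    return rows


def species_counts(rows):
    """Number of samples per value of the Species column."""
    return dict(Counter(row["Species"] for row in rows))


def separable_by_threshold(rows, feature, species):
    """True if one cut on `feature` puts every row of `species` on the same side.

    This is a sufficient, not necessary, condition for linear separability.
    """
    inside = [row[feature] for row in rows if row["Species"] == species]
    outside = [row[feature] for row in rows if row["Species"] != species]
    if not inside or not outside:
        raise ValueError("need rows both of and not of the given species")
    return max(inside) < min(outside) or min(inside) > max(outside)
```

    With `rows = load_iris("Iris.csv")`, `species_counts(rows)` should report 50 samples per species as described, and in Fisher's data the setosa flowers are the ones usually cited as separable, so a call such as `separable_by_threshold(rows, "PetalLengthCm", "Iris-setosa")` is expected to return True if the label is spelled that way in the file.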

    [Figure: Sepal Width vs. Sepal Length]

Cite
vala khorasani (2024). Bank Transaction Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection

Bank Transaction Dataset for Fraud Detection

Detailed Analysis of Transactional Behavior and Anomaly Detection

Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 4, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
vala khorasani
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.

Key Features:

  • TransactionID: Unique alphanumeric identifier for each transaction.
  • AccountID: Unique identifier for each account, with multiple transactions per account.
  • TransactionAmount: Monetary value of each transaction, ranging from small everyday expenses to larger purchases.
  • TransactionDate: Timestamp of each transaction, capturing date and time.
  • TransactionType: Categorical field indicating 'Credit' or 'Debit' transactions.
  • Location: Geographic location of the transaction, represented by U.S. city names.
  • DeviceID: Alphanumeric identifier for devices used to perform the transaction.
  • IP Address: IPv4 address associated with the transaction, with occasional changes for some accounts.
  • MerchantID: Unique identifier for merchants, showing preferred and outlier merchants for each account.
  • AccountBalance: Balance in the account post-transaction, with logical correlations based on transaction type and amount.
  • PreviousTransactionDate: Timestamp of the last transaction for the account, aiding in calculating transaction frequency.
  • Channel: Channel through which the transaction was performed (e.g., Online, ATM, Branch).
  • CustomerAge: Age of the account holder, with logical groupings based on occupation.
  • CustomerOccupation: Occupation of the account holder (e.g., Doctor, Engineer, Student, Retired), reflecting income patterns.
  • TransactionDuration: Duration of the transaction in seconds, varying by transaction type.
  • LoginAttempts: Number of login attempts before the transaction, with higher values indicating potential anomalies.
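
The PreviousTransactionDate and LoginAttempts columns lend themselves to two simple derived features: the gap since the account's previous transaction, and a flag for unusually many login attempts. This is a minimal sketch under stated assumptions: timestamps are taken to be "YYYY-MM-DD HH:MM:SS" strings, and the cutoff of one attempt is a hypothetical choice, not part of the dataset's specification.

```python
from datetime import datetime

# Assumed timestamp layout; confirm against the actual TransactionDate values.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def seconds_since_previous(row):
    """Gap in seconds between TransactionDate and PreviousTransactionDate."""
    current = datetime.strptime(row["TransactionDate"], TIMESTAMP_FORMAT)
    previous = datetime.strptime(row["PreviousTransactionDate"], TIMESTAMP_FORMAT)
    return (current - previous).total_seconds()


def many_login_attempts(row, max_normal_attempts=1):
    """Flag rows whose LoginAttempts exceed a hypothetical cutoff."""
    return int(row["LoginAttempts"]) > max_normal_attempts


if __name__ == "__main__":
    # Illustrative row, not taken from the dataset.
    row = {
        "TransactionDate": "2023-04-11 16:29:14",
        "PreviousTransactionDate": "2023-04-11 16:28:14",
        "LoginAttempts": "3",
    }
    print(seconds_since_previous(row))  # 60.0
    print(many_login_attempts(row))     # True
```

A negative gap would mean the two date columns are not ordered as their names suggest, so it is worth checking the sign across the whole file before relying on this feature.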

This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.
