10 datasets found
  1. Amazon-Fraud Dataset

    • paperswithcode.com
    Updated Dec 23, 2024
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2024). Amazon-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/amazon-fraud
    Authors
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
    Description

    Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

    Dataset Statistics

    # Nodes: 11,944
    % Fraud Nodes (Class=1): 9.5

    Relation    # Edges
    U-P-U       (not given)
    U-S-U       (not given)
    U-V-U       1,036,737
    All         (not given)

    Graph Construction

    The Amazon dataset includes product reviews under the Musical Instruments category. Similar to this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. We take users as nodes in the graph and design three relations: 1) U-P-U: it connects users who reviewed at least one common product; 2) U-S-U: it connects users who gave at least one identical star rating within one week; 3) U-V-U: it connects users whose mutual review text similarity (measured by TF-IDF) is in the top 5% among all users.
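    The labeling rule and the U-P-U relation described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(user_id, product_id)` input layout and the function names `label_user` and `build_upu_edges` are hypothetical and are not taken from the dataset's own code.

```python
from collections import defaultdict
from itertools import combinations


def label_user(helpful_ratio):
    """Return 0 (benign) above 80% helpful votes, 1 (fraud) below 20%, else None."""
    if helpful_ratio > 0.8:
        return 0
    if helpful_ratio < 0.2:
        return 1
    return None  # users in between receive no label under this rule


def build_upu_edges(reviews):
    """Connect every pair of users who reviewed at least one common product.

    `reviews` is an iterable of (user_id, product_id) pairs (assumed layout).
    Edges are undirected and stored as sorted (user_a, user_b) tuples.
    """
    users_by_product = defaultdict(set)
    for user, product in reviews:
        users_by_product[product].add(user)

    edges = set()
    for users in users_by_product.values():
        edges.update(combinations(sorted(users), 2))
    return edges


# Toy data: u1 and u2 share product p1; u1 and u3 share product p2.
reviews = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u1", "p2")]
upu_edges = build_upu_edges(reviews)
```

    Grouping users by product first avoids comparing every pair of users directly; the U-S-U and U-V-U relations would need different grouping keys or a similarity threshold.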

    To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

  2. Fraud-R1-LLM-Defense-Fraud-Benchmark

    • huggingface.co
    Updated Feb 16, 2025
    Shenzhe Zhu (2025). Fraud-R1-LLM-Defense-Fraud-Benchmark [Dataset]. https://huggingface.co/datasets/Chouoftears/Fraud-R1-LLM-Defense-Fraud-Benchmark
    Authors
    Shenzhe Zhu
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Fraud-R1: A Comprehensive Benchmark for Assessing LLM Robustness Against Fraud and Phishing Inducement

    Shu Yang*, Shenzhe Zhu*, Zeyu Wu, Keyu Wang, Junchi Yao, Junchao Wu, Lijie Hu, Mengdi Li, Derek F. Wong, Di Wang† (*Contributed equally, †Corresponding author)

    Content Warning: This repo contains examples of harmful language.

    News

    2025/02/16: We have released our evaluation code.
    2025/02/16: We have released our dataset.

    … See the full description on the dataset page: https://huggingface.co/datasets/Chouoftears/Fraud-R1-LLM-Defense-Fraud-Benchmark.

  3. Kaggle-Credit Card Fraud Dataset Dataset

    • paperswithcode.com
    Updated Sep 15, 2013
    (2013). Kaggle-Credit Card Fraud Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/kaggle-credit-card-fraud-dataset
    Description

    The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

    It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; it can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.

    Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
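    To illustrate why plain accuracy misleads here, the sketch below computes the accuracy of a trivial "always legitimate" classifier from the counts quoted above, alongside average precision, one common step-wise estimate of AUPRC. The function name `average_precision` and the toy scores are illustrative assumptions; a library implementation may handle tied scores differently.

```python
def average_precision(y_true, scores):
    """Mean of precision@k taken at each rank k where a positive example appears.

    Examples are ranked by descending score; ties keep their input order
    (an assumption of this sketch).
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positives")

    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_positives


# Counts quoted in the dataset description.
n_fraud, n_total = 492, 284_807
fraud_rate = n_fraud / n_total

# A classifier that never predicts fraud is right on every legitimate row.
baseline_accuracy = (n_total - n_fraud) / n_total

# Toy ranking: positives land at ranks 1 and 3 of 5.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

    The baseline reaches more than 99.8% accuracy while detecting no fraud at all, whereas average precision depends only on how highly the fraudulent rows are ranked.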

  4. Yelp-Fraud Dataset

    • paperswithcode.com
    Updated Apr 21, 2025
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu (2025). Yelp-Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/yelpchi
    Authors
    Yingtong Dou; Zhiwei Liu; Li Sun; Yutong Deng; Hao Peng; Philip S. Yu
    Description

    Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.

    Dataset Statistics

    # Nodes: 45,954
    % Fraud Nodes (Class=1): 14.5

    Relation    # Edges
    R-U-R       (not given)
    R-T-R       (not given)
    R-S-R       3,402,743
    All         (not given)

    Graph Construction

    The Yelp spam review dataset includes hotel and restaurant reviews filtered (spam) and recommended (legitimate) by Yelp. We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Based on previous studies showing that opinion fraudsters are connected through user, product, review text, and time, we take reviews as nodes in the graph and design three relations: 1) R-U-R: it connects reviews posted by the same user; 2) R-S-R: it connects reviews under the same product with the same star rating (1-5 stars); 3) R-T-R: it connects two reviews under the same product posted in the same month.
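    All three relations amount to "connect reviews that share a key", so they can be sketched with one grouping helper. The record fields (`id`, `user`, `product`, `stars`, `month`) and the helper name `connect_by_key` are assumptions made for this illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from itertools import combinations


def connect_by_key(reviews, key):
    """Return undirected edges between all reviews that map to the same key."""
    groups = defaultdict(list)
    for review in reviews:
        groups[key(review)].append(review["id"])

    edges = set()
    for ids in groups.values():
        edges.update(combinations(sorted(ids), 2))
    return edges


# Toy reviews with hypothetical fields.
reviews = [
    {"id": 0, "user": "a", "product": "x", "stars": 5, "month": "2012-03"},
    {"id": 1, "user": "a", "product": "y", "stars": 1, "month": "2012-03"},
    {"id": 2, "user": "b", "product": "x", "stars": 5, "month": "2012-04"},
    {"id": 3, "user": "c", "product": "x", "stars": 2, "month": "2012-04"},
]

rur = connect_by_key(reviews, lambda r: r["user"])                  # same user
rsr = connect_by_key(reviews, lambda r: (r["product"], r["stars"]))  # same product and rating
rtr = connect_by_key(reviews, lambda r: (r["product"], r["month"]))  # same product and month
```

    Each relation yields its own edge set over the same review nodes, which is what makes the resulting graph multi-relational.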

    To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.

  5. Data from: Detection of illicit accounts over the Ethereum blockchain

    • dataverse.harvard.edu
    • dataverse.nl
    csv, txt
    Updated Sep 21, 2021
    Harvard Dataverse (2021). Detection of illicit accounts over the Ethereum blockchain [Dataset]. http://doi.org/10.34894/GKAQYN
    Available download formats: csv (1016388), txt (506)
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.null/customlicense?persistentId=doi:10.34894/GKAQYN

    Description

    The recent technological advent of cryptocurrencies and their respective benefits have been shrouded with a number of illegal activities operating over the network such as money laundering, bribery, phishing, and fraud, among others. In this work we focus on the Ethereum network, which has seen over 400 million transactions since its inception. Using 2179 accounts flagged by the Ethereum community for their illegal activity coupled with 2502 normal accounts, we seek to detect illicit accounts based on their transaction history using the XGBoost classifier. Using 10-fold cross-validation, XGBoost achieved an average accuracy of 0.963 (± 0.006) with an average AUC of 0.994 (± 0.0007). The top three features with the largest impact on the final model output were established to be ‘Time diff between first and last (Mins)’, ‘Total Ether balance’ and ‘Min value received’. Based on the results we conclude that the proposed approach is highly effective in detecting illicit accounts over the Ethereum network. Our contribution is multi-faceted: firstly, we propose an effective method to detect illicit accounts over the Ethereum network; secondly, we provide insights about the most important features; and thirdly, we publish the compiled data set as a benchmark for future related works.
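    As a rough illustration of the 10-fold protocol mentioned above, the sketch below splits 2179 illicit and 2502 normal labels into ten folds and iterates over the train/test partitions. Stratifying by class, the fixed seed, and the names `stratified_folds` and `train_test_splits` are assumptions of this sketch; the description does not say how the authors built their folds, and no classifier is trained here.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign indices to k folds so each fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train_idx, test_idx


# Class counts quoted in the description: 1 = illicit, 0 = normal.
labels = [1] * 2179 + [0] * 2502
folds = stratified_folds(labels)
fold_sizes = [len(fold) for fold in folds]
splits = list(train_test_splits(folds))
```

    A model would be fitted on each `train_idx` and scored on the matching `test_idx`, and the ten scores would then be averaged to report a mean and spread.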

  6. BAF Dataset

    • paperswithcode.com
    Updated Nov 28, 2022
    Sérgio Jesus; José Pombal; Duarte Alves; André Cruz; Pedro Saleiro; Rita P. Ribeiro; João Gama; Pedro Bizarro (2022). BAF Dataset [Dataset]. https://paperswithcode.com/dataset/baf
    Authors
    Sérgio Jesus; José Pombal; Duarte Alves; André Cruz; Pedro Saleiro; Rita P. Ribeiro; João Gama; Pedro Bizarro
    Description

    Bank Account Fraud (BAF) is a large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized, real-world bank account opening fraud detection dataset.

  7. List and descriptions of benchmark datasets.

    • plos.figshare.com
    xls
    Updated Jun 16, 2023
    Asoke K. Nandi; Kuldeep Kaur Randhawa; Hong Siang Chua; Manjeevan Seera; Chee Peng Lim (2023). List and descriptions of benchmark datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0260579.t004
    Available download formats: xls
    Dataset provided by
    PLOS ONE
    Authors
    Asoke K. Nandi; Kuldeep Kaur Randhawa; Hong Siang Chua; Manjeevan Seera; Chee Peng Lim
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    List and descriptions of benchmark datasets.

  8. EDINET-Bench

    • huggingface.co
    Updated Jun 9, 2025
    Sakana AI (2025). EDINET-Bench [Dataset]. https://huggingface.co/datasets/SakanaAI/EDINET-Bench
    Dataset authored and provided by
    Sakana AI (https://sakana.ai/)
    License

    https://choosealicense.com/licenses/other/

    Description

    EDINET-Bench

    EDINET-Bench is a Japanese financial benchmark designed to evaluate the performance of LLMs on challenging financial tasks including accounting fraud detection, earnings forecasting, and industry prediction. This dataset is built leveraging EDINET, a platform managed by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.

    Notice

    June 9, 2025: This dataset was… See the full description on the dataset page: https://huggingface.co/datasets/SakanaAI/EDINET-Bench.

  9. mebeblurf Dataset

    • paperswithcode.com
    Eran Eidinger; Roee Enbar; Tal Hassner, mebeblurf Dataset [Dataset]. https://paperswithcode.com/dataset/adience
    Authors
    Eran Eidinger; Roee Enbar; Tal Hassner

  10. H2GB

    • huggingface.co
    Updated Jun 12, 2025
    Junhong Lin (2025). H2GB [Dataset]. https://huggingface.co/datasets/junhongmit/H2GB
    Authors
    Junhong Lin
    Description

    When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark

    Junhong Lin¹, Xiaojie Guo², Shuaicheng Zhang³, Yada Zhu², Dawei Zhou³, Julian Shun¹ ¹ MIT CSAIL, ² IBM Research, ³ Virginia Tech

    This repository hosts a subset of datasets from H2GB, a large-scale benchmark suite designed to evaluate graph learning models on heterophilic and heterogeneous graphs. These graphs naturally arise in real-world applications such as fraud detection, malware… See the full description on the dataset page: https://huggingface.co/datasets/junhongmit/H2GB.


Amazon-Fraud Dataset

Multi-relational Graph Dataset for Amazon Fraudulent Account Detection

72 scholarly articles cite this dataset