In 2023, half of the social engineering attacks worldwide were scams, making scams the most common type of cyberattack in this category. Phishing ranked second, with **** percent of the attacks, while business e-mail compromise (BEC) made up nearly ** percent of the total spear-phishing attacks.
******** and*************************************n was the most common social engineering method used by cybercriminals in Poland in 2023.
In 2023, business e-mail compromise (BEC) scams were the most common type of social engineering attacks using Gmail.com. Roughly **** percent of such cyberattacks detected on Gmail.com were identified as BEC scams. General scamming ranked second, with over ** percent, and phishing was identified in *** percent of social engineering attacks abusing Gmail.com.
In 2023, Gmail was the e-mail service most frequently abused in social engineering attacks worldwide, with 22 percent of such attacks using it. Outlook, Hotmail, and other services ranked far behind.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset records real-world data breach incidents across multiple years and sectors, capturing key information about how and when companies experienced data losses.
| Attribute | Description | Example |
|---|---|---|
| organisation | Name of the company or entity affected by the data breach. | MailMyPrescriptions.com, Sequoia Capital Operations, LLC |
| year | The year in which the breach occurred. | 2020, 2023 |
| records_lost | Number of records or personal data entries compromised in the breach. | 368537, 841836 |
| sector | Industry or business category of the affected organization. | Finance, Technology, Retail, Telecom |
| method | The primary cause or attack vector used in the breach. | Hacking, Social Engineering, Malware Attack, Brute Force Attack |
Data types:
Text columns: organisation, sector, method
Numeric columns: year, records_lost
Years covered: multiple years up to at least 2023
No missing values, making it clean and ready for analysis.
sector, method, and year.
records_lost values vary widely; consider using log scaling for better visualization.
method might include overlapping or similar attack types, which can be standardized (e.g., "Hacking" vs "Cyber Attack").
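The two suggestions above, log-scaling `records_lost` and standardizing `method` labels, can be sketched in plain Python. This is only an illustration: the sample rows reuse example values from the attribute table, but their combinations are invented, and the `METHOD_ALIASES` mapping and both helper functions are assumptions rather than part of the dataset.

```python
import math

# Hypothetical rows: the values echo the examples in the attribute table,
# but the way they are combined here is made up for illustration.
rows = [
    {"organisation": "MailMyPrescriptions.com", "year": 2020,
     "records_lost": 368537, "sector": "Retail", "method": "Hacking"},
    {"organisation": "Sequoia Capital Operations, LLC", "year": 2023,
     "records_lost": 841836, "sector": "Finance", "method": "Cyber Attack"},
]

# Assumed alias table: which labels count as "the same" is a judgment call.
METHOD_ALIASES = {"cyber attack": "Hacking"}


def standardize_method(label: str) -> str:
    """Map overlapping attack labels onto one canonical name."""
    cleaned = label.strip()
    return METHOD_ALIASES.get(cleaned.lower(), cleaned)


def log_records(records_lost: int) -> float:
    """Base-10 log of the record count; log10(1 + x) keeps zero finite."""
    return math.log10(1 + records_lost)


for row in rows:
    row["method_std"] = standardize_method(row["method"])
    row["log_records_lost"] = log_records(row["records_lost"])
```

Using `log10(1 + x)` rather than `log10(x)` is a design choice that tolerates a zero count; the scaled column is meant for plotting, while the raw `records_lost` value stays available for totals.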
According to surveys of working adults and IT professionals conducted in 2023, almost ***** in ** respondents reported having encountered vishing attacks. This represents a slight decrease from ** percent in the year prior. Vishing is a type of social engineering attack in which phishing is carried out over phone calls or voice messages.
In 2023, software/API vulnerabilities accounted for 38.6 percent of initial access vectors in cyberattacks, up by around 10 percentage points from 28.2 percent in 2022. Previously compromised credentials represented 20.5 percent of attacks, while social engineering and phishing were responsible for 17 percent of cyberattacks in the examined year.
The UK cybersecurity insurance market is booming, driven by rising cyber threats and stricter regulations. Discover key trends, market size projections, and leading players in this rapidly expanding sector. Learn about market segmentation, growth drivers, and challenges.
Recent developments include:
September 2023: Cowbell is committed to addressing cyber risk challenges on a global scale, and our expansion into the UK is a testament to this. Cowbell Prime One is tailored towards SME and mid-market customers and allows brokers to customize cyber policies for different risk exposures, such as email scams, ransomware, and social engineering.
March 2023: Cyber insurance provider Coalition is set to enter the excess cyber insurance market in the United Kingdom to help protect businesses with enhanced coverage. The firm has confirmed that it will extend its reach to provide full-follow form coverage and protection of up to GBP 10 million (USD 12,126,000) above a primary layer of insurance from another insurer for both cyber and technology professional indemnity (PI) lines.
Key drivers for this market are: Data Privacy Regulations, Business Interruption. Potential restraints include: Data Privacy Regulations, Business Interruption. Notable trends are: Impact of Cyber Insurance Policy Coverage.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. The data has been made public and presents a diverse set of email information ranging from internal, marketing emails to spam and fraud attempts.
In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, although it contained scam emails, it also had several integrity problems. The dataset was later updated, but preserving privacy remains essential when the data is used to train a deep neural network model.
Although the Enron Email Dataset contains over 500K emails, labeled fraud examples in it are scarce. Labels therefore have to be annotated so that the full range of fraud emails can be detected accurately. Since fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics are needed to label all types of fraudulent emails effectively.
To tackle this problem, heuristics have been used to label the Enron data corpus using email signals, and automated labeling has been performed using simple ML models on other smaller email datasets available online. These fraud annotation techniques are discussed in detail below.
To perform fraud annotation on the Enron dataset and to provide more fraud examples for modeling, two more fraud data sources have been used:
Phishing Email Dataset: https://www.kaggle.com/dsv/6090437
Social Engineering Dataset: http://aclweb.org/aclwiki
To label the Enron email dataset, two signals are used to filter suspicious emails and label them as fraud or non-fraud: automated ML labeling and email signals.
The following heuristics are used to annotate labels for the Enron email data using the other two data sources:
Phishing Model Annotation: A high-precision SVM model trained on the Phishing mails dataset, which is used to annotate the Phishing Label on the Enron Dataset.
Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering mails dataset, which is used to annotate the Social Engineering Label on the Enron Dataset.
The two ML Annotator models use Term Frequency Inverse Document Frequency (TF-IDF) to embed the input text and make use of SVM models with Gaussian Kernel.
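The two ingredients named above, TF-IDF weighting and the Gaussian (RBF) kernel, can be sketched in plain Python to show what the SVM compares. This is a minimal illustration, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF variant, the `gamma` value, and the example texts are all assumptions, and the SVM training step itself is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def gaussian_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2) for sparse dict vectors."""
    terms = set(u) | set(v)
    sq_dist = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms)
    return math.exp(-gamma * sq_dist)


# Made-up example texts, not taken from any of the datasets.
docs = [
    "verify your account password now",
    "verify your bank account now",
    "meeting agenda for monday",
]
vecs = tfidf_vectors(docs)
k_similar = gaussian_kernel(vecs[0], vecs[1])    # texts sharing several terms
k_different = gaussian_kernel(vecs[0], vecs[2])  # texts sharing no terms
```

Because the vectors are L2-normalized, two texts with no terms in common sit at squared distance 2, so their kernel value is `exp(-2 * gamma)`; texts that share terms score closer to 1. An SVM with this kernel builds its decision function from such similarity values between an input email and the training emails.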
If either of the models predicted that an email was a fraud, the mail metadata was checked for several email signals. If these heuristics meet the requirements of a high-probability fraud email, we label it as a fraud email.
Email signal-based heuristics are used to filter and target suspicious emails for fraud labeling specifically. The signals used were:
Person Of Interest: There is a publicly available list of email addresses of employees who were liable for the massive data leak at Enron. These user mailboxes have a higher chance of containing quality fraud emails.
Suspicious Folders: The Enron data is dumped into several folders for every employee, such as inbox, deleted_items, junk, and calendar. Folders with a higher chance of containing fraud emails, such as Deleted Items and Junk, are treated as suspicious.
Sender Type: The sender type was categorized as 'Internal' or 'External' based on the sender's email address.
Low Communication: A threshold of 4 emails based on the table below was used to define Low Communication. A user qualifies as a Low-Comm sender if the number of their emails is below this threshold. Mails sent from low-comm senders are assigned a high probability of being fraud.
Contains Replies and Forwards: If an email contains forwards or replies, it is assigned a low probability of being a fraud email.
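The gating described above (a model flag first, then a check of the email signals) can be sketched as follows. The source does not say how the signals are weighted or combined, so the equal weights, the `min_score` cutoff, the treatment of external senders as more suspicious, and the `Email` field names are all assumptions made for illustration; only the threshold of 4 emails and the Deleted Items and Junk folders come from the text.

```python
from dataclasses import dataclass

LOW_COMM_THRESHOLD = 4  # from the text: fewer emails than this means low-comm
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}  # folders named in the text


@dataclass
class Email:
    # Hypothetical metadata record; field names are illustrative only.
    folder: str
    sender_is_external: bool
    sender_email_count: int      # emails seen from this sender in the corpus
    in_poi_mailbox: bool
    has_reply_or_forward: bool


def signal_score(email: Email) -> int:
    """Count signals pointing toward fraud; a reply/forward counts against."""
    score = 0
    score += email.in_poi_mailbox
    score += email.folder.lower() in SUSPICIOUS_FOLDERS
    score += email.sender_is_external  # assumption: external is more suspicious
    score += email.sender_email_count < LOW_COMM_THRESHOLD
    score -= email.has_reply_or_forward
    return score


def label_fraud(email, phishing_pred, social_eng_pred, min_score=3):
    """Label as fraud only if a model flags the email AND enough signals agree."""
    if not (phishing_pred or social_eng_pred):
        return False
    return signal_score(email) >= min_score


suspicious = Email("junk", True, 1, False, False)
routine = Email("inbox", False, 120, False, True)
flagged = label_fraud(suspicious, phishing_pred=True, social_eng_pred=False)
```

Requiring both a model flag and agreeing signals trades recall for precision, which fits the stated goal of high-quality labels; a real implementation would need to tune or replace the scoring rule.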
To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.
| Fraud | Non-Fraud |
|---|---|
| 2327 | 445090 |
Enron Dataset. Title: Enron Email Dataset. URL: https://www.cs.cmu.edu/~enron/. Publisher: MIT, CMU. Author: Leslie Kaelbling, William W. Cohen. Year: 2015.
Phishing Email Detection Dataset. Title: Phishing Email Detection. URL: https://www.kaggle.com/dsv/6090437. DOI: 10.34740/KAGGLE/DSV/6090437. Publisher: Kaggle. Author: Subhadeep Chakraborty. Year: 2023.
CLAIR Fraud Email Collection. Title: CLAIR collection of fraud email. URL: http://aclweb.org/aclwiki. Author: Radev, D. Year: 2008.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was compiled by researchers to study phishing email tactics. It combines emails from a variety of sources to create a comprehensive resource for analysis.
Enron and Ling Datasets: These datasets focus on the core content of phishing emails, containing subject lines, email body text, and labels indicating whether the email is spam (phishing) or legitimate.
CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets provide broader context for the emails, including sender information, recipient information, date, and labels for spam/legitimate classification.
The final dataset combines the information from the initial datasets into a single resource for analysis. This dataset contains:
This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.
Please cite the following two articles if you are using this dataset:
Between November 2023 and October 2024, system intrusion was the leading pattern in breaches from cyberattacks in the Asia-Pacific region, with ***** breaches. Social engineering patterns followed, showing up in *** breaches in the region during this period.