36 datasets found

A Dataset of over 500.000 commercial email newsletters, as collected by...
zenodo.org
Updated Jun 13, 2022
Cite
Max Maass; Stephan Schwär; Matthias Hollick (2022). A Dataset of over 500.000 commercial email newsletters, as collected by PrivacyMail.info [Dataset]. http://doi.org/10.5281/zenodo.6509751
Unique identifier
https://doi.org/10.5281/zenodo.6509751
Dataset updated
Jun 13, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Max Maass; Stephan Schwär; Matthias Hollick
Description
This dataset contains the data from roughly two years of operating PrivacyMail.info, an open-source email privacy measurement platform. It contains slightly over 500,000 commercial newsletters, crowdsourced by users of PrivacyMail.info. The methodology is discussed in our paper: Max Maass, Stephan Schwär, and Matthias Hollick. "Towards transparency in email tracking." Annual Privacy Forum, 2019. The source code is available at github.com/privacymail/privacymail.

Please note that, due to its crowdsourced nature, this dataset is a sample of opportunity: it is not representative of all newsletters on the Internet and likely contains biases arising from how it was collected. Notably, German-language newsletters are likely heavily over-represented.

Dataset Structure
The dataset is structured as follows: the top-level folders correspond to the websites the newsletters belong to. Inside each website folder is a subfolder for each identity registered for that website, and each identity folder contains a series of .eml files representing the received email messages.
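A minimal standard-library sketch for iterating over this layout, assuming the archive has been extracted to a local directory. The directory name `privacymail-data` and the helper `load_newsletters` are hypothetical; the code simply yields the website folder name, the identity folder name, and the parsed message for each .eml file.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path
from typing import Iterator, Tuple


def load_newsletters(root: Path) -> Iterator[Tuple[str, str, EmailMessage]]:
    """Yield (website, identity, message) for every .eml file under root.

    Assumes the layout root/<website>/<identity>/<message>.eml.
    """
    parser = BytesParser(policy=policy.default)  # returns EmailMessage objects
    for website_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for identity_dir in sorted(p for p in website_dir.iterdir() if p.is_dir()):
            for eml_path in sorted(identity_dir.glob("*.eml")):
                with eml_path.open("rb") as fh:
                    message = parser.parse(fh)
                yield website_dir.name, identity_dir.name, message


if __name__ == "__main__":
    root = Path("privacymail-data")  # hypothetical extraction directory
    if root.is_dir():
        for website, identity, msg in load_newsletters(root):
            print(website, identity, msg["Subject"])
```

Grouping by identity folder keeps the per-registration context, which matters when comparing what different identities received from the same website.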

Copyright and Licensing
This dataset is set to non-public due to copyright concerns: the contents of the email messages are (presumably) protected by copyright in most jurisdictions. Most copyright doctrines contain exceptions for non-commercial research use; we therefore feel it is appropriate and acceptable to share the data on a case-by-case basis, as we did before shutting down PrivacyMail.info. When requesting access to the data, please briefly describe the research you want to conduct with it, and we will grant you access.

We therefore do not attach an explicit license to this dataset. Please do not share the raw data publicly. We ask that you cite the above-mentioned paper and this dataset in any publications that result from it.
Enron Fraud Email Dataset
kaggle.com
Updated Dec 28, 2023
Cite
Advaith S Rao (2023). Enron Fraud Email Dataset [Dataset]. https://www.kaggle.com/datasets/advaithsrao/enron-fraud-email-dataset/versions/1
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Dec 28, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Advaith S Rao
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. Its email data was later made public and covers a diverse range of messages, from internal and marketing emails to spam and fraud attempts.

In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and found that, although it contained scam emails, it also had several integrity problems. The dataset was later updated, but ensuring privacy in the data remains essential when it is used to train a deep neural network model.

Although the Enron Email Dataset contains over 500K emails, one problem with it is the scarcity of labeled fraud examples. The labels are meant to capture a broad umbrella of fraud emails accurately. Since fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics are needed to label all types of fraudulent emails effectively.

To tackle this problem, heuristics based on email signals have been used to label the Enron corpus, and automated labeling has been performed using simple ML models trained on other, smaller email datasets available online. These fraud annotation techniques are discussed in detail below.

To perform fraud annotation on the Enron dataset and to provide more fraud examples for modeling, two additional fraud data sources have been used:

Phishing Email Dataset: https://www.kaggle.com/dsv/6090437

Social Engineering Dataset: http://aclweb.org/aclwiki

Label Annotation

To label the Enron email dataset, two kinds of signals are used to filter suspicious emails and sort them into fraud and non-fraud classes:

Automated ML Labeling

Email Signals

Automated ML Labeling

The following heuristics use the other two data sources to annotate labels for the Enron email data:

Phishing Model Annotation: A high-precision SVM model trained on the Phishing Email dataset and used to annotate the Phishing label on the Enron dataset.

Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering email dataset and used to annotate the Social Engineering label on the Enron dataset.

The two ML annotator models use Term Frequency-Inverse Document Frequency (TF-IDF) to embed the input text and SVMs with a Gaussian kernel to classify it.
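The embedding step can be illustrated with a small pure-Python sketch: TF-IDF vectors built over a toy corpus, plus the Gaussian (RBF) kernel an SVM would evaluate between them. The tokenizer, the smoothed IDF variant (the same formula scikit-learn uses by default), and `gamma=1.0` are assumptions, since the source does not specify them; a real pipeline would more likely use scikit-learn's `TfidfVectorizer` and `SVC(kernel="rbf")`.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs (an assumed, simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus: List[str]) -> Dict[str, float]:
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so every weight stays positive.
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text: str, idf: Dict[str, float]) -> Vector:
    # Raw term counts weighted by IDF, then L2-normalised; unknown terms are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def rbf_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    # Gaussian kernel exp(-gamma * ||a - b||^2) over sparse vectors.
    keys = set(a) | set(b)
    sq_dist = sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)
    return math.exp(-gamma * sq_dist)
```

An SVM using this kernel would then be trained on the labelled phishing or social-engineering vectors; that training step is omitted here.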

If either model predicted that an email was fraudulent, the email's metadata was checked against several email signals. If these heuristics met the requirements of a high-probability fraud email, the email was labeled as fraud.

Email Signals

Heuristics based on email signals are used specifically to filter and target suspicious emails for fraud labeling. The signals used were:

Person Of Interest: There is a publicly available list of email addresses of employees who were implicated in the Enron scandal. These users' mailboxes have a higher chance of containing quality fraud emails.

Suspicious Folders: The Enron data is split into several folders for every employee, such as inbox, deleted_items, junk, and calendar. A subset of folders, such as Deleted Items and Junk, was treated as having a higher chance of containing fraud emails.

Sender Type: Each sender was categorized as ‘Internal’ or ‘External’ based on their email address.

Low Communication: A threshold of 4 emails, based on the table below, was used to define Low Communication. A sender qualifies as a Low-Comm sender if they sent fewer emails than this threshold. Emails from low-comm senders were assigned a high probability of being fraud.

Contains Replies and Forwards: An email that contains forwards or replies was assigned a low probability of being fraud.
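The decision rule described in this section, where ML predictions gate a check of metadata signals, might be sketched as follows. The source names the signals but does not publish how they are combined, so the `EmailRecord` fields, the folder names, treating external senders as suspicious, the additive scoring, and the `min_score` cut-off are all hypothetical; only the threshold of 4 emails comes from the text.

```python
from dataclasses import dataclass

# Hypothetical folder names; the source only cites Deleted Items and Junk as examples.
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}
LOW_COMM_THRESHOLD = 4  # senders with fewer than 4 emails count as Low-Comm


@dataclass
class EmailRecord:
    mailbox_owner_is_poi: bool
    folder: str
    sender_is_internal: bool
    sender_email_count: int
    has_reply_or_forward: bool
    phishing_model_says_fraud: bool
    social_eng_model_says_fraud: bool


def signal_score(rec: EmailRecord) -> int:
    # Hypothetical additive score: each high-probability signal adds a point;
    # replies/forwards subtract one because they indicate a low fraud probability.
    return (
        int(rec.mailbox_owner_is_poi)
        + int(rec.folder.lower() in SUSPICIOUS_FOLDERS)
        + int(not rec.sender_is_internal)
        + int(rec.sender_email_count < LOW_COMM_THRESHOLD)
        - int(rec.has_reply_or_forward)
    )


def label_fraud(rec: EmailRecord, min_score: int = 3) -> bool:
    # Only emails flagged by at least one ML annotator reach the signal check.
    if not (rec.phishing_model_says_fraud or rec.social_eng_model_says_fraud):
        return False
    return signal_score(rec) >= min_score
```

Gating on the ML annotators first keeps the signal check, which is cheap but noisy, from labeling emails that neither model considered suspicious.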

Manual Inspection

To ensure high-quality labels, the mismatched examples from ML annotation were manually inspected when relabeling the Enron dataset.

Dataset Breakdown

Label | Emails
Fraud | 2,327
Non-Fraud | 445,090
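The breakdown implies a heavily imbalanced label distribution. The short sketch below computes the fraud share and inverse-frequency class weights using n_samples / (n_classes * n_class), the heuristic scikit-learn applies for `class_weight="balanced"`; whether the dataset author used any class weighting is not stated.

```python
counts = {"fraud": 2327, "non_fraud": 445090}

total = sum(counts.values())  # 447,417 labelled emails
fraud_share = counts["fraud"] / total

# Inverse-frequency ("balanced") weights: n_samples / (n_classes * n_class).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(f"fraud share: {fraud_share:.2%}")
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights, each class contributes the same total weight (half of the sample count) to the training loss.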

Citations

Enron Dataset: Leslie Kaelbling, William W. Cohen (2015). Enron Email Dataset. Publisher: MIT, CMU. URL: https://www.cs.cmu.edu/~enron/

Phishing Email Detection Dataset: Subhadeep Chakraborty (2023). Phishing Email Detection. Publisher: Kaggle. DOI: 10.34740/KAGGLE/DSV/6090437. URL: https://www.kaggle.com/dsv/6090437

CLAIR Fraud Email Collection: D. Radev (2008). CLAIR collection of fraud email. URL: http://aclweb.org/aclwiki
Email Dataset for Automatic Response Suggestion within a University
figshare.com
pdf
Updated Feb 4, 2018
Cite
Aditya Singh; Dibyendu Mishra; Sanchit Bansal; Vinayak Agarwal; Anjali Goyal; Ashish Sureka (2018). Email Dataset for Automatic Response Suggestion within a University [Dataset]. http://doi.org/10.6084/m9.figshare.5853057.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.5853057.v1
Dataset updated
Feb 4, 2018
Dataset provided by
Figshare: http://figshare.com/
Authors
Aditya Singh; Dibyendu Mishra; Sanchit Bansal; Vinayak Agarwal; Anjali Goyal; Ashish Sureka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We have developed an application and solution approach (using this dataset) for automatically generating and suggesting short email responses to support queries in a university environment. Our proposed solution can be used as a one-tap or one-click way to respond to various types of queries raised by faculty members and students. The Office of Academic Affairs (OAA), the Office of Student Life (OSL), and the Information Technology Helpdesk (ITD) are support functions within a university that receive hundreds of email messages daily. Email is still the most frequently used mode of communication for these departments. A large percentage of the emails they receive are frequently asked queries or requests for information. Responding to every query by typing manually is tedious and time-consuming. Furthermore, a large percentage of emails and their responses consist of short messages. For example, the IT support department at our university receives many emails about Wi-Fi not working, or from someone who needs help with a projector or requires an HDMI cable or a remote slide changer. Another example is emails from students asking the Office of Academic Affairs to add and drop courses, which they cannot do directly.

The dataset consists of email messages typically received by ITD, OAA, and OSL at Ashoka University. It also contains intermediate results from the machine learning experiments.
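As a purely illustrative baseline (not the authors' method, which the description does not detail), short-response suggestion can be framed as retrieval: match an incoming query to the most similar previously answered query and suggest its reply. The example queries, canned replies, Jaccard-overlap scoring, and `min_overlap` cut-off below are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical (past query, canned reply) pairs of the kind the description mentions.
TEMPLATES: List[Tuple[str, str]] = [
    ("wifi is not working in my room", "We have logged your Wi-Fi issue; a technician will visit shortly."),
    ("need an hdmi cable for the classroom projector", "An HDMI cable will be delivered to your classroom."),
    ("please add and drop a course for me", "Your add/drop request has been forwarded to the OAA."),
]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_reply(query: str, min_overlap: float = 0.2) -> Optional[str]:
    """Return the canned reply whose example query has the highest Jaccard overlap."""
    q = tokens(query)
    best_score, best_reply = 0.0, None
    for example, reply in TEMPLATES:
        e = tokens(example)
        union = q | e
        score = len(q & e) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_reply = score, reply
    # Below the (assumed) cut-off, defer to a human instead of guessing.
    return best_reply if best_score >= min_overlap else None
```

Returning `None` for weak matches keeps the one-click suggestion limited to the frequent, repetitive queries the description targets.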
Spam email classification
kaggle.com
Updated Sep 21, 2023
Cite
Yousef Mohamed (2023). Spam email classification [Dataset]. https://www.kaggle.com/datasets/yousefmohamed20/spam-email-detection
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Sep 21, 2023
Dataset provided by
Kaggle
Authors
Yousef Mohamed
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This is a CSV file containing 5,157 randomly selected emails and their labels for spam/not-spam classification. It has 5,157 rows, one per email, and 2 columns: the first gives the email category (spam or ham), and the second contains the email text.
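A minimal standard-library sketch for reading the file and counting labels. It assumes the first row is a header and that the columns appear in the order described (category first, message text second); since the column names are not given, the code reads columns by position. The file name in the usage comment is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def read_labelled_emails(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse (label, text) pairs from CSV lines, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # assumed header row
    return [(row[0].strip().lower(), row[1]) for row in reader if len(row) >= 2]


def label_counts(pairs: List[Tuple[str, str]]) -> Counter:
    return Counter(label for label, _ in pairs)


# Usage with a real file (name is hypothetical):
# with open("spam.csv", newline="", encoding="utf-8") as fh:
#     pairs = read_labelled_emails(fh)
#     print(label_counts(pairs))
```

Using the `csv` module rather than splitting on commas matters here, because email bodies routinely contain commas and quoted text.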
US Consumer Marketing Data - 269M+ Consumer Records - 95% Email and Direct...
datarade.ai
Updated Jun 1, 2022
Cite
Giant Partners (2022). US Consumer Marketing Data - 269M+ Consumer Records - 95% Email and Direct Dials Accuracy [Dataset]. https://datarade.ai/data-products/consumer-business-data-postal-phone-email-demographics-giant-partners
Dataset updated
Jun 1, 2022
Dataset authored and provided by
Giant Partners
Area covered
United States of America
Description
Premium B2C Consumer Database - 269+ Million US Records

Supercharge your B2C marketing campaigns with a comprehensive consumer database featuring over 269 million verified US consumer records. Our 20+ years of data expertise deliver higher quality and more extensive coverage than competitors.

Core Database Statistics

Consumer Records: Over 269 million

Email Addresses: Over 160 million (verified and deliverable)

Phone Numbers: Over 76 million (mobile and landline)

Mailing Addresses: Over 116 million (NCOA processed)

Geographic Coverage: Complete US (all 50 states)

Compliance Status: CCPA compliant with consent management

Targeting Categories Available

Demographics: Age ranges, education levels, occupation types, household composition, marital status, presence of children, income brackets, and gender (where legally permitted)

Geographic: Nationwide, state-level, MSA (Metropolitan Statistical Area), zip code radius, city, county, and SCF range targeting options

Property & Dwelling: Home ownership status, estimated home value, years in residence, property type (single-family, condo, apartment), and dwelling characteristics

Financial Indicators: Income levels, investment activity, mortgage information, credit indicators, and wealth markers for premium audience targeting

Lifestyle & Interests: Purchase history, donation patterns, political preferences, health interests, recreational activities, and hobby-based targeting

Behavioral Data: Shopping preferences, brand affinities, online activity patterns, and purchase timing behaviors

Multi-Channel Campaign Applications

Deploy across all major marketing channels:

Email marketing and automation

Social media advertising

Search and display advertising (Google, YouTube)

Direct mail and print campaigns

Telemarketing and SMS campaigns

Programmatic advertising platforms

Data Quality & Sources

Our consumer data aggregates from multiple verified sources:

Public records and government databases

Opt-in subscription services and registrations

Purchase transaction data from retail partners

Survey participation and research studies

Online behavioral data (privacy compliant)

Technical Delivery Options

File Formats: CSV, Excel, JSON, XML formats available

Delivery Methods: Secure FTP, API integration, direct download

Processing: Real-time NCOA, email validation, phone verification

Custom Selections: 1,000+ selectable demographic and behavioral attributes

Minimum Orders: Flexible based on targeting complexity

Unique Value Propositions

Dual Spouse Targeting: Reach both household decision-makers for maximum impact

Cross-Platform Integration: Seamless deployment to major ad platforms

Real-Time Updates: Monthly data refreshes ensure maximum accuracy

Advanced Segmentation: Combine multiple targeting criteria for precision campaigns

Compliance Management: Built-in opt-out and suppression list management

Ideal Customer Profiles

E-commerce retailers seeking customer acquisition

Financial services companies targeting specific demographics

Healthcare organizations with compliant marketing needs

Automotive dealers and service providers

Home improvement and real estate professionals

Insurance companies and agents

Subscription services and SaaS providers

Performance Optimization Features

Lookalike Modeling: Create audiences similar to your best customers

Predictive Scoring: Identify high-value prospects using AI algorithms

Campaign Attribution: Track performance across multiple touchpoints

A/B Testing Support: Split audiences for campaign optimization

Suppression Management: Automatic opt-out and DNC compliance

Pricing & Volume Options

Flexible pricing structures accommodate businesses of all sizes:

Pay-per-record for small campaigns

Volume discounts for large deployments

Subscription models for ongoing campaigns

Custom enterprise pricing for high-volume users

Data Compliance & Privacy

VIA.tools maintains industry-leading compliance standards:

CCPA (California Consumer Privacy Act) compliant

CAN-SPAM Act adherence for email marketing

TCPA compliance for phone and SMS campaigns

Regular privacy audits and data governance reviews

Transparent opt-out and data deletion processes

Getting Started

Our data specialists work with you to:

Define your target audience criteria

Recommend optimal data selections

Provide sample data for testing

Configure delivery methods and formats

Implement ongoing campaign optimization

Why We Lead the Industry

With over two decades of data industry experience, we combine extensive database coverage with advanced targeting capabilities. Our commitment to data quality, compliance, and customer success has made us the preferred choice for businesses seeking superior B2C marketing performance.

Contact our team to discuss your specific targeting requirements and receive custom pricing for your marketing objectives.
Spam Mail Prediction Dataset
opendatabay.com
Updated Jun 6, 2025
Cite
Datasimple (2025). Spam Mail Prediction Dataset [Dataset]. https://www.opendatabay.com/data/dataset/080d396c-0650-452b-9bef-d6bb3fa9366e
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Fraud Detection & Risk Management
Description
The dataset consists of a collection of emails categorized into two major classes: spam and not spam. It is designed to facilitate the development and evaluation of spam detection or email filtering systems.

The spam emails in the dataset are typically unsolicited and unwanted messages that aim to promote products or services, spread malware, or deceive recipients for various malicious purposes. These emails often contain misleading subject lines, excessive use of advertisements, unauthorized links, or attempts to collect personal information.

The non-spam emails in the dataset are genuine and legitimate messages sent by individuals or organizations. They may include personal or professional communication, newsletters, transaction receipts, or any other non-malicious content.

The dataset encompasses emails of varying lengths, languages, and writing styles, reflecting the inherent heterogeneity of email communication. This diversity aids in training algorithms that can generalize well to different types of emails, making them robust against different spammer tactics and variations in non-spam email content.
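To illustrate one classic baseline for this task (the description does not prescribe a model), the sketch below trains a multinomial Naive Bayes classifier with Laplace (add-one) smoothing on tokenized emails. The tokenizer, the choice to ignore words never seen in training, and the toy training examples are assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, examples: List[Tuple[str, str]]) -> "NaiveBayesSpamFilter":
        self.doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(word | label); words outside the vocabulary are skipped.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.doc_counts, key=lambda label: self.log_posterior(text, label))
```

Working in log space avoids underflow when multiplying many small word probabilities for long emails.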

Original Data Source: Spam Mail Prediction Dataset
email-EU
zenodo.org
opendatalab.com
json
Updated Nov 19, 2023
Cite
Nicholas Landry (2023). email-EU [Dataset]. http://doi.org/10.5281/zenodo.10155823
Available download formats: json
Unique identifier
https://doi.org/10.5281/zenodo.10155823
Dataset updated
Nov 19, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Nicholas Landry
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview
This hypergraph dataset was generated using email data from a large European research institution covering October 2003 to May 2005 (18 months). Information about all incoming and outgoing emails between members of the research institution has been anonymized. The emails only represent communication between institution members (the core); the dataset does not contain incoming messages from, or outgoing messages to, the rest of the world.
This is a temporal hypergraph dataset, which here means a sequence of timestamped hyperedges where each hyperedge is a set of nodes. Timestamps are in ISO8601 format. In email communication, messages can be sent to multiple recipients. In this dataset, nodes are email addresses at a European research institution. The original data source only contains directed temporal edge tuples (sender, receiver, timestamp), where timestamps are recorded at 1-second resolution. The hyperedges are undirected and consist of a sender and all receivers grouped such that the email between the sender and each receiver has the same timestamp.
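The grouping rule above can be sketched as follows: directed (sender, receiver, timestamp) tuples that share a sender and a timestamp are merged into one undirected hyperedge. The input tuples are made up, and converting integer Unix seconds to ISO 8601 strings in UTC is an assumption, since the source does not state the time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, List, Tuple

Edge = Tuple[int, int, int]  # (sender, receiver, unix_seconds)


def to_hyperedges(edges: Iterable[Edge]) -> List[Tuple[str, FrozenSet[int]]]:
    """Group directed edges by (sender, timestamp) into undirected hyperedges."""
    groups = defaultdict(set)
    for sender, receiver, ts in edges:
        groups[(sender, ts)].add(receiver)
    hyperedges = []
    # Emit in temporal order, breaking ties by sender id.
    for (sender, ts), receivers in sorted(groups.items(), key=lambda kv: (kv[0][1], kv[0][0])):
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
        # The sender and all receivers form one unordered node set.
        hyperedges.append((iso, frozenset(receivers | {sender})))
    return hyperedges
```

Because grouping uses the exact 1-second timestamp, two separate emails from the same sender within the same second would be merged into one hyperedge.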
Statistics
Some basic statistics of this dataset are:
number of nodes: 1,005
number of timestamped hyperedges: 235,263
distribution of the connected components:
Component Size, Number
986, 1
1, 19
Source of original data
Source: email-Eu dataset
References
If you use this dataset, please cite these references:
Simplicial closure and higher-order link prediction, Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg. Proceedings of the National Academy of Sciences (PNAS), 2018.
Local Higher-order Graph Clustering, Hao Yin, Austin R. Benson, Jure Leskovec, and David F. Gleich. Proceedings of KDD, 2017.
Graph Evolution: Densification and Shrinking Diameters, Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. ACM Transactions on Knowledge Discovery from Data, 2007.
Enron Email Time-Series Network
zenodo.org
explore.openaire.eu
csv
Updated Jan 24, 2020
Cite
Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst (2020). Enron Email Time-Series Network [Dataset]. http://doi.org/10.5281/zenodo.1342353
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.1342353
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We use the Enron email dataset to build a network of email addresses. It contains 614,586 emails sent from 6 January 1998 to 4 February 2004. During pre-processing, we remove periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which is 1448 days of email records in total. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6,600 nodes and 50,897 edges.

To build a graph G = (V, E), we use email addresses as nodes V. Every node v_i has an attribute which is a time-varying signal that corresponds to the number of emails sent from this address during a day. We draw an edge e_ij between two nodes i and j if there is at least one email exchange between the corresponding addresses.

Column 'Count' in 'edges.csv' file is the number of 'From'->'To' email exchanges between the two addresses. This column can be used as an edge weight.

The file 'nodes.csv' contains, for each address, a dictionary that is a compressed representation of its time series. The dictionary maps Day -> the number of emails sent by the address during that day. The total number of days is 1448.
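A small sketch of turning these files into usable structures: expanding a node's sparse Day -> count dictionary into a dense 1448-day series, and accumulating the 'Count' column as undirected edge weights. Whether day keys start at 0 or 1, what the source/target columns of 'edges.csv' are called, and how self-addressed mail is handled are not stated, so the zero-based indexing, the `(source, target, count)` tuples, summing both directions of a pair, and skipping self-loops are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NUM_DAYS = 1448  # length of the retained period, per the description


def expand_series(sparse: Dict[int, int], num_days: int = NUM_DAYS) -> List[int]:
    """Dense daily email counts from a sparse {day: count} dict (days assumed 0-based)."""
    series = [0] * num_days
    for day, count in sparse.items():
        if not 0 <= day < num_days:
            raise ValueError(f"day {day} outside 0..{num_days - 1}")
        series[day] = count
    return series


def weighted_adjacency(rows: Iterable[Tuple[int, int, int]]) -> Dict[int, Dict[int, int]]:
    """Undirected weights: sum the From->To counts in both directions of each pair."""
    adj: Dict[int, Dict[int, int]] = defaultdict(dict)
    for src, dst, count in rows:
        if src == dst:
            continue  # self-addressed mail is ignored (an assumption)
        adj[src][dst] = adj[src].get(dst, 0) + count
        adj[dst][src] = adj[dst].get(src, 0) + count
    return adj
```

Keeping the series dense makes it straightforward to treat each node's activity as a signal on the graph, for example for per-day comparisons between neighbours.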

'id-email.csv' is a file containing the actual email addresses.
Arabic Phishing and Legitimate emails - Samples
kaggle.com
Updated Dec 5, 2024
Cite
Rian Sh. Al-yozbaky (2024). Arabic Phishing and Legitimate emails - Samples [Dataset]. https://www.kaggle.com/datasets/rianshalyozbaky/arabic-phishing-and-legitimate-emails-samples
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Dec 5, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Rian Sh. Al-yozbaky
Description
Dataset of Phishing and Legitimate Emails. This dataset includes 1,250 email messages divided into two parts: 250 phishing emails and 1,000 legitimate emails.

This dataset was created by gathering more than 4,000 email messages from multiple international databases, then processing and analyzing them. Because some of the messages were not appropriate for testing, only the examples best suited to cybersecurity research, particularly preventing and recognizing phishing messages, were selected.

Please be aware that the file contains the full dataset.
How to Login Roadrunner Account? | A Complete Guide Dataset
paperswithcode.com
Updated Jun 17, 2025
Cite
(2025). How to Login Roadrunner Account? | A Complete Guide Dataset [Dataset]. https://paperswithcode.com/dataset/how-to-login-roadrunner-account-a-complete
Dataset updated
Jun 17, 2025
Description
(Toll Free) Number +1-341-900-3252

Email remains a vital communication tool for both personal and professional use. For those who have been using Time Warner Cable services, the Roadrunner email service is a familiar name. Now managed by Spectrum, the Roadrunner email platform is still active and accessible for users with existing accounts. However, to access all its features and ensure smooth communication, it's essential to understand how to set up, use, and manage your Roadrunner login account effectively.

What Is a Roadrunner Login Account?

A Roadrunner login account is the email account created through Time Warner Cable’s Roadrunner service, now handled by Spectrum. Although new Roadrunner accounts are no longer issued, existing users can continue to access their email using the credentials associated with their original account.

The Roadrunner login account functions like any other email service, allowing users to send, receive, organize, and store emails. It's especially popular among long-time customers who prefer the simplicity and reliability of the interface.

Setting Up a Roadrunner Login Account

For users with an existing Roadrunner email address, setting up access on new devices or email clients is straightforward. While you cannot create a new Roadrunner login account, here’s how to set up your existing account on various platforms:


On Web Browser

Open your preferred browser.

Navigate to the Spectrum or legacy Roadrunner email portal.

Enter your Roadrunner email address and password.

Click "Sign In" to access your inbox.

On Email Clients (Outlook, Thunderbird, etc.)

To configure your Roadrunner login account in email software, you need both the incoming and outgoing server details:

Incoming Server (IMAP or POP3): Server: mail.twc.com; Port: 993 (IMAP) or 995 (POP3); Security: SSL/TLS

Outgoing Server (SMTP): Server: mail.twc.com; Port: 587; Security: STARTTLS

Make sure to enter your full email address and password when setting up.
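For scripted access, the IMAP and SMTP settings listed above map onto Python's standard `imaplib` and `smtplib` modules. This is a sketch that assumes those server names and ports are current (they come from this guide, not from Spectrum's own documentation); the credentials passed in are placeholders, and nothing connects until a function is called.

```python
import imaplib
import smtplib
import ssl

IMAP_HOST, IMAP_PORT = "mail.twc.com", 993  # IMAP over SSL/TLS
SMTP_HOST, SMTP_PORT = "mail.twc.com", 587  # SMTP upgraded with STARTTLS


def open_inbox(user: str, password: str) -> imaplib.IMAP4_SSL:
    """Log in over implicit TLS and select the inbox."""
    conn = imaplib.IMAP4_SSL(IMAP_HOST, IMAP_PORT, ssl_context=ssl.create_default_context())
    conn.login(user, password)
    conn.select("INBOX")
    return conn


def open_smtp(user: str, password: str) -> smtplib.SMTP:
    """Connect on the submission port, upgrade with STARTTLS, then authenticate."""
    server = smtplib.SMTP(SMTP_HOST, SMTP_PORT)
    server.starttls(context=ssl.create_default_context())
    server.login(user, password)
    return server
```

The two ports use different TLS modes: 993 starts encrypted immediately, while 587 starts in plain text and must be upgraded with STARTTLS before the password is sent.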

Benefits of Using a Roadrunner Login Account

While Roadrunner email may seem old-school to some, it still offers various features that benefit users:


Reliable Service: Users report that their Roadrunner login account remains stable and reliable for both sending and receiving emails.

Simple Interface: Unlike many modern, cluttered email interfaces, Roadrunner email is known for its clean and user-friendly layout.

Storage and Access: Roadrunner provides decent storage limits and access across various devices, including desktops, laptops, and mobile phones.


Spam Filtering: The spam detection system for Roadrunner login accounts helps keep your inbox clean and secure.

Troubleshooting Roadrunner Login Issues

If you're having trouble accessing your Roadrunner login account, you're not alone. Below are some of the most common issues and how to fix them:

Forgot Password: If you forget your Roadrunner password, visit the Spectrum account recovery page. You’ll need to verify your identity and then reset your password.

Incorrect Credentials: Double-check the spelling of your email address and password. Also, make sure Caps Lock isn’t turned on, which can cause login errors.

Locked Account: Too many failed login attempts may result in your Roadrunner login account being temporarily locked. Waiting a few minutes or resetting the password usually resolves this.

Server Settings: If your email client isn’t working, make sure you're using the correct IMAP/POP and SMTP settings as listed above.


Managing Your Roadrunner Login Account

Properly managing your Roadrunner login account ensures it stays secure and functional over time. Here are a few tips:

Update Recovery Options: Make sure your account has a valid recovery email or phone number, so you can regain access if needed.

Regular Password Changes: For security purposes, it’s advisable to change your password every few months.

Organize Emails: Use folders and filters to keep your inbox organized. This will help you manage important messages more effectively.

Delete Unnecessary Emails: Clearing old or unwanted messages can help you stay within storage limits and improve overall account performance.

Keeping Your Roadrunner Login Account Secure

With cybersecurity threats on the rise, protecting your Roadrunner login account is more important than ever:

Use a strong and unique password combining letters, numbers, and symbols.


Avoid using public Wi-Fi to access your email unless you're using a VPN.

Enable two-step authentication if available through Spectrum.

Never click suspicious links or download attachments from unknown senders.

Accessing Roadrunner Email on Mobile Devices

To use your Roadrunner login account on a smartphone or tablet:

Go to your device’s email app and add a new account.

Choose "Other" or "Manual Setup" if prompted.

Enter your Roadrunner email address and password.

Input the server settings manually as previously mentioned.

Save and sync.


Once configured, you can send and receive emails from your mobile device just like you would from a computer.

Final Thoughts

Though it may not be as modern as Gmail or Outlook, the Roadrunner login account continues to serve many long-time users with reliability and simplicity. Whether you’re checking email on your desktop or syncing it with your mobile device, understanding how to manage and secure your Roadrunner account is key to staying connected and protected.
Email.cz image spam dataset v1
academictorrents.com
bittorrent
Updated Dec 30, 2019
Cite
Vit Listik (2019). Email.cz image spam dataset v1 [Dataset]. https://academictorrents.com/details/06f2389082e9c034fa4a73aaee00131a27e388b6
Available download formats: bittorrent (2660566545)
Dataset updated
Dec 30, 2019
Dataset authored and provided by
Vit Listik
License
No license specified: https://academictorrents.com/nolicensespecified
Description
The problem of email image spam classification has been known since 2005, and there are several approaches to it. Recent approaches use convolutional neural networks (CNNs). We propose a novel approach to image spam classification based on CNNs and transfer learning: ResNet v1 is used for semantic feature extraction, and a one-layer feedforward neural network performs the classification. We have shown that this approach achieves state-of-the-art performance on publicly available datasets: a 99% F1-score on two datasets [Dredze 2007, Princeton] and a 96% F1-score on their combination. Given the availability of GPUs, this approach may be used for just-in-time classification in anti-spam systems that handle huge volumes of email. We also observed that these publicly available datasets are no longer representative. We overcame this limitation by using a much richer dataset drawn from one week of real traffic at the freemail provider Email.cz.
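The classification stage described here, a one-layer network on top of fixed features, can be sketched without a deep-learning framework. In this sketch the toy 2-D vectors stand in for ResNet v1 embeddings, the head is a single sigmoid unit (the simplest one-layer feedforward network) trained by batch gradient descent on binary cross-entropy, and F1 is the harmonic mean of precision and recall. The learning rate, epoch count, and data are assumptions, not the authors' settings.

```python
import math
from typing import List, Sequence, Tuple


def train_head(features: List[Sequence[float]], labels: List[int],
               lr: float = 0.5, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit one sigmoid unit with batch gradient descent on binary cross-entropy."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            err = p - y  # gradient of BCE with respect to the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Freezing the feature extractor means only `w` and `b` are learned, which is what makes this kind of transfer learning cheap enough for just-in-time use; real embeddings would call for a numerically stable sigmoid and a framework such as PyTorch.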
FinePersonas-Synthetic-Email-Conversations
huggingface.co
Updated Sep 23, 2024
Cite
Argilla (2024). FinePersonas-Synthetic-Email-Conversations [Dataset]. https://huggingface.co/datasets/argilla/FinePersonas-Synthetic-Email-Conversations
Available in Croissant format (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant).
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Argilla
License
https://choosealicense.com/licenses/llama3.1/
Description
FinePersonas Synthetic Email Conversations

FinePersonas Synthetic Email Conversations is a dataset of around 115k email conversations between pairs of personas from argilla/FinePersonas-v0.1. The conversations were generated with NousResearch/Hermes-3-Llama-3.1-70B.

🗞️ News

[10/16/2024] Added two new subsets: unfriendly_email_conversations and unprofessional_email_conversations.

How were the conversations generated?… See the full description on the dataset page: https://huggingface.co/datasets/argilla/FinePersonas-Synthetic-Email-Conversations.
Best Healthcare Solutions Provider | Healthcare Data | Physician Data by...
datarade.ai
Updated Jun 21, 2021
Infotanks Media (2021). Best Healthcare Solutions Provider | Healthcare Data | Physician Data by Infotanks Media [Dataset]. https://datarade.ai/data-products/best-healthcare-solutions-provider-healthcare-data-physic-infotanks-media
Dataset updated
Jun 21, 2021
Dataset authored and provided by
Infotanks Media
Area covered
Mexico, Saint Helena, Wallis and Futuna, Sri Lanka, French Guiana, Ethiopia, Colombia, Malta, Latvia, Korea (Republic of)
Description
"Facilitate marketing campaigns with the healthcare email list from Infotanks Media that includes doctors, healthcare professionals, NPI numbers, physician specialties, and more. Buy targeted email lists of healthcare professionals and connect with doctors, specialists, and other healthcare professionals to promote your products and services. Hyper-personalize campaigns to increase engagement for better chances of conversion. Reach out to our data experts today! Access a physician contact database of 1.2 million contacts across 150+ specialties, including chiropractors, cardiologists, psychiatrists, and radiologists, among others. Get ready to integrate healthcare email lists from Infotanks Media and start email marketing campaigns through any CRM and ESP. Contact us right now! Ensure guaranteed lead generation with segmented email marketing strategies for specialists, departments, and more. Make the best use of targeted marketing to progress and move closer to your business goals with email listing services for healthcare professionals. Infotanks Media provides 100% verified healthcare email lists with an email deliverability guarantee of 95%. Get a custom quote today as per your requirements. Enhance your marketing campaigns with healthcare email lists from 170+ countries to build your global outreach. Request your free sample today! Personalize your business communication and interactions to maximize conversion rates with high-quality contact data. Grow your business network in your target markets from anywhere in the world with a guaranteed 95% contact accuracy for the healthcare email lists from Infotanks Media. Contact data experts at Infotanks Media from the healthcare industry to get a quick sample for free. Write to us or call today!

Hyper-target within and outside your desired markets with GDPR- and CAN-SPAM-compliant healthcare email lists that integrate into your CRM and ESPs. Balance your sales and marketing efforts by aligning goals using email lists from the healthcare industry. Build strong business relationships with potential clients through personalized campaigns. Call Infotanks Media for a free consultation. Explore new geographies and target markets with a focused approach using healthcare email lists. Align your sales and marketing teams through personalized email marketing campaigns to ensure they accomplish business goals together. Add value and grow revenue to take your business to the next level of success. Double your business and revenue growth with email lists of healthcare professionals. Send segmented campaigns to monitor behaviors and understand the purchasing habits of your potential clients. Send follow-up nurturing email marketing campaigns to turn potential clients into customers. Close deals sooner with detailed information about your prospects using the healthcare email list from Infotanks Media. Reach healthcare professionals on their preferred communication platform with the email list of healthcare professionals. Identify, capture, explore, and grow in your target markets anywhere in the world with a fully verified, validated, and compliant email database of healthcare professionals. Move beyond the traditional approach and automate sales cycles with buying triggers sent through email marketing campaigns. Use the healthcare email list from Infotanks Media to engage with your targeted potential clients and get them to respond. Increase your email marketing campaign response rate to convert better! Reach out to Infotanks Media to customize your healthcare email lists. Call today!"
4367x PII Label-Specific Essays (by 7b Models)
kaggle.com
Updated Feb 7, 2024
Valentin Werner (2024). 4367x PII Label-Specific Essays (by 7b Models) [Dataset]. https://www.kaggle.com/datasets/valentinwerner/pii-label-specific-data
Dataset updated
Feb 7, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Valentin Werner
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Evaluation of my dataset with my .915 baseline:

F5 score = .690 (recall = .692, precision = .639)
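With β = 5, recall counts far more than precision. The standard F-beta formula uses β² = 25, so the reported figures can be checked directly. The helper name below is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom

# Values reported above: precision .639, recall .692
score = f_beta(0.639, 0.692, beta=5)
print(round(score, 3))  # 0.69
```

The result sits much closer to the recall (.692) than to the precision (.639), which is the intended effect of a large β.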

Distribution of data:

843x Address (ca. 500 US)

496x Names (Incl. Middle Names, Pronunciations or Nicknames)

537x Userid

704x Username (Incl. Name)

531x Phone

755x Email (Incl. Name)

501x URL

See linked notebook for generation.

Remarks on labels:

EMAIL:

Email addresses are always based on the name, but use random domains

The prompt also asked for an essay about the writer's favourite book; the generated essays heavily favour “To Kill a Mockingbird”

PHONE:

Generated from multiple countries for diversity

Labelling of phone numbers should only include the full number (not parts of it)

ADDRESSES:

From multiple countries for diversity

For US addresses, state abbreviations are mapped to the full state name, so these are labelled as well

An address is only labelled as such if it starts with either of the first two words of the full address (e.g., if the house number is missing from a US address, it is still labelled)
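The following sketches one possible reading of this labelling rule. It assumes that words are separated by whitespace and that "starts with" means the first word of the mention equals the first or second word of the full generated address. The function name and the example address are hypothetical.

```python
def should_label_address(span: str, full_address: str) -> bool:
    """Return True if the mentioned span begins with the first or second word
    of the full generated address (one reading of the rule above)."""
    span_words = span.split()
    full_words = full_address.split()
    if not span_words or not full_words:
        return False
    return span_words[0] in full_words[:2]

full = "12 Example Road, Springfield, IL 62701"
print(should_label_address("12 Example Road", full))             # True
print(should_label_address("Example Road, Springfield", full))   # True: house number missing
print(should_label_address("Springfield, IL 62701", full))       # False
```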

NAMES:

Middle names are sometimes generated, separated either with " " or "-"

Pronunciations and nicknames were generated and labelled

However, etymological forms were not tagged: in a sentence such as “my name Thomas is derived from the Aramaic word ‘t’oma’”, the word “t’oma” is not labelled. Let me know if this is wrong. These cases are relatively easy to identify in the names dataset by searching for “derived from”

URL:

Short domains, full websites and full URIs

USERID:

Mostly randomly generated combinations of letters and numbers, not modelled on any other format

Can usually be augmented easily by replacing the user ID

User IDs are sometimes split into parts in the text; these split parts are not labelled (not sure if this is right)
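Replacing the user ID is straightforward when labels are stored as character spans. The following is a minimal sketch, assuming a hypothetical record format of a text plus non-overlapping (start, end, label) character offsets. This is not necessarily the format the dataset ships in. Each USERID span is swapped for a fresh random alphanumeric string, and the offsets of later spans are shifted by the change in length. Unlabelled fragments of a split user ID would be left untouched by this approach.

```python
import random
import string

def replace_userids(text, spans, rng, length=8):
    """Swap every USERID span for a new random ID and re-align all span offsets.

    spans: list of (start, end, label) tuples, end exclusive, non-overlapping.
    Returns (new_text, new_spans).
    """
    alphabet = string.ascii_lowercase + string.digits
    pieces = []
    new_spans = []
    cursor = 0  # position in the original text
    shift = 0   # accumulated length change so far
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        if label == "USERID":
            replacement = "".join(rng.choice(alphabet) for _ in range(length))
        else:
            replacement = text[start:end]
        new_start = start + shift
        pieces.append(replacement)
        new_spans.append((new_start, new_start + len(replacement), label))
        shift += len(replacement) - (end - start)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans

rng = random.Random(0)
text = "My id is ab12cd and my mail is jo@example.com"
spans = [(9, 15, "USERID"), (31, 45, "EMAIL")]
new_text, new_spans = replace_userids(text, spans, rng)
for s, e, label in new_spans:
    print(label, new_text[s:e])
```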

USERNAMES:

Either generated based on the name, OR animal + birth year, OR colour + fruit
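The three username patterns can be illustrated with a small generator. The word lists, the separator in the name-based form, and the birth-year range are hypothetical placeholders, not values taken from the linked generation notebook.

```python
import random

# Hypothetical word lists for illustration only
ANIMALS = ["otter", "falcon", "lynx"]
COLOURS = ["red", "teal", "amber"]
FRUITS = ["mango", "kiwi", "plum"]

def make_username(first: str, last: str, rng: random.Random) -> str:
    """Pick one of three patterns: name-based, animal+birth year, or colour+fruit."""
    pattern = rng.choice(["name", "animal_year", "colour_fruit"])
    if pattern == "name":
        return f"{first.lower()}.{last.lower()}"
    if pattern == "animal_year":
        return f"{rng.choice(ANIMALS)}{rng.randint(1950, 2005)}"
    return f"{rng.choice(COLOURS)}{rng.choice(FRUITS).capitalize()}"

rng = random.Random(42)
print([make_username("Ada", "Lovelace", rng) for _ in range(5)])
```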
Dataset analysing the crossover between archivists, recordkeeping...
figshare.com
xlsx
Updated Aug 29, 2018
Rebecca Grant (2018). Dataset analysing the crossover between archivists, recordkeeping professionals and research data management using email list data [Dataset]. http://doi.org/10.6084/m9.figshare.7007903.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.7007903.v1
Dataset updated
Aug 29, 2018
Dataset provided by
Figshare (http://figshare.com/)
Authors
Rebecca Grant
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset relates to research on the connections between archives professionals and research data management. It consists of a single Excel spreadsheet with four sheets, containing an analysis of emails sent to two email discussion lists: Archives-NRA (archivists, conservators and records managers) and Research-Dataman. The coded dataset and a list of the codes used for each mailing list are provided. The two datasets were downloaded from the JiscMail email discussion list archives on 27 July 2018. The Archives-NRA dataset was compiled by conducting a free-text search for "research data" on the mailing list's archives; the metadata for every search result was downloaded and coded (144 metadata records in total). The resulting coded dataset shows how frequently archivists and records professionals discuss research data on the Archives-NRA list, which topics are discussed, and an increase in these discussions over time. The Research-Dataman dataset was compiled by conducting a free-text search for "archivist" on the mailing list's archives; the metadata for every search result was downloaded and coded (197 emails in total). The resulting coded dataset shows how frequently data management professionals seek the advice of archivists or advertise vacancies for archivists, and how often archivists email this mailing list. The names and email addresses of the mailing list participants have been redacted for privacy reasons, but the original full-text emails can be accessed by members of the respective mailing lists using the URLs provided in the dataset.
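Trends such as "an increase in these discussions over time" rest on counting coded records per period. The following is a minimal sketch of that tally. It assumes each record has been exported from the spreadsheet as a (date, code) pair. The code labels shown are hypothetical, not the codes used in the dataset.

```python
from collections import Counter
from datetime import date

def counts_per_year(records):
    """Count coded records per calendar year; records are (date, code) pairs."""
    return dict(sorted(Counter(d.year for d, _ in records).items()))

def counts_per_code(records):
    """Count how often each code was assigned."""
    return Counter(code for _, code in records)

# Hypothetical records
records = [
    (date(2012, 3, 1), "event"),
    (date(2016, 5, 9), "vacancy"),
    (date(2017, 1, 20), "advice"),
    (date(2017, 8, 2), "vacancy"),
]
print(counts_per_year(records))
print(counts_per_code(records).most_common(1))
```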
City of Tempe 2023 Business Survey Data
catalog.data.gov
s.cnmilf.com
+10 more
Updated Sep 20, 2024
City of Tempe (2024). City of Tempe 2023 Business Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2023-business-survey-data
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Area covered
Tempe
Description
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):

1. Financial Stability and Vitality
   5.01 Quality of Business Services

Additional Information
Source: Business Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary

Methods: The survey is mailed to a random sample of businesses in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent's address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.

Processing and Limitations: The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city. The data are used by the ETC Institute in the final published PDF report.
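The random dispersal of overlapping block-level points can be sketched as follows. The source does not say how far, or with what distribution, the points were moved. The uniform offset, its maximum (`max_offset`, in degrees), and the fixed seed below are assumptions chosen for illustration. The sample coordinates are also hypothetical.

```python
import random
from collections import defaultdict

def disperse_overlaps(points, max_offset=0.0005, seed=0):
    """Randomly nudge points that share identical (lat, lon) coordinates.

    The first point at each location stays put; every further duplicate is
    moved by a uniform random offset of up to +/- max_offset degrees per axis.
    """
    rng = random.Random(seed)
    seen = defaultdict(int)
    out = []
    for lat, lon in points:
        key = (lat, lon)
        if seen[key]:
            lat += rng.uniform(-max_offset, max_offset)
            lon += rng.uniform(-max_offset, max_offset)
        seen[key] += 1
        out.append((lat, lon))
    return out

# Hypothetical block-level points: three responses mapped to the same block
block_points = [(33.4255, -111.94), (33.4255, -111.94), (33.4255, -111.94), (33.41, -111.93)]
print(disperse_overlaps(block_points))
```

Keeping `max_offset` small relative to a city block preserves the "close enough to allow insights" property while still separating the markers on a map.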
Dataset of Grouped Commit Author IDs after Identity Resolution Dataset
paperswithcode.com
zenodo.org
Updated May 5, 2021
(2021). Dataset of Grouped Commit Author IDs after Identity Resolution Dataset [Dataset]. https://paperswithcode.com/dataset/dataset-of-grouped-commit-author-ids-after
Dataset updated
May 5, 2021
Description
This dataset contains the IDs of 5,427,024 commit authors who have created commits in the git version control system and have more than one ID in git. It is a compressed, semicolon-separated CSV file with 14,861,538 author IDs. The first column is the group ID, which is the same as the first (randomly selected) author ID of the group; the second column is an author ID that is part of the group. If an author was found to have two different IDs, I1 and I2, this is recorded in the file as two separate lines, I1;I1 and I1;I2: the first column is the group identifier, which is one of the IDs in the group, and the second column lists the group's different author IDs on separate lines. The dataset contains email addresses of various Git authors, but the '@' within each email address has been replaced with a '#'.
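Reading this layout back into memory means collecting rows by their first column. The following is a minimal sketch under several assumptions. It assumes the file has already been decompressed (the description does not name the compression format) and that no ID contains a literal `;`. The "Name <email>" shape of the sample IDs is hypothetical.

```python
from collections import defaultdict

def load_groups(lines):
    """Map each group ID to the author IDs in its group.

    Each line is "group_id;author_id". The group ID is itself one of the
    author IDs, so each group also contains a self-referencing line.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        group_id, author_id = line.split(";")  # assumes no ';' inside IDs
        groups[group_id].append(author_id)
    return dict(groups)

def restore_at(author_id):
    """Undo the '@' -> '#' substitution (naive: also converts any '#' in names)."""
    return author_id.replace("#", "@")

# Hypothetical sample in the documented I1;I1 / I1;I2 layout
sample = [
    "Jane Doe <jane#example.org>;Jane Doe <jane#example.org>\n",
    "Jane Doe <jane#example.org>;jdoe <jdoe#users.example.com>\n",
]
groups = load_groups(sample)
for gid, members in groups.items():
    print(restore_at(gid), "->", [restore_at(m) for m in members])
```

If the file turns out to be gzip-compressed, `gzip.open(path, "rt")` yields text lines that can be passed straight to `load_groups`. On the full file, `len(groups)` would be expected to match the 5,427,024 authors stated above, and the total number of members would be expected to match the 14,861,538 IDs.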
CommunitySurvey2023weighted
data.tempe.gov
data-academy.tempe.gov
+6 more
Updated Jan 2, 2024
City of Tempe (2024). CommunitySurvey2023weighted [Dataset]. https://data.tempe.gov/datasets/tempegov::communitysurvey2023weighted
Dataset updated
Jan 2, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.

These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):

1. Safe and Secure Communities
   1.04 Fire Services Satisfaction
   1.06 Crime Reporting
   1.07 Police Services Satisfaction
   1.09 Victim of Crime
   1.10 Worry About Being a Victim
   1.11 Feeling Safe in City Facilities
   1.23 Feeling of Safety in Parks
2. Strong Community Connections
   2.02 Customer Service Satisfaction
   2.04 City Website Satisfaction
   2.05 Online Services Satisfaction Rate
   2.15 Feeling Invited to Participate in City Decisions
   2.21 Satisfaction with Availability of City Information
3. Quality of Life
   3.16 City Recreation, Arts, and Cultural Centers
   3.17 Community Services Programs
   3.19 Value of Special Events
   3.23 Right of Way Landscape Maintenance
   3.36 Quality of City Services
4. Sustainable Growth & Development
   No Performance Measures in this category presently relate directly to the Community Survey
5. Financial Stability & Vitality
   No Performance Measures in this category presently relate directly to the Community Survey

Methods: The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent's address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.

Processing and Limitations: The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city.

The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.

Additional Information
Source: Community Attitude Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
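The description does not say how ETC Institute computes its weights. A common technique is post-stratification. In post-stratification, each respondent in a demographic cell receives weight = (population share of the cell) / (sample share of the cell), so weighted cell shares match the population. The following sketch illustrates survey weighting in general with hypothetical age groups, shares, and responses. It is not the vendor's method.

```python
from collections import Counter

def poststrat_weights(sample_groups, population_shares):
    """Return one weight per respondent so weighted group shares match the population."""
    n = len(sample_groups)
    sample_counts = Counter(sample_groups)
    return [population_shares[g] / (sample_counts[g] / n) for g in sample_groups]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical: younger residents are under-represented among respondents.
age_groups = ["18-34", "35-64", "35-64", "65+", "65+", "65+"]
satisfaction = [2, 4, 4, 5, 5, 4]  # 1 = very dissatisfied ... 5 = very satisfied
population = {"18-34": 0.5, "35-64": 0.3, "65+": 0.2}
w = poststrat_weights(age_groups, population)
print(sum(satisfaction) / len(satisfaction), weighted_mean(satisfaction, w))
```

Here the single under-represented respondent receives a weight of 3, which pulls the weighted mean below the unweighted one. This is the kind of adjustment that makes weighted and unweighted layers differ.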
City of Tempe 2023 Community Survey Data
data.tempe.gov
data-academy.tempe.gov
+8 more
Updated Jan 2, 2024
City of Tempe (2024). City of Tempe 2023 Community Survey Data [Dataset]. https://data.tempe.gov/maps/cacfb4bb56244552a6587fd2aa3fb06d
Dataset updated
Jan 2, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.

These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):

1. Safe and Secure Communities
   1.04 Fire Services Satisfaction
   1.06 Crime Reporting
   1.07 Police Services Satisfaction
   1.09 Victim of Crime
   1.10 Worry About Being a Victim
   1.11 Feeling Safe in City Facilities
   1.23 Feeling of Safety in Parks
2. Strong Community Connections
   2.02 Customer Service Satisfaction
   2.04 City Website Satisfaction
   2.05 Online Services Satisfaction Rate
   2.15 Feeling Invited to Participate in City Decisions
   2.21 Satisfaction with Availability of City Information
3. Quality of Life
   3.16 City Recreation, Arts, and Cultural Centers
   3.17 Community Services Programs
   3.19 Value of Special Events
   3.23 Right of Way Landscape Maintenance
   3.36 Quality of City Services
4. Sustainable Growth & Development
   No Performance Measures in this category presently relate directly to the Community Survey
5. Financial Stability & Vitality
   No Performance Measures in this category presently relate directly to the Community Survey

Methods: The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent's address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.

Processing and Limitations: The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city.

The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.

Additional Information
Source: Community Attitude Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
CommunitySurvey2023unweighted
catalog.data.gov
datasets.ai
+4 more
Updated Sep 20, 2024
City of Tempe (2024). CommunitySurvey2023unweighted [Dataset]. https://catalog.data.gov/dataset/communitysurvey2023unweighted
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Description
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.

These data help determine priorities for the community as part of the City's ongoing strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):

1. Safe and Secure Communities
   1.04 Fire Services Satisfaction
   1.06 Crime Reporting
   1.07 Police Services Satisfaction
   1.09 Victim of Crime
   1.10 Worry About Being a Victim
   1.11 Feeling Safe in City Facilities
   1.23 Feeling of Safety in Parks
2. Strong Community Connections
   2.02 Customer Service Satisfaction
   2.04 City Website Satisfaction
   2.05 Online Services Satisfaction Rate
   2.15 Feeling Invited to Participate in City Decisions
   2.21 Satisfaction with Availability of City Information
3. Quality of Life
   3.16 City Recreation, Arts, and Cultural Centers
   3.17 Community Services Programs
   3.19 Value of Special Events
   3.23 Right of Way Landscape Maintenance
   3.36 Quality of City Services
4. Sustainable Growth & Development
   No Performance Measures in this category presently relate directly to the Community Survey
5. Financial Stability & Vitality
   No Performance Measures in this category presently relate directly to the Community Survey

Methods: The survey is mailed to a random sample of households in the City of Tempe. Follow-up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent's address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.

Processing and Limitations: The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city.

The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.

Additional Information
Source: Community Attitude Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary

Fraud	Non-Fraud
2327	445090
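These fraud/non-fraud counts imply a heavily imbalanced label distribution. The sketch below checks the prevalence and computes "balanced" inverse-frequency class weights, n_samples / (n_classes × n_class), the same formula scikit-learn uses for `class_weight="balanced"`. Whether such weighting suits a given model is a modelling choice, not something the source prescribes.

```python
counts = {"Fraud": 2327, "Non-Fraud": 445090}

total = sum(counts.values())
prevalence = counts["Fraud"] / total
# Inverse-frequency ("balanced") weights: n_samples / (n_classes * n_class)
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(f"total={total}, fraud prevalence={prevalence:.2%}")
print({label: round(w, 3) for label, w in weights.items()})
```

With these weights each class contributes the same total weight (half of all samples), so the roughly 0.5% fraud class is not drowned out during training.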

A Dataset of over 500.000 commercial email newsletters, as collected by PrivacyMail.info

Max Maass; Stephan Schwär; Matthias Hollick (2022). A Dataset of over 500.000 commercial email newsletters, as collected by PrivacyMail.info [Dataset]. http://doi.org/10.5281/zenodo.6509751

Unique identifier

https://doi.org/10.5281/zenodo.6509751

Dataset updated

Jun 13, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Max Maass; Stephan Schwär; Matthias Hollick

Description

This dataset contains the data from roughly two years of operating PrivacyMail.info, an open-source email privacy measurement platform. It contains slightly over 500,000 commercial newsletters, crowdsourced by users of PrivacyMail.info. The methodology is discussed in our paper: Max Maass, Stephan Schwär, and Matthias Hollick. "Towards transparency in email tracking." Annual Privacy Forum, 2019. The source code is available at github.com/privacymail/privacymail

Please note that, due to its crowdsourced nature, this dataset is a sample of opportunity: it is not representative of all newsletters on the Internet and likely contains biases arising from how it was collected. Notably, German-language newsletters are likely heavily over-represented.

Dataset Structure
The dataset is structured as follows: the top-level folders correspond to the websites the newsletters belong to. Inside each website folder is one subfolder for each identity that was registered for that website, and each identity folder contains a series of .eml files representing the received email messages.
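Given the website/identity/message layout, the messages can be iterated with Python's standard `email` package. The following is a minimal sketch, assuming the archive has been extracted to a local directory and that every message file ends in `.eml`. The function name and the demo folder and file names are hypothetical.

```python
import email
import tempfile
from email import policy
from pathlib import Path

def iter_newsletters(root):
    """Yield (website, identity, message) for every .eml file under root."""
    root = Path(root)
    for website_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for identity_dir in sorted(p for p in website_dir.iterdir() if p.is_dir()):
            for eml_path in sorted(identity_dir.glob("*.eml")):
                with eml_path.open("rb") as fh:
                    # policy.default returns EmailMessage objects with decoded headers
                    msg = email.message_from_binary_file(fh, policy=policy.default)
                yield website_dir.name, identity_dir.name, msg

# Demo on a tiny hypothetical tree: <root>/shop.example/identity-1/0001.eml
with tempfile.TemporaryDirectory() as tmp:
    box = Path(tmp) / "shop.example" / "identity-1"
    box.mkdir(parents=True)
    (box / "0001.eml").write_bytes(
        b"From: news@shop.example\r\nSubject: Weekly deals\r\n\r\nHello!\r\n"
    )
    for site, ident, msg in iter_newsletters(tmp):
        print(site, ident, msg["Subject"])
```

Because the generator yields one message at a time, it can walk the full collection of roughly half a million files without loading everything into memory.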

Copyright and Licensing
This dataset is set to non-public due to copyright concerns: The contents of the email messages are (presumably) protected by copyright in most jurisdictions. Most copyright doctrines contain exceptions for non-commercial research use - thus, we feel it is appropriate and acceptable to share the data on a case-by-case basis, the same way we did before shutting down PrivacyMail.info. When requesting access to the data, please briefly describe what research you want to conduct with it, and we will grant you access.

We thus do not put any explicit license on this dataset. Please do not share the raw data publicly. We request that you cite the above-mentioned paper and this dataset in any publications that result from it.
