81 datasets found

Enron Email Dataset
academictorrents.com
bittorrent
Updated Aug 26, 2016
Enron (2016). Enron Email Dataset [Dataset]. https://academictorrents.com/details/4697a6e1e7841602651b087d84f904d43590d4ff
Available download formats: bittorrent (443,254,787 bytes)
Dataset updated
Aug 26, 2016
Dataset authored and provided by
Enron (http://www.enron.com/)
License
No license specified (https://academictorrents.com/nolicensespecified)
Description
To quote the data source: "This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems. A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them (not me) that the dataset is available. The dataset here does not include attachments, and some messages have been deleted 'as part of a redaction effort due to requests from affected employees'. Invalid email addresses were converted to something of the form user@enron.com whenever possible (i.e., recipient is specified in som…
Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat...
datarade.ai
Updated Oct 12, 2024
Success.ai (2024). Success.ai | | US Premium B2B Emails & Phone Numbers Dataset - APIs and flat files available – 170M+, Verified Profiles - Best Price Guarantee [Dataset]. https://datarade.ai/data-products/success-ai-us-premium-b2b-emails-phone-numbers-dataset-success-ai
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 12, 2024
Area covered
United States
Description
Success.ai offers a comprehensive, enterprise-ready B2B leads data solution, ideal for businesses seeking access to over 150 million verified employee profiles and 170 million work emails. Our data empowers organizations across industries to target key decision-makers, optimize recruitment, and fuel B2B marketing efforts. Whether you're looking for UK B2B data, B2B marketing data, or global B2B contact data, Success.ai provides the insights you need with pinpoint accuracy.

Tailored for B2B Sales, Marketing, Recruitment and more: Our B2B contact data and B2B email data solutions are designed to enhance your lead generation, sales, and recruitment efforts. Build hyper-targeted lists based on job title, industry, seniority, and geographic location. Whether you’re reaching mid-level professionals or C-suite executives, Success.ai delivers the data you need to connect with the right people.

API Features:

Real-Time Updates: Our APIs deliver real-time updates, ensuring that the contact data your business relies on is always current and accurate.

High Volume Handling: Designed to support up to 860k API calls per day, our system is built for scalability and responsiveness, catering to enterprises of all sizes.

Flexible Integration: Easily integrate with CRM systems, marketing automation tools, and other enterprise applications to streamline your workflows and enhance productivity.
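For a sense of scale, 860k calls per day works out to roughly 10 requests per second on average, assuming the load is spread evenly across 24 hours (an assumption; the listing does not state a per-second limit). A minimal sketch of the conversion:

```python
# Back-of-the-envelope conversion of a daily API call quota into an
# average per-second rate, assuming calls are spread evenly over 24 hours.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_calls_per_second(calls_per_day: int) -> float:
    """Return the mean request rate implied by a daily call budget."""
    return calls_per_day / SECONDS_PER_DAY


rate = average_calls_per_second(860_000)
print(f"{rate:.2f} calls/second")  # 9.95 calls/second
```

Real traffic is usually bursty, so any shorter-window limits the provider may enforce matter more in practice than this daily average.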

Key Categories Served:
- B2B sales leads: Identify decision-makers in key industries.
- B2B marketing data: Target professionals for your marketing campaigns.
- Recruitment data: Source top talent efficiently and reduce hiring times.
- CRM enrichment: Update and enhance your CRM with verified, updated data.
- Global reach: Coverage across 195 countries, including the United States, United Kingdom, Germany, India, Singapore, and more.

Global Coverage with Real-Time Accuracy: Success.ai’s dataset spans a wide range of industries such as technology, finance, healthcare, and manufacturing. With continuous real-time updates, your team can rely on the most accurate data available:
- 150M+ Employee Profiles: Access professional profiles worldwide with insights including full name, job title, seniority, and industry.
- 170M Verified Work Emails: Reach decision-makers directly with verified work emails, available across industries and geographies, including Singapore and UK B2B data.
- GDPR-Compliant: Our data is fully compliant with GDPR and other global privacy regulations, ensuring safe and legal use of B2B marketing data.

Key Data Points for Every Employee Profile: Every profile in Success.ai’s database includes over 20 critical data points, providing the information needed to power B2B sales and marketing campaigns: Full Name, Job Title, Company, Work Email, Location, Phone Number, LinkedIn Profile, Experience, Education, Technographic Data, Languages, Certifications, Industry, Publications & Awards.

Use Cases Across Industries: Success.ai’s B2B data solution is incredibly versatile and can support various enterprise use cases, including:
- B2B Marketing Campaigns: Reach high-value professionals in industries such as technology, finance, and healthcare.
- Enterprise Sales Outreach: Build targeted B2B contact lists to improve sales efforts and increase conversions.
- Talent Acquisition: Accelerate hiring by sourcing top talent with accurate and updated employee data, filtered by job title, industry, and location.
- Market Research: Gain insights into employment trends and company profiles to enrich market research.
- CRM Data Enrichment: Ensure your CRM stays accurate by integrating updated B2B contact data.
- Event Targeting: Create lists for webinars, conferences, and product launches by targeting professionals in key industries.

Use Cases for Success.ai's Contact Data:
- Targeted B2B Marketing: Create precise campaigns by targeting key professionals in industries like tech and finance.
- Sales Outreach: Build focused sales lists of decision-makers and C-suite executives for faster deal cycles.
- Recruiting Top Talent: Easily find and hire qualified professionals with updated employee profiles.
- CRM Enrichment: Keep your CRM current with verified, accurate employee data.
- Event Targeting: Create attendee lists for events by targeting relevant professionals in key sectors.
- Market Research: Gain insights into employment trends and company profiles for better business decisions.
- Executive Search: Source senior executives and leaders for headhunting and recruitment.
- Partnership Building: Find the right companies and key people to develop strategic partnerships.

Why Choose Success.ai’s Employee Data? Success.ai is the top choice for enterprises looking for comprehensive and affordable B2B data solutions. Here’s why:
- Unmatched Accuracy: Our AI-powered validation process ensures 99% accuracy across all data points, resulting in higher engagement and fewer bounces.
- Global Scale: With 150M+ employee profiles and 170M veri...
A Dataset of over 500.000 commercial email newsletters, as collected by...
zenodo.org
Updated Jun 13, 2022
Max Maass; Stephan Schwär; Matthias Hollick (2022). A Dataset of over 500.000 commercial email newsletters, as collected by PrivacyMail.info [Dataset]. http://doi.org/10.5281/zenodo.6509751
Unique identifier
https://doi.org/10.5281/zenodo.6509751
Dataset updated
Jun 13, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Max Maass; Stephan Schwär; Matthias Hollick
Description
This dataset contains the data from roughly two years of operating PrivacyMail.info, an open-source email privacy measurement platform. It contains slightly over 500,000 commercial newsletters, as crowdsourced by users of PrivacyMail.info. The methodology is discussed in our paper: Max Maass, Stephan Schwär, and Matthias Hollick. "Towards transparency in email tracking." Annual Privacy Forum, 2019. The source code can be found at github.com/privacymail/privacymail.

Please note that, due to its crowdsourced nature, this dataset is a sample of opportunity: it is not representative of all newsletters on the Internet, and it likely contains biases based on how it was collected. Notably, German-language newsletters are likely heavily over-represented.

Dataset Structure
The dataset is structured as follows: on the top level are folders named after the website the newsletter belongs to. Inside each website folder are subfolders, one for each identity that was registered for that website. Each identity folder contains a series of .eml files that represent the received email messages.
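Assuming the data has been extracted to a local directory (the path below is a hypothetical placeholder), the website/identity/message layout can be traversed with Python's standard email package. This is a sketch, not tooling shipped with the dataset:

```python
import email
from email import policy
from pathlib import Path


def iter_newsletters(root: Path):
    """Yield (website, identity, message) for each .eml file below root.

    Assumes the layout <root>/<website>/<identity>/<message>.eml
    described above; files at any other depth are ignored.
    """
    for eml_path in sorted(root.glob("*/*/*.eml")):
        identity = eml_path.parent.name
        website = eml_path.parent.parent.name
        with eml_path.open("rb") as fh:
            # policy.default yields str-valued headers and modern parsing.
            msg = email.message_from_binary_file(fh, policy=policy.default)
        yield website, identity, msg


root = Path("privacymail-data")  # hypothetical location of the extracted data
if root.is_dir():
    for website, identity, msg in iter_newsletters(root):
        print(website, identity, msg["Subject"])
```

Grouping by identity rather than only by website keeps messages sent to different registered addresses for the same site apart.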

Copyright and Licensing
This dataset is set to non-public due to copyright concerns: The contents of the email messages are (presumably) protected by copyright in most jurisdictions. Most copyright doctrines contain exceptions for non-commercial research use - thus, we feel it is appropriate and acceptable to share the data on a case-by-case basis, the same way we did before shutting down PrivacyMail.info. When requesting access to the data, please briefly describe what research you want to conduct with it, and we will grant you access.

We thus do not put any explicit license on this dataset. Please do not share the raw data publicly. We request that you cite the above-mentioned paper and this dataset in any publications that result from it.
Email Thread Summary Dataset
kaggle.com
Updated Sep 28, 2023
Marawan Mamdouh (2023). Email Thread Summary Dataset [Dataset]. https://www.kaggle.com/datasets/marawanxmamdouh/email-thread-summary-dataset
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 28, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Marawan Mamdouh
Description
Email Thread Summary Dataset

Overview:

The Email Thread Dataset consists of two main files: email_thread_details and email_thread_summaries. These files collectively offer a comprehensive compilation of email thread information alongside human-generated summaries.

Email Thread Details:

Description:

The email_thread_details file provides a detailed perspective on individual email threads, encompassing crucial information such as subject, timestamp, sender, recipients, and the content of the email.

Columns:

thread_id: A unique identifier for each email thread.

subject: Subject of the email thread.

timestamp: Timestamp indicating when the message was sent.

from: Sender of the email.

to: List of recipients of the email.

body: Content of the email message.

Additional Information:

The dataset is available in both CSV and Pickle (pkl) formats; the Pickle version stores the "to" column as a column of lists of strings, so recipient information can be accessed without further parsing.

Email Thread Summaries:

Description:

The email_thread_summaries file contains concise summaries crafted by human annotators for each email thread, offering a high-level overview of the content.

Columns:

thread_id: A unique identifier for each email thread.

summary: A concise summary of the email thread.

Dataset Structure:

The dataset is organized into threads and emails. There are a total of 4,167 threads and 21,684 emails, providing a rich source of information for analysis and research purposes.

Threads: 4,167 threads

Emails: 21,684 emails

Language:

Languages: English (en)

Use Cases:

Natural Language Processing (NLP) Research:

Analyze email thread contents and human-generated summaries for advancements in NLP tasks.

Text Summarization Models:

Train and evaluate text summarization models using the provided email threads and summaries.

Email Analytics:

Gain insights into communication patterns, sender-receiver relationships, and content analysis.

File Formats:

CSV Files:

Easily importable into various data analysis tools.

Pickle (pkl) Files:

Facilitates direct reading of the "to" column as a column of lists of strings.

JSON Files:

Offers compatibility with JSON data structures, providing an additional option for users who prefer or require this widely-used format in their analytical workflows.
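The description implies that only the Pickle files preserve "to" as a list. If the CSV files store each list as a Python-style literal such as ['a', 'b'] (an assumption; check the actual files), a minimal standard-library sketch for restoring the lists could look like this. The sample row is invented:

```python
import ast
import csv
import io


def read_thread_details(csv_text: str) -> list[dict]:
    """Parse email_thread_details CSV text, turning "to" back into a list.

    Assumes the "to" cell holds a Python-style list literal such as
    "['alice@example.com', 'bob@example.com']"; adjust if the file differs.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # literal_eval only evaluates literals, so it is safe on untrusted text.
        row["to"] = ast.literal_eval(row["to"]) if row["to"] else []
        rows.append(row)
    return rows


sample = (
    "thread_id,subject,timestamp,from,to,body\n"
    "1,Budget,1000000000000,Alice,\"['Bob', 'Carol']\",See attached.\n"
)
print(read_thread_details(sample)[0]["to"])  # ['Bob', 'Carol']
```

With the Pickle files this conversion step should be unnecessary, since they are described as storing the lists directly.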

JSON File Features Description:

email_thread_details.json:

[
  {
    "thread_id": [unique identifier],
    "subject": "[email thread subject]",
    "timestamp": [timestamp in milliseconds],
    "from": "[sender's name and identifier]",
    "to": ["[recipient 1]", "[recipient 2]", "[recipient 3]", ...],
    "body": "[email content]"
  },
  ...
]

email_thread_summaries.json:

[
  { "thread_id": [unique identifier], "summary": "[summary content]" },
  ...
]
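A minimal sketch of joining the two JSON files on thread_id, using only the standard library. It assumes the timestamp field is Unix epoch milliseconds interpreted as UTC (the description says milliseconds but does not name a time zone), and the sample records are invented:

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_threads(details_json: str, summaries_json: str) -> dict:
    """Group messages by thread_id and attach the human-written summary."""
    threads = defaultdict(lambda: {"messages": [], "summary": None})
    for msg in json.loads(details_json):
        # Milliseconds -> seconds; UTC is an assumption about the time zone.
        sent = datetime.fromtimestamp(msg["timestamp"] / 1000, tz=timezone.utc)
        threads[msg["thread_id"]]["messages"].append(
            {"from": msg["from"], "to": msg["to"], "sent": sent, "body": msg["body"]}
        )
    for item in json.loads(summaries_json):
        threads[item["thread_id"]]["summary"] = item["summary"]
    # Order each thread's messages chronologically.
    for thread in threads.values():
        thread["messages"].sort(key=lambda m: m["sent"])
    return dict(threads)


details = json.dumps([
    {"thread_id": 1, "subject": "Budget", "timestamp": 2000, "from": "Alice",
     "to": ["Bob"], "body": "Any update?"},
    {"thread_id": 1, "subject": "Budget", "timestamp": 1000, "from": "Bob",
     "to": ["Alice"], "body": "Draft attached."},
])
summaries = json.dumps([{"thread_id": 1, "summary": "Bob shares a draft; Alice follows up."}])
threads = build_threads(details, summaries)
print([m["from"] for m in threads[1]["messages"]])  # ['Bob', 'Alice']
```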

Files Structure:

Dataset
├── CSV
│   ├── email_thread_details.csv
│   └── email_thread_summaries.csv
├── Pickle
│   ├── email_thread_details.pkl
│   └── email_thread_summaries.pkl
└── JSON
    ├── email_thread_details.json
    └── email_thread_summaries.json

License:

This dataset is provided under the MIT License.

Disclaimer:

The dataset has been anonymized and sanitized to ensure privacy and confidentiality.
Personal Emails | 100M+ Personal Emails for US Consumers and Contacts | B2C...
datarade.ai
Updated Aug 6, 2025
Bytemine (2025). Personal Emails | 100M+ Personal Emails for US Consumers and Contacts | B2C Contact Data | Email Data | Personal Emails [Dataset]. https://datarade.ai/data-products/personal-emails-100m-personal-emails-for-us-consumers-and-bytemine
Available download formats: .json, .csv, .xls, .sql, .txt, .jsonl, .parquet
Dataset updated
Aug 6, 2025
Dataset authored and provided by
Bytemine
Area covered
United States
Description
Bytemine offers access to over 100 million verified personal email addresses for US consumers and professionals. This extensive B2C contact database is designed to support modern outreach, digital marketing, lead generation, and customer engagement across channels that reach people where they are most responsive — their personal inbox.

Unlike traditional work email databases that limit outreach to business hours or corporate filters, personal emails enable more flexible, direct, and often higher-converting communication. Whether you're running direct-to-consumer campaigns, re-engaging inactive users, or enriching existing contact records, Bytemine provides the scale and data quality you need to connect effectively.

Our personal email dataset includes:

- 100 million+ verified personal email addresses (Gmail, Yahoo, Outlook, etc.)
- Matched with names, phone numbers, location, and demographic attributes
- 50+ enriched fields including age range, gender, location, occupation, and consumer behavior signals
- Optional inclusion of job title, company, and professional details for dual B2B-B2C targeting

All emails are verified and regularly updated to ensure deliverability, reduce bounce rates, and improve sender reputation. Contacts are sourced through direct data licensing agreements with consumer platforms, B2C applications, and verified aggregators, ensuring compliance and reliability.

This data is ideal for:

- B2C marketing campaigns (email newsletters, promotions, lifecycle emails)
- Direct-to-consumer product launches and brand activations
- Customer re-engagement and loyalty campaigns
- Lookalike audience creation for paid media
- CRM enrichment with consumer-facing contact info
- Identity resolution and cross-channel targeting
- Data onboarding for ad platforms or audience segmentation
- Consumer surveys, polling, and research

Bytemine’s personal email dataset empowers your marketing, growth, and data teams with clean, structured, and highly scalable contact information. Each record can be enriched with behavioral and demographic data, enabling advanced personalization and segmentation strategies.

Access is available through:

Web-based search platform for easy filtering, export, and targeting

API access for integration into your product, workflow, or marketing engine

With flexible delivery options and scalable pricing, Bytemine supports startups, growth teams, agencies, and enterprise platforms looking to expand their reach and drive performance with verified consumer data.

If you're looking to power outreach across consumer inboxes, enrich B2C data, or build a scalable, compliant contact database, Bytemine’s personal email dataset is the fastest way to connect with real people across the United States.
Email Address Data | Validated Personal and Business Emails | 148MM+ US B2B...
datarade.ai
.json, .csv, .xls
Updated Feb 20, 2024
Salutary Data (2024). Email Address Data | Validated Personal and Business Emails | 148MM+ US B2B Contacts [Dataset]. https://datarade.ai/data-products/salutary-data-email-address-data-validated-personal-and-b-salutary-data
Available download formats: .json, .csv, .xls
Dataset updated
Feb 20, 2024
Dataset authored and provided by
Salutary Data
Area covered
United States of America
Description
Salutary Data is a boutique B2B contact and company data provider committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting/HR, identity resolution, and ML/AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.

We can enrich your in-house data (CRM enrichment, lead enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.

What makes Salutary unique?
- We offer our clients a truly unique, one-stop aggregation of best-of-breed data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
- We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.

Products: API Suite, Web UI, Full and Custom Data Feeds

Services:
- Data Enrichment: We assess fill-rate gaps and profile your customer file in order to append fields, update information, and/or render net-new "look-alike" prospects for your campaigns.
- ABM Match & Append: Send us your domain or other company-related files, and we'll match your Account Based Marketing targets and provide you with B2B contacts to target. Optionally include your suppression file to avoid redundant records.
- Verification ("Cleaning/Hygiene") Services: Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove email hard bounces, and update or replace titles and phone numbers. This is right up our alley and leverages our existing internal and external processes and systems.
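The listing does not say how the 2% figure is measured. If it is read as a compounding monthly rate (an assumption made here purely for illustration), the share of a contact file that goes stale grows to roughly a fifth within a year:

```python
def stale_fraction(monthly_rate: float, months: int) -> float:
    """Fraction of records expected to be outdated after `months`.

    Assumes each still-valid record ages out with probability
    `monthly_rate` per month (simple geometric decay).
    """
    return 1.0 - (1.0 - monthly_rate) ** months


for months in (1, 6, 12):
    print(months, f"{stale_fraction(0.02, months):.1%}")
```

Under this reading, about 21.5% of records would be outdated after 12 months, compared with a naive linear estimate of 24%.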
Dataset of Survey on Current Email Management Practices
data.niaid.nih.gov
Updated Jun 13, 2023
Sachdeva, Anisha (2023). Dataset of Survey on Current Email Management Practices [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8028184
Dataset updated
Jun 13, 2023
Dataset authored and provided by
Sachdeva, Anisha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains anonymised survey responses from a comprehensive study conducted to explore current email management practices among users. The survey aimed to gain insights into how individuals handle and organize their email communications in various contexts. The survey questionnaire consisted of carefully designed questions related to email usage patterns, organisational strategies, folder structures, and automation utilised for email management. The survey also explored participants' preferences for automated rule-based filtering functionality and any challenges they face in effectively managing their mailbox.

Researchers and professionals interested in email management and information organisation can leverage this dataset for research, analysis, and potential improvements in email client design and functionality.

We kindly request that any publications or research utilising this dataset appropriately acknowledge and cite the original source to ensure proper attribution to the survey and its participants.
The Enron Email Dataset
kaggle.com
zip
Updated Jun 16, 2016
Will Cukierski (2016). The Enron Email Dataset [Dataset]. http://www.kaggle.com/forums/f/1322/the-enron-email-dataset
Available download formats: zip (375,294,957 bytes)
Dataset updated
Jun 16, 2016
Authors
Will Cukierski
Description
The Enron email dataset contains approximately 500,000 emails generated by employees of the Enron Corporation. It was obtained by the Federal Energy Regulatory Commission during its investigation of Enron's collapse.

This is the May 7, 2015 version of the dataset, as published at https://www.cs.cmu.edu/~./enron/
Data from: EnronSR: A Benchmark for Evaluating AI-Generated Email Replies
search.dataone.org
dataverse.harvard.edu
Updated Mar 6, 2024
Shay, Moran; Davidson, Roei; Grinberg, Nir (2024). EnronSR: A Benchmark for Evaluating AI-Generated Email Replies [Dataset]. http://doi.org/10.7910/DVN/RQBWAC
Unique identifier
https://doi.org/10.7910/DVN/RQBWAC
Dataset updated
Mar 6, 2024
Dataset provided by
Harvard Dataverse
Authors
Shay, Moran; Davidson, Roei; Grinberg, Nir
Description
EnronSR is a benchmark dataset based on the Enron email corpus that contains both naturally occurring human replies and AI-generated replies for the same set of messages. This resource enables public benchmarking of novel language-generation models and facilitates comparison against a strong, production-level baseline: Google Smart Reply, used by millions of people.
Panza-emails
huggingface.co
Updated Feb 19, 2025
IST Austria Distributed Algorithms and Systems Lab (2025). Panza-emails [Dataset]. https://huggingface.co/datasets/ISTA-DASLab/Panza-emails
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 19, 2025
Dataset authored and provided by
IST Austria Distributed Algorithms and Systems Lab
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The Panza Emails dataset

This dataset contains collections of emails of three authentic users (david, isabel, and marcus), with personal information (names, places, etc.) replaced by other ones for donor privacy. Except for these changes, the language of the emails is genuine. The intention of this dataset is to allow researchers to study strategies for text personalization. The data was donated explicitly for this purpose. This dataset is ethically collected and fully licensed for… See the full description on the dataset page: https://huggingface.co/datasets/ISTA-DASLab/Panza-emails.
MultiSocial
zenodo.org
data.niaid.nih.gov
Updated Aug 20, 2025
Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba (2025). MultiSocial [Dataset]. http://doi.org/10.5281/zenodo.13846152
Unique identifier
https://doi.org/10.5281/zenodo.13846152
Dataset updated
Aug 20, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MultiSocial is a benchmark dataset (described in a paper) for multilingual machine-generated text detection in the social-media domain, covering 22 languages and 5 platforms. It contains 472,097 texts, of which about 58k are human-written; approximately the same amount was generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.
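The exact anonymization procedure is not described here. As an illustration of the general idea only, the sketch below replaces email addresses, @-style usernames, and phone-number-like digit runs with placeholder tokens. The regular expressions and the [EMAIL]/[USER]/[PHONE] tokens are assumptions, not the authors' method, and simple patterns like these will miss many formats:

```python
import re

# Hypothetical, deliberately simple patterns; real PII masking needs more care.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
USERNAME_RE = re.compile(r"(?<!\w)@\w{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text: str) -> str:
    """Replace emails, @usernames and phone-like numbers with placeholders.

    Emails are masked first so their '@domain' part is not mistaken
    for a username.
    """
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = USERNAME_RE.sub("[USER]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(mask_pii("Mail jane.doe@example.org or ping @jane_d at +1 555 123 4567"))
```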

If you use this dataset in any publication, project, tool or in any other form, please, cite the paper.

Disclaimer

Due to the data sources (described below), the dataset may contain harmful, disinformative, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we used data sources of older date (with a lower probability of including machine-generated texts), the labeling of human-written texts might not be 100% accurate. The anonymization procedure might not have successfully hidden all sensitive/personal content; thus, use the data cautiously (if you are affected by such content, report the issues you find to dpo[at]kinit.sk). The intended use is for non-commercial research purposes only.

Data Source

The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:

Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.

Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).

Gab data originated in a dataset containing 22M posts from the Gab social network. The authors of the dataset (Zannettou et al., 2018) found that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found that hate speech is much more prevalent there than on Twitter, but less prevalent than on 4chan's Politically Incorrect board.

Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).

WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.

From these datasets, we pseudo-randomly sampled up to 1,300 texts (up to 300 for the test split and the remaining up to 1,000 for the train split, if available) for each combination of platform and one of the 22 selected languages (detected using a combination of automated approaches). This process resulted in 61,592 human-written texts, which were further filtered based on the occurrence of certain characters or on their length, resulting in about 58k human-written texts.
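One possible reading of this sampling step, sketched under explicit assumptions: records are dicts with hypothetical "language" and "platform" keys, a seeded random.Random stands in for the unspecified pseudo-random procedure, and the test split is filled before the train split:

```python
import random
from collections import defaultdict

TEST_CAP = 300    # "up to 300 for the test split"
TRAIN_CAP = 1000  # "the remaining up to 1,000 for the train split"


def sample_splits(records, seed=0):
    """Take at most TEST_CAP test and TRAIN_CAP train texts per
    (language, platform) group; returns copies with a 'split' key added."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["language"], rec["platform"])].append(rec)

    rng = random.Random(seed)  # fixed seed: pseudo-random but repeatable
    out = []
    for key in sorted(groups):
        items = groups[key][:]
        rng.shuffle(items)
        for i, rec in enumerate(items[: TEST_CAP + TRAIN_CAP]):
            out.append({**rec, "split": "test" if i < TEST_CAP else "train"})
    return out


records = [{"text": f"t{i}", "language": "en", "platform": "telegram"} for i in range(1500)]
sample = sample_splits(records)
print(sum(r["split"] == "test" for r in sample), sum(r["split"] == "train" for r in sample))  # 300 1000
```

With this ordering, a group with fewer than 300 texts ends up entirely in the test split, which fits the "if available" wording for the train split but may not match exactly what the authors did.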

The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).

The dataset has the following fields:

'text' - a text sample,

'label' - 0 for human-written text, 1 for machine-generated text,

'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,

'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,

'language' - the ISO 639-1 language code identifying the detected language of the given text,

'length' - word count of the given text,

'source' - a string identifying the source dataset / platform of the given text,

'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
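Given records with the fields above (for example, after loading the dataset into a list of dicts; the loading step and the "train"/"test" split strings are assumptions), per-split label counts and a consistency check between label and multi_label could be sketched as:

```python
from collections import Counter


def label_counts(rows, split, drop_noisy=True):
    """Count texts per 'multi_label' value (model name or "human") in one split."""
    counts = Counter()
    for row in rows:
        if row["split"] != split:
            continue
        if drop_noisy and row["potential_noise"] == 1:
            continue
        counts[row["multi_label"]] += 1
    return counts


def inconsistent_labels(rows):
    """Return rows whose binary 'label' disagrees with 'multi_label'.

    Per the field description, label 0 should pair with "human"
    and label 1 with the name of a generating model.
    """
    return [r for r in rows if (r["label"] == 0) != (r["multi_label"] == "human")]


# Invented example rows; "model-x" stands in for a real generator name.
rows = [
    {"label": 0, "multi_label": "human", "split": "train", "potential_noise": 0},
    {"label": 1, "multi_label": "model-x", "split": "train", "potential_noise": 0},
    {"label": 1, "multi_label": "model-x", "split": "train", "potential_noise": 1},
    {"label": 1, "multi_label": "model-x", "split": "test", "potential_noise": 0},
]
print(label_counts(rows, "train"))
print(inconsistent_labels(rows))
```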

ToDo Statistics (under construction)
US CEO Contact Data | 1.8MM+ CEO Profiles with Validated Work Email, Mobile...
datarade.ai
.json, .csv, .xls
Updated Aug 12, 2023
Salutary Data (2023). US CEO Contact Data | 1.8MM+ CEO Profiles with Validated Work Email, Mobile Phone + More [Dataset]. https://datarade.ai/data-products/salutary-data-us-ceo-contact-data-500k-ceo-profiles-with-salutary-data
Available download formats: .json, .csv, .xls
Dataset updated
Aug 12, 2023
Dataset authored and provided by
Salutary Data
Area covered
United States of America
Description
Salutary Data is a boutique B2B contact and company data provider committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting/HR, identity resolution, and ML/AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4MM+ companies, and is updated regularly to ensure we have the most up-to-date information.

We can enrich your in-house data (CRM enrichment, lead enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.

What makes Salutary unique?
- We offer our clients a truly unique, one-stop aggregation of best-of-breed data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
- We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.

Products: API Suite, Web UI, Full and Custom Data Feeds

Services:
- Data Enrichment: We assess fill-rate gaps and profile your customer file in order to append fields, update information, and/or render net-new "look-alike" prospects for your campaigns.
- ABM Match & Append: Send us your domain or other company-related files, and we'll match your Account Based Marketing targets and provide you with B2B contacts to target. Optionally include your suppression file to avoid redundant records.
- Verification ("Cleaning/Hygiene") Services: Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove email hard bounces, and update or replace titles and phone numbers. This is right up our alley and leverages our existing internal and external processes and systems.
Email CTR Prediction
kaggle.com
Updated Nov 15, 2022
Cite
Sk4467 (2022). Email CTR Prediction [Dataset]. https://www.kaggle.com/datasets/sk4467/email-ctr-prediction
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 15, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sk4467
Description
Most organizations today rely on email campaigns for effective communication with users. Email is one of the most popular ways to pitch products to users and build trustworthy relationships with them. Email campaigns contain different types of CTA (Call To Action), and the ultimate goal of a campaign is to maximize the Click Through Rate (CTR), where CTR = number of users who clicked on at least one of the CTAs / number of emails delivered. This dataset contains details such as body length, subject length, mean paragraph length, day of week, and whether the email was sent on a weekend.
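The CTR definition above counts users, not clicks: someone who clicks two CTAs in the same email still counts once. Below is a minimal Python sketch of that formula, assuming hypothetical inputs (a list of recipient IDs whose email was delivered, with one email per recipient, and a list of `(recipient_id, cta_id)` click events). These names are illustrative only and are not columns of the Kaggle dataset, which per its description contains features such as body length and day of week.

```python
from typing import Iterable, Tuple


def click_through_rate(
    delivered: Iterable[str], clicks: Iterable[Tuple[str, str]]
) -> float:
    """CTR = users who clicked at least one CTA / emails delivered.

    Assumes one delivered email per recipient ID (hypothetical schema).
    `clicks` holds (recipient_id, cta_id) events; a user who clicks several
    CTAs is counted once.
    """
    delivered_ids = set(delivered)
    if not delivered_ids:
        raise ValueError("no emails delivered; CTR is undefined")
    # Unique clickers, ignoring clicks from IDs that were never delivered to
    clickers = {user for user, _cta in clicks if user in delivered_ids}
    return len(clickers) / len(delivered_ids)


if __name__ == "__main__":
    delivered = ["u1", "u2", "u3", "u4"]
    clicks = [("u1", "buy"), ("u1", "learn_more"), ("u3", "buy")]
    print(click_through_rate(delivered, clicks))  # 2 clickers / 4 delivered = 0.5
```

Using a set for the clickers is what enforces the "at least one CTA" rule; summing raw click events instead would overstate CTR whenever a user clicks more than one link.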
Immigration system statistics data tables
gov.uk
Updated Aug 21, 2025
Cite
Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
Explore at:
Dataset updated
Aug 21, 2025
Dataset provided by
GOV.UK
Authors
Home Office
Description
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

Accessible file formats

The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Please tell us what format you need. It will help us if you say what assistive technology you use.

Related content

Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives

Passenger arrivals

Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods

‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

Electronic travel authorisation

Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

Entry clearance visas granted outside the UK

Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods

Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

Additional data relating to in country and overseas Visa applications can be fo
70,000 Active buyer email list from Amazon & ebay for #Email_marketing
dataandsons.com
csv, zip
Updated Dec 12, 2020
+ more versions
Cite
boobxff.blogspot.com (2020). 70,000 Active buyer email list from Amazon & ebay for #Email_marketing [Dataset]. https://www.dataandsons.com/categories/markets/70-000-active-buyer-email-list-from-amazon-and-ebay-for-email-marketing
Explore at:
Available download formats: zip, csv
Dataset updated
Dec 12, 2020
Dataset provided by
Authors
boobxff.blogspot.com
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
About this Dataset

You will get an email list of real, active buyers who make regular purchases through Amazon and other e-commerce sites. This email list contains 100% original email addresses. You can also use these emails to increase visits to your website, blog, or YouTube channel. I now offer you a great treasure to use whenever you want.

So don't waste your time; start boosting your e-commerce business online.

The buyers will be from:

United States of America, Canada, European Union

- No duplicate emails
- No fake IDs
- Audiences ready to buy

Category

Markets

Keywords

market, emails, email ma, list, buyer

Row Count

70150

Price

$90.00
Bitcoin User Email Data
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
List to Data (2025). Bitcoin User Email Data [Dataset]. https://listtodata.com/bitcoin-user-email-list
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Turkmenistan, Benin, Albania, Mauritania, Hong Kong, Suriname, Somalia, United Arab Emirates, Burkina Faso, Svalbard and Jan Mayen
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
A Bitcoin user email list helps you reach real Bitcoin users directly. Bitcoin is a fast-growing digital currency used worldwide for trading and payments, and many people treat it like gold and use it for high-value deals. Through this email list, businesses can connect with Bitcoin investors, traders, and experts, promote products to the right audience without wasting time, and create targeted marketing campaigns for Bitcoin users. You can also grow your client base, build trust, increase your sales, and build strong business relationships in the crypto market.

A Bitcoin user email list saves both time and effort. Our list provides accurate, up-to-date Bitcoin user information, so you can quickly send offers, updates, or newsletters, and this targeted approach improves your marketing results and boosts ROI. In short, a Bitcoin user email list from List to Data is a smart tool for any crypto business that wants to connect, promote, and grow in the fast-moving Bitcoin industry.

The Bitcoin user email database is a powerful resource for connecting with active Bitcoin users. It includes verified contact details such as names, emails, and locations, so you can reach people who are genuinely interested in cryptocurrency. With this database you can send targeted offers to the right audience and promote Bitcoin-related products, courses, or platforms directly to interested users. This approach saves time, improves response rates, increases sales, and helps you build trust and long-term relationships with clients.

The Bitcoin user email database is an affordable and effective tool for crypto businesses. It receives regular updates to ensure accuracy, so you can run marketing campaigns without worrying about outdated contacts. The data also helps you understand user interests and behaviors, so you can create smarter marketing plans and achieve better results. Above all, it connects you with the right people to grow your business quickly.
Targeted Email List | Global Database | 2 Billion+ Contacts
datacaptive.com
Updated May 11, 2025
Cite
DataCaptive™ (2025). Targeted Email List | Global Database | 2 Billion+ Contacts [Dataset]. https://www.datacaptive.com/targeted-email-lists/
Explore at:
Dataset updated
May 11, 2025
Dataset authored and provided by
DataCaptive™
Area covered
Finland, Kuwait, Georgia, United States, Ireland, New Zealand, Canada, Mexico, Belgium, Netherlands
Description
Discover unparalleled business opportunities with our Targeted Email List, featuring over 2 billion global contacts.

Explore our global B2B contact and company database, providing essential data fields including Name, Website, Contact First Name, Contact Last Name, Job Title, Email Address, Phone Number, Revenue Size, Employee Size, Location, City, State, Country, Zip Code, and additional customizable data fields upon request. Access a comprehensive repository tailored to meet your specific business needs, ensuring you have access to accurate and detailed information for effective networking and targeted outreach.
China Email Data
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
List to Data (2025). China Email Data [Dataset]. https://listtodata.com/china-email-list
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
China
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
China Email Data delivers high-quality contacts, enabling you to expand your reach and dominate your market. Access the vast potential of China’s market with this powerful database for global businesses. It offers verified leads across industries, so your campaigns resonate with the right audience. At List to Data, we prioritize accuracy and reliability, delivering a directory that’s both comprehensive and actionable, and we update it regularly to reflect the latest market trends. Whether you’re expanding your reach or launching a new product, this dataset provides the foundation for success. Simplify your marketing efforts and boost engagement with this trusted library, and trust List to Data to deliver the tools you need for impactful outreach and lasting connections.

A China consumer email list can change your outreach by ensuring your message reaches the appropriate people every time. This comprehensive resource provides access to a massive network of potential customers, helping you increase brand visibility and drive sales. Because our data is regularly updated and verified, you can improve your marketing ROI, target specific demographics and regions, and connect with key decision-makers. List to Data offers this dataset to fuel your business growth in China.

A China business email list is a powerful resource for reaching professionals in China. This database provides verified leads to keep your campaigns effective, and it is designed to save time and maximize ROI. The directory is regularly updated for accuracy and offers a seamless way to expand your market reach and enhance your marketing efforts with reliable information. This library of contacts is tailored for both B2B and B2C outreach. Trust List to Data to deliver a dataset that drives results and boosts your market presence.
Statcan Dialogue Dataset
borealisdata.ca
search.dataone.org
Updated Apr 6, 2023
Cite
Xing Han Lu; Siva Reddy; Harm de Vries (2023). Statcan Dialogue Dataset [Dataset]. http://doi.org/10.5683/SP3/NR0BMY
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP3/NR0BMY
Dataset updated
Apr 6, 2023
Dataset provided by
Borealis
Authors
Xing Han Lu; Siva Reddy; Harm de Vries
License
https://borealisdata.ca/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.5683/SP3/NR0BMY
Description
Welcome to the data repository for requesting access to the Statcan Dialogue Dataset! Before requesting access, you can visit our website or read our EACL 2023 paper.

Requesting Access

In order to use our dataset, you must agree to the terms of use and restrictions before requesting access (see below). We will manually review each request and either grant access or reach out to you for further information. To facilitate the process, make sure that:
- Your Dataverse account is linked to your professional/research website, which we may review to ensure the dataset will be used for the intended purpose.
- Your request is made with an academic (e.g. .edu) or professional email (e.g. @servicenow.com). To do this, you have to set your primary email to your academic/professional email, or create a new Dataverse account.

If your academic institution's email address does not end with .edu, or you are part of a professional group that does not have an email address, please contact us (see email in paper).

Abstract: We introduce the StatCan Dialogue Dataset, consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an ongoing conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our tasks, which can be directly used to help knowledge workers find relevant tables for live chat users.
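The abstract evaluates models on a temporal data split, where validation and test conversations happen later than the training conversations, so a model is judged on "future" data. Below is a minimal Python sketch of that idea, assuming each conversation carries a start date. The `Conversation` record, the `temporal_split` helper, and the cut-off dates are all hypothetical illustrations and do not reproduce the authors' actual split or data format.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Conversation(NamedTuple):
    """Hypothetical record: a conversation ID plus its start date."""
    conv_id: str
    started: date


def temporal_split(
    convs: List[Conversation], val_start: date, test_start: date
) -> Tuple[List[Conversation], List[Conversation], List[Conversation]]:
    """Split by time: train < val_start <= val < test_start <= test."""
    if val_start >= test_start:
        raise ValueError("val_start must be earlier than test_start")
    ordered = sorted(convs, key=lambda c: c.started)
    train = [c for c in ordered if c.started < val_start]
    val = [c for c in ordered if val_start <= c.started < test_start]
    test = [c for c in ordered if c.started >= test_start]
    return train, val, test


if __name__ == "__main__":
    data = [
        Conversation("c1", date(2020, 1, 5)),
        Conversation("c2", date(2020, 6, 1)),
        Conversation("c3", date(2021, 2, 10)),
        Conversation("c4", date(2021, 9, 30)),
    ]
    train, val, test = temporal_split(data, date(2021, 1, 1), date(2021, 6, 1))
    print([c.conv_id for c in train], [c.conv_id for c in val], [c.conv_id for c in test])
```

With these sample dates, c1 and c2 land in training, c3 in validation, and c4 in test. Unlike a random split, no later conversation can leak into training, which is why a performance drop from validation to test signals difficulty generalizing forward in time.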
Email app statistics
opendata.umea.se
opendataumea.opendatasoft.com
csv, excel, json
Updated Sep 1, 2025
Cite
(2025). Email app statistics [Dataset]. https://opendata.umea.se/explore/dataset/getemailappusageusercounts/
Explore at:
Available download formats: csv, json, excel
Dataset updated
Sep 1, 2025
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Get the count of unique users that connected to Exchange Online using any email app.
