20 datasets found

Enron Email Time-Series Network
zenodo.org
csv
Updated Jan 24, 2020
Cite
Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst (2020). Enron Email Time-Series Network [Dataset]. http://doi.org/10.5281/zenodo.1342353
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.1342353
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We use the Enron email dataset to build a network of email addresses. It contains 614586 emails sent over the period from 6 January 1998 until 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which is 1448 days of email records in total. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6 600 nodes and 50 897 edges.

To build a graph G = (V, E), we use email addresses as nodes V. Every node v_i has an attribute which is a time-varying signal that corresponds to the number of emails sent from this address during a day. We draw an edge e_ij between two nodes i and j if there is at least one email exchange between the corresponding addresses.

The 'Count' column in the 'edges.csv' file is the number of 'From'->'To' email exchanges between the two addresses. This column can be used as an edge weight.

The file 'nodes.csv' contains a dictionary that is a compressed representation of the time series. The dictionary maps each day to the number of emails sent by the address during that day. The total number of days is 1448.

'id-email.csv' is a file containing the actual email addresses.
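
The compressed Day->Count dictionary described above can be expanded into a dense daily signal. The following is a minimal sketch, not the authors' code: it assumes the dictionary has already been parsed into a Python dict, that day indices are 0-based, and that days missing from the dictionary mean zero emails. The function name `expand_daily_counts` and the sample node are hypothetical.

```python
from typing import Dict, List

TOTAL_DAYS = 1448  # number of days stated in the dataset description


def expand_daily_counts(sparse: Dict[int, int], total_days: int = TOTAL_DAYS) -> List[int]:
    """Turn a sparse {day: emails_sent} mapping into a dense per-day list.

    Assumption: day indices are 0-based; days absent from the mapping count as 0.
    """
    dense = [0] * total_days
    for day, count in sparse.items():
        if not 0 <= day < total_days:
            raise ValueError(f"day index {day} outside 0..{total_days - 1}")
        dense[day] = count
    return dense


# Hypothetical node: 3 emails on day 0 and 1 email on day 10.
signal = expand_daily_counts({0: 3, 10: 1})
```

If the real file turns out to use 1-based day indices, only the bounds check and the index into `dense` would need to shift by one.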
cnn_dailymail
huggingface.co
Updated Aug 28, 2023
+ more versions
Cite
Abigail See (2023). cnn_dailymail [Dataset]. https://huggingface.co/datasets/abisee/cnn_dailymail
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 28, 2023
Authors
Abigail See
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for CNN Dailymail Dataset

Dataset Summary

The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

Supported Tasks and Leaderboards

'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
The total number of mailboxes and number of active mailboxes every day
opendata.umea.se
opendataumea.aws-ec2-eu-central-1.opendatasoft.com
csv, excel, json
Updated Oct 1, 2025
+ more versions
Cite
(2025). The total number of mailboxes and number of active mailboxes every day [Dataset]. https://opendata.umea.se/explore/dataset/getmailboxusagemailboxcounts0/
Explore at:
Available download formats: json, csv, excel
Dataset updated
Oct 1, 2025
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The total number of user mailboxes in Umeå kommun and how many are active each day of the reporting period. A mailbox is considered active if the user sent or read any email.
Email CTR Prediction
kaggle.com
Updated Nov 15, 2022
Cite
Sk4467 (2022). Email CTR Prediction [Dataset]. https://www.kaggle.com/datasets/sk4467/email-ctr-prediction
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 15, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Sk4467
Description
Most organizations today rely on email campaigns for effective communication with users. Email communication is one of the popular ways to pitch products to users and build trustworthy relationships with them. Email campaigns contain different types of CTA (Call To Action). The ultimate goal of email campaigns is to maximize the Click Through Rate (CTR), defined as CTR = (number of users who clicked on at least one CTA) / (number of emails delivered). This dataset contains details such as body length, sub length, mean paragraph, day of week, is weekend, etc.
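
The CTR definition above amounts to a single division. Here is a small sketch; the function name `click_through_rate` and the sample numbers are illustrative assumptions, not values from the dataset.

```python
def click_through_rate(users_clicked: int, emails_delivered: int) -> float:
    """CTR = users who clicked at least one CTA / emails delivered."""
    if emails_delivered <= 0:
        raise ValueError("emails_delivered must be positive")
    if not 0 <= users_clicked <= emails_delivered:
        raise ValueError("users_clicked must be between 0 and emails_delivered")
    return users_clicked / emails_delivered


# Hypothetical campaign: 1000 emails delivered, 25 recipients clicked a CTA.
ctr = click_through_rate(25, 1000)
```

Counting users rather than clicks keeps the ratio at or below 1 even when one recipient clicks several CTAs.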
Global Domain Name Data | DNS and Risk Classification via Dataset & API |...
datarade.ai
.json, .csv
Updated Nov 2, 2024
+ more versions
Cite
Datazag (2024). Global Domain Name Data | DNS and Risk Classification via Dataset & API | 267M+ Domains Covering Over 1570 Domain Zones | Updated Daily [Dataset]. https://datarade.ai/data-products/datazag-global-domain-name-data-dns-and-risk-classificatio-datazag
Explore at:
Available download formats: .json, .csv
Dataset updated
Nov 2, 2024
Dataset authored and provided by
Datazag
Area covered
Bahamas, Marshall Islands, Dominica, Lesotho, Norway, State of, Niue, Gambia, Kenya, Paraguay
Description
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.

The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.

DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.

Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market leading cyber security, data cleaning and email marketing applications supported by comprehensive and accurate data from Datazag. Data updates available on a daily, weekly and monthly basis. API data is updated on a daily basis.
cnn_dailymail
huggingface.co
tensorflow.org
+1 more
Updated Dec 18, 2021
+ more versions
Cite
ccdv (2021). cnn_dailymail [Dataset]. https://huggingface.co/datasets/ccdv/cnn_dailymail
Dataset updated
Dec 18, 2021
Authors
ccdv
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
CNN/DailyMail non-anonymized summarization dataset.

There are two features:
- article: text of the news article, used as the document to be summarized
- highlights: joined text of highlights with <s> and </s> around each highlight, which is the target summary
Aggregated Virtual Patient Model Dataset
zenodo.org
Updated Jan 24, 2020
Cite
Konstantinos Deltouzos (2020). Aggregated Virtual Patient Model Dataset [Dataset]. http://doi.org/10.5281/zenodo.2670048
Unique identifier
https://doi.org/10.5281/zenodo.2670048
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Konstantinos Deltouzos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is a collection of aggregated clinical parameters for the participants (such as clinical scores), parameters extracted from the utilized devices (such as average heart rate per day, average gait speed etc.), and coupled events about them (such as falls, loss of orientation etc.). It contains information that was collected by medical experts during the clinical evaluation of the older people. This information represents the clinical status of the older person across different domains, e.g. physical, psychological, cognitive etc.

The dataset contains several medical features which are used by clinicians to assess the overall state of the older people.

The purpose of the Virtual Patient Model is to assess the overall state of the older people based on their medical parameters, and to find associations between these parameters and frailty status.

A list of the recorded clinical parameters and their description is shown below:

- part_id: The user ID, which should be a 4-digit number

- q_date: The recording timestamp, which follows the “YYYY-MM-DDTHH:mm:ss.fffZ” format (e.g. 14 September 2017 12:23:34.567 is formatted as 2017-09-14T12:23:34.567Z)

- clinical_visit: Because several clinical evaluations were performed on each older adult, this number shows which clinical evaluation the measurements refer to

- fried: Ordinal categorization of frailty level according to Fried operational definition of frailty

- hospitalization_one_year: Number of nonscheduled hospitalizations in the last year

- hospitalization_three_years: Number of nonscheduled hospitalizations in the last three years

- ortho_hypotension: Presence of orthostatic hypotension

- vision: Visual difficulty (qualitative ordinal evaluation)

- audition: Hearing difficulty (qualitative ordinal evaluation)

- weight_loss: Unintentional weight loss >4.5 kg in the past year (categorical answer)

- exhaustion_score: Self-reported exhaustion (categorical answer)

- raise_chair_time: Time in seconds to perform a lower limb strength clinical test

- balance_single: Single foot station (Balance) (categorical answer)

- gait_get_up: Time in seconds to perform the 3-meter Timed Get Up And Go Test

- gait_speed_4m: Speed over a 4-meter straight walk

- gait_optional_binary: Gait optional evaluation (qualitative evaluation by the investigator)

- gait_speed_slower: Slowed walking speed (categorical answer)

- grip_strength_abnormal: Grip strength outside the norms (categorical answer)

- low_physical_activity: Low physical activity (categorical answer)

- falls_one_year: Number of falls in the last year

- fractures_three_years: Number of fractures during the last 3 years

- fried_clinician: Fried’s categorization according to clinician’s estimation (when missing data for answering the Fried’s operational frailty definition questionnaire)

- bmi_score: Body Mass Index (in kg/m²)

- bmi_body_fat: Body Fat (%)

- waist: Waist circumference (in cm)

- lean_body_mass: Lean Body Mass (%)

- screening_score: Mini Nutritional Assessment (MNA) screening score

- cognitive_total_score: Montreal Cognitive Assessment (MoCA) test score

- memory_complain: Memory complaint (categorical answer)

- mmse_total_score: Folstein Mini-Mental State Exam score

- sleep: Reported sleeping problems (qualitative ordinal evaluation)

- depression_total_score: 15-item Geriatric Depression Scale (GDS-15)

- anxiety_perception: Anxiety auto-evaluation (visual analogue scale 0-10)

- living_alone: Living Conditions (categorical answer)

- leisure_out: Leisure activities (number of leisure activities per week)

- leisure_club: Membership of a club (categorical answer)

- social_visits: Number of visits and social interactions per week

- social_calls: Number of telephone calls exchanged per week

- social_phone: Approximate time spent on phone per week

- social_skype: Approximate time spent on videoconference per week

- social_text: Number of written messages (SMS and emails) sent by the participant per week

- house_suitable_participant: Subjective suitability of the housing environment according to participant’s evaluation (categorical answer)

- house_suitable_professional: Subjective suitability of the housing environment according to investigator’s evaluation (categorical answer)

- stairs_number: Number of steps to access house (without possibility to use elevator)

- life_quality: Quality of life self-rating (visual analogue scale 0-10)

- health_rate: Self-rated health status (qualitative ordinal evaluation)

- health_rate_comparison: Self-assessed change since last year (qualitative ordinal evaluation)

- pain_perception: Self-rated pain (visual analogue scale 0-10)

- activity_regular: Regular physical activity (ordinal answer)

- smoking: Smoking (categorical answer)

- alcohol_units: Alcohol Use (average alcohol units consumption per week)

- katz_index: Katz Index of ADL score

- iadl_grade: Instrumental Activities of Daily Living score

- comorbidities_count: Number of comorbidities

- comorbidities_significant_count: Number of comorbidities which significantly affect the person’s functional status

- medication_count: Number of active substances taken on a regular basis
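
The q_date format listed above (“YYYY-MM-DDTHH:mm:ss.fffZ”) can be read with Python's standard library. This is a minimal sketch under the assumption that the trailing Z always means UTC and that the fractional part has millisecond precision; the helper names and the sample timestamp are hypothetical and not taken from the dataset.

```python
from datetime import datetime, timezone

# strptime's %f accepts one to six fractional digits, so three-digit
# milliseconds parse without extra handling.
Q_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_q_date(text: str) -> datetime:
    """Parse a q_date string and mark it as UTC (the trailing 'Z')."""
    return datetime.strptime(text, Q_DATE_FORMAT).replace(tzinfo=timezone.utc)


def format_q_date(moment: datetime) -> str:
    """Format a datetime back to millisecond precision with a 'Z' suffix."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


# Hypothetical timestamp used only to exercise the helpers.
parsed = parse_q_date("2018-03-05T08:30:00.250Z")
round_trip = format_q_date(parsed)
```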
ckanext-reminder - Extensions - CKAN Ecosystem Catalog Beta
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Cite
(2025). ckanext-reminder - Extensions - CKAN Ecosystem Catalog Beta [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-reminder
Dataset updated
Jun 4, 2025
Description
The Reminder extension for CKAN enhances data management by providing automated email notifications based on dataset expiry dates and update subscriptions. Designed to work with CKAN versions 2.2 and up, but tested on 2.5.2, this extension offers a straightforward mechanism for keeping users informed about dataset updates and expirations, promoting better data governance and engagement. The extension leverages a daily cron job to check expiry dates and trigger emails.

Key Features:

- Data Expiry Notifications: Sends email notifications when datasets reach their specified expiry date. A daily cron job determines when to send these emails. Note that failure of the cron job will prevent email delivery for that day.
- Dataset Update Subscriptions: Allows users to subscribe to specific datasets to receive notifications upon updates via a subscription form snippet that can be included in dataset templates.
- Unsubscribe Functionality: Includes an unsubscribe link in each notification email, enabling users to easily manage their subscriptions.
- Configuration Settings: Supports at least one recipient for reminder emails via configuration settings in the CKAN config file.
- Bootstrap Styling: Intended for use with Bootstrap 3+ for styling, but may still work with Bootstrap 2 with potential style inconsistencies.

Technical Integration: The Reminder extension integrates into CKAN via plugins, necessitating the addition of reminder to the ckan.plugins setting in the CKAN configuration file. The extension requires database initialization using paster commands to support the subscription functionality. Setting up a daily cron job is necessary for the automated sending of reminder and notification emails.

Benefits & Impact: By implementing the Reminder extension, CKAN installations can improve data management and user engagement. Automated notifications ensure that stakeholders are aware of dataset expirations and updates, leading to better data governance, and more active user involvement in data ecosystems. This extension provides an easy-to-implement solution for managing data lifecycles and keeping users informed.
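
The daily expiry check described above can be pictured as a filter over expiry dates. The sketch below is plain Python and does not use the extension's code or any CKAN API; the function name, the dataset names, and the rule that a notification is due exactly on the expiry date are all assumptions made for illustration.

```python
from datetime import date
from typing import Dict, List


def datasets_expiring_on(expiry_dates: Dict[str, date], today: date) -> List[str]:
    """Return the names of datasets whose expiry date equals `today`.

    A daily job would call this once per day and send one reminder per name.
    """
    return sorted(name for name, expiry in expiry_dates.items() if expiry == today)


# Hypothetical catalog with two datasets and their expiry dates.
catalog = {
    "air-quality": date(2025, 6, 4),
    "road-works": date(2025, 7, 1),
}
due = datasets_expiring_on(catalog, today=date(2025, 6, 4))
```

Because the comparison is an exact date match, a day on which the job does not run produces no reminder for datasets expiring that day, which matches the caveat in the description.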
JRII-S Dataset
datasets.ai
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
+1 more
Updated Aug 6, 2024
Cite
U.S. Environmental Protection Agency (2024). JRII-S Dataset [Dataset]. https://datasets.ai/datasets/jrii-s-dataset
Dataset updated
Aug 6, 2024
Dataset authored and provided by
U.S. Environmental Protection Agency
Description
The sonic data within the building array consist of 26 days of 30-minute average data from 30 sonic anemometers. The unobstructed tower sonic data have the same form, but for the 5 heights of the tower. The data files have 48 columns associated with date and time identifiers as well as meteorological turbulence measurements. This dataset is not publicly accessible because the data were not collected by EPA and are hosted external to the agency. It can be accessed through the following means: the detailed sonic dataset is freely available to others wishing to perform additional analysis; however, it is large and not readily posted. The complete dataset is included in the comprehensive JR II data archive set up by the DHS Science and Technology (S&T) Directorate, Chemical Security Analysis Center (CSAC). To obtain the data, an email request can be sent to JackRabbit@st.dhs.gov. The user can then access the archive on the Homeland Security Information Network (HSIN). Format: The sonic data within the Jack Rabbit II (JRII) mock-urban building array are in 30-minute averaged daily Excel files separated by each sonic anemometer with numerous variables. The unobstructed, raw 10 Hz tower data are in .dat files and processed into 30-minute average daily csv files by sonic height.
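
Reducing 10 Hz samples to 30-minute averages means averaging blocks of 10 × 1800 = 18,000 samples. The sketch below shows only that arithmetic with a plain mean over non-overlapping windows; it is not the study's processing chain, and the function name and synthetic input are assumptions.

```python
from statistics import fmean
from typing import List, Sequence

SAMPLE_RATE_HZ = 10                 # raw tower data rate from the description
WINDOW_SECONDS = 30 * 60            # 30-minute averaging period
SAMPLES_PER_WINDOW = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 18,000 samples


def block_average(samples: Sequence[float], window: int = SAMPLES_PER_WINDOW) -> List[float]:
    """Average consecutive, non-overlapping windows; a trailing partial window is dropped."""
    full_windows = len(samples) // window
    return [fmean(samples[i * window:(i + 1) * window]) for i in range(full_windows)]


# Synthetic hour of data: 2.0 for the first half hour, 4.0 for the second.
hour = [2.0] * SAMPLES_PER_WINDOW + [4.0] * SAMPLES_PER_WINDOW
averages = block_average(hour)
```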

This dataset is associated with the following publication: Pirhalla, M., D. Heist, S. Perry, S. Hanna, T. Mazzola, S.P. Arya, and V. Aneja. Urban Wind Field Analysis from the Jack Rabbit II Special Sonic Anemometer Study. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 243: 14, (2020).
Medallion Drivers - Active
catalog.data.gov
data.cityofnewyork.us
+6 more
Updated Sep 27, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). Medallion Drivers - Active [Dataset]. https://catalog.data.gov/dataset/medallion-drivers-active
Dataset updated
Sep 27, 2025
Dataset provided by
data.cityofnewyork.us
Description
PLEASE NOTE: This dataset, which includes all TLC Licensed Drivers who are in good standing and able to drive, is updated every evening between 4 and 7 pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today's or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv This is a list of drivers with a current TLC Driver License, which authorizes drivers to operate NYC TLC licensed yellow and green taxicabs and for-hire vehicles (FHVs). This list is accurate as of the date and time shown in the Last Date Updated and Last Time Updated fields. Questions about the contents of this dataset can be sent by email to: licensinginquiries@tlc.nyc.gov.
2025 Municipal Primary Election Mail Ballot Requests Department of State NO...
data.pa.gov
Updated Jun 11, 2025
Cite
Department of State (2025). 2025 Municipal Primary Election Mail Ballot Requests Department of State NO FURTHER UPDATES [Dataset]. https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2025-Municipal-Primary-Election-Mail-Ballot-Reques/ih4x-yb7a
Explore at:
Available download formats: kmz, csv, application/geo+json, xlsx, xml, kml
Dataset updated
Jun 11, 2025
Dataset provided by
United States Department of State: http://state.gov/
Authors
Department of State
License
https://www.usa.gov/government-works
Description
This dataset describes the current state of mail ballot requests for the 2025 Municipal Primary Election. It’s a snapshot in time of the current volume of ballot requests across the Commonwealth. The file contains all mail ballot requests except ballot applications that are declined as duplicate.

This point-in-time transactional data is being published for informational purposes to provide detailed data pertaining to the processing of absentee and mail-in ballots by county election offices. This data is extracted once per day from the Statewide Uniform Registry of Electors (SURE system), and it reflects activity recorded by the counties in the SURE system at the time of the data extraction.

Please note that county election offices will continue to process ballot applications (as applicable), record ballots, reconcile ballot data, and make corrections when necessary, and this will continue through, and even after, Election Day. Administrative practices for recording transactions in the system will vary by county. For example, some counties record individual transactions as they occur, while others record transactions in batches at specific intervals. These activities may result in substantial changes to a county's reported data from one day to the next. County practices also differ on when cancelled ballot data is entered into the database (i.e., before or after the election). Some counties do not enter cancelled ballot data at all.

Additional notes specific to this dataset:
• Counties can enter cancellation codes without entering a ballot returned date.
• Some cancellation codes are a result of administrative processes, meaning the ballot was never mailed to the voter before it was cancelled (e.g., there was an error when the label was printed).
• Confidential and protected voters are not included in this file.
• Counties can only enter one cancel code per ballot, even if there are multiple errors. Different counties may vary in what code they choose to use when this arises, or they may choose to use the catch-all category of 'CANC - OTHER'.

Type of data included in this file: This data includes all mail ballot applications processed by counties, which includes voters on the permanent mail-in and absentee ballot lists. Multiple rows in this data may correspond to the same voter if they submitted more than one application or had one or more cancelled ballots. A deidentified voter ID has been provided to allow data users to identify when rows correspond to the same voter. This ID is randomized and cannot be used to match to SURE, the Full Voter Export, or previous iterations of the Statewide Mail Ballot File. All application types in this file are considered a type of mail ballot. Some of the applications are considered UOCAVA (Uniformed and Overseas Citizens Absentee Voting Act) or UMOVA (Uniform Military and Overseas Voters Act) ballots. These are listed below:

• CRI - Civilian - Remote/Isolated
• CVO - Civilian Overseas
• F - Federal (Unregistered)
• M - Military
• MRI - Military - Remote/Isolated
• V - Veteran
• BV - Bedridden Veteran
• BVRI - Bedridden Veteran - Remote/Isolated

*We may not have all application types in the file for every election.
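
Since several rows can belong to one voter, the deidentified voter ID can be used to find voters with more than one row. The sketch below uses made-up rows and hypothetical field names (`voter_id`, `app_type`); the real column names in the file are not given here and may differ.

```python
from collections import Counter
from typing import Dict, List

# Made-up rows; the field names are assumptions, not the file's actual headers.
rows: List[Dict[str, str]] = [
    {"voter_id": "a1", "app_type": "M"},
    {"voter_id": "b2", "app_type": "CVO"},
    {"voter_id": "a1", "app_type": "M"},
]


def voters_with_multiple_rows(records: List[Dict[str, str]]) -> List[str]:
    """Return the deidentified IDs that appear on more than one row."""
    counts = Counter(record["voter_id"] for record in records)
    return sorted(voter for voter, n in counts.items() if n > 1)


repeat_voters = voters_with_multiple_rows(rows)
```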
App Developer Data | B2B Contact Data for IT Professionals Worldwide | 170M...
datarade.ai
Updated Oct 27, 2021
+ more versions
Cite
Success.ai (2021). App Developer Data | B2B Contact Data for IT Professionals Worldwide | 170M Verified Profiles with Emails & Phone Numbers | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/app-developer-data-b2b-contact-data-for-it-professionals-wo-success-ai
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 27, 2021
Dataset provided by
Area covered
Lesotho, Syrian Arab Republic, Micronesia (Federated States of), Italy, Liechtenstein, Eritrea, Greece, Anguilla, Senegal, Vanuatu
Description
Success.ai’s B2B Contact Data for IT Professionals Worldwide is an advanced, AI-validated solution designed to help businesses connect with top IT talent and decision-makers globally. With access to over 170 million verified profiles, this dataset includes key contact information such as work emails, phone numbers, and additional professional details, ensuring you can easily engage with IT leaders and specialists across various industries.

Our comprehensive data is continually updated to ensure accuracy, relevance, and compliance with global standards. Whether you're looking to expand your network, enhance lead generation, or improve recruitment processes, Success.ai’s IT professional database is designed to meet the evolving needs of your business.

Key Features of Success.ai’s IT Professional Contact Data

Global Coverage Across the IT Industry

Success.ai offers a diverse range of IT professionals, including but not limited to:

- Software Engineers & Developers: Specialists in coding, programming, and software development.
- IT Managers & Directors: Decision-makers responsible for IT infrastructure and strategy.
- Systems Administrators: Experts managing system installations, configurations, and troubleshooting.
- Cloud Computing Specialists: Professionals focused on cloud storage and infrastructure services.
- Cybersecurity Experts: IT professionals safeguarding data and systems from cyber threats.
- IT Consultants & Analysts: Advisers providing strategic recommendations on technology improvements.

This dataset spans 170M+ verified profiles across more than 250 countries, ensuring you reach the right IT professionals, wherever they are.

Verified and Continuously Updated Data

- 99% Accuracy: Data is AI-validated to ensure that you are reaching the right contacts with accurate, up-to-date information.
- Real-Time Updates: Success.ai’s dataset is constantly refreshed, ensuring that the information you receive is always relevant and timely.
- Global Compliance: Our data collection adheres to GDPR, CCPA, and other data privacy standards, ensuring that your outreach practices are ethical and compliant.

Customizable Data Solutions

Success.ai provides multiple delivery methods to suit your business needs:

- API Integration: Seamlessly integrate our data into your CRM, marketing automation, or lead-generation systems for real-time updates.
- Custom Flat Files: Receive highly targeted and segmented datasets, preformatted to your specifications, making integration easy.

Why Choose Success.ai’s IT Professional Contact Data?

Best Price Guarantee

We offer the most competitive pricing in the industry, ensuring you get exceptional value for high-quality, verified contact data.

Targeted Outreach to IT Professionals

Our comprehensive dataset is perfect for precision targeting, making it easier to connect with key IT professionals. With detailed profiles, including work emails and phone numbers, you can engage with decision-makers directly and increase the efficiency of your campaigns.

Strategic Use Cases

- Lead Generation: Use our verified contact information to target IT decision-makers and specialists for your lead generation campaigns.
- Sales Outreach: Reach out to key IT managers, directors, and consultants to promote your product or service and close high-value deals.
- Recruitment: Source top-tier IT talent with verified contact data for software developers, network administrators, and IT executives.
- Marketing Campaigns: Run hyper-targeted marketing campaigns for IT professionals globally to promote tech services, job openings, or industry innovations.
- Business Expansion: Use data-driven insights to expand your global outreach, identifying opportunities and building relationships in untapped markets.

Key Data Highlights

- 170M+ Verified Profiles of IT professionals worldwide, covering a wide range of roles and industries.
- 50M Work Emails to help you reach the right IT contacts.
- 30M Company Profiles with insights on the organizations that these professionals represent.
- 700M+ LinkedIn Professional Profiles globally, enhancing your ability to access verified IT contacts across various platforms.

Powerful APIs for Enhanced Functionality

Enrichment API

Keep your data up to date with our Enrichment API, providing real-time enrichment of your existing contact database. Perfect for businesses that want to maintain accurate and current information about their leads and customers.

Lead Generation API

Maximize your lead generation campaigns by accessing Success.ai’s vast and verified dataset, which includes work emails and phone numbers for IT professionals worldwide. Our API supports up to 860,000 API calls per day, ensuring scalability for large enterprises.

Use Cases for IT Professional Contact Data

Lead Generation for IT Solutions Target IT decision-makers, software developers, and cybersecuri...
Lead Scoring Dataset
kaggle.com
zip
Updated Aug 17, 2020
Cite
Amrita Chatterjee (2020). Lead Scoring Dataset [Dataset]. https://www.kaggle.com/amritachatterjee09/lead-scoring-dataset
Explore at:
Available download formats: zip (411028 bytes)
Dataset updated
Aug 17, 2020
Authors
Amrita Chatterjee
Description
Context

An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.

The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for a course, or watch some videos. When these people fill out a form providing their email address or phone number, they are classified as a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.

Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if, say, they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most potential leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.

There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc.) in order to get a higher lead conversion.

X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model that assigns a lead score to each lead such that customers with a higher lead score have a higher conversion chance and customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.
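
To make the goal concrete, the conversion rate can be computed over all leads and over only those leads whose score clears a threshold. The following sketch uses ten made-up leads and an arbitrary threshold of 80; the scores, the threshold, and the function names are illustrative assumptions, not part of the dataset or of any particular model.

```python
from typing import List, Tuple

Lead = Tuple[int, bool]  # (lead score, converted?)


def conversion_rate(leads: List[Lead]) -> float:
    """Fraction of the given leads that converted."""
    if not leads:
        raise ValueError("no leads to evaluate")
    return sum(1 for _, converted in leads if converted) / len(leads)


def hot_leads(leads: List[Lead], threshold: int) -> List[Lead]:
    """Keep only leads whose score is at or above the threshold."""
    return [lead for lead in leads if lead[0] >= threshold]


# Ten made-up leads: three convert overall, all of them with high scores.
leads: List[Lead] = [
    (95, True), (90, True), (85, True), (80, False), (60, False),
    (50, False), (40, False), (30, False), (20, False), (10, False),
]
overall = conversion_rate(leads)                   # 3 of 10 leads
hot_rate = conversion_rate(hot_leads(leads, 80))   # 3 of 4 leads
```

In this toy data, calling only the four highest-scored leads raises the conversion rate from 30% to 75%, which is the effect a useful lead score is meant to produce.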

Content

Variables Description

* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer wherein they select whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer wherein they select whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking this course.
* Search - Indicating whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of lead based on the data and intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile.
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.

Acknowledgements

UpGrad Case Study

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
2.17 311 Email Response Times (summary)
data-academy.tempe.gov
performance.tempe.gov
Updated Nov 15, 2019
City of Tempe (2019). 2.17 311 Email Response Times (summary) [Dataset]. https://data-academy.tempe.gov/datasets/2-17-311-email-response-times-summary/about
Dataset updated
Nov 15, 2019
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Customer Relations Center (CRC) or Tempe 311 is often the first and possibly only contact a resident has with the City. Our goal is to make each interaction as smooth and efficient as possible. Our goal is to achieve a response time to the Tempe 311 inbox messages (emails to 311, voicemails, emails from tempe.gov, work requests) of less than or equal to 1 business day for 90% of inquiries.

This page provides data for the 311 Email Response Time performance measure.

The performance measure dashboard is available at 2.17 311 Email Response Time.

Additional Information
Source: tempe.gov and Accela CRM
Contact: Moncayo, Kim
Contact E-Mail: Kim_Moncayo@tempe.gov
Data Source Type:
Preparation Method: All emails or voice messages from the Tempe311 inbox are entered into Accela CRM as a work order or a request. An Excel report is pulled from Accela CRM of all Tempe311 Inbox entries and then checked and verified for response time.
Publish Frequency: Quarterly
Publish Method: Manual
Data Dictionary
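The performance measure described above (a response within 1 business day for 90% of inquiries) could be checked along the following lines. This is only a sketch: the function names and sample dates are invented for illustration, weekends are the only non-business days considered, and City holidays are ignored. The published data may define a business day differently.

```python
from datetime import date, timedelta


def business_days_between(received, responded):
    """Count weekdays (Mon-Fri) after `received`, up to and including `responded`."""
    days = 0
    current = received
    while current < responded:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
    return days


def share_within_target(pairs, max_business_days=1):
    """Fraction of (received, responded) pairs answered within the target."""
    within = sum(
        1 for received, responded in pairs
        if business_days_between(received, responded) <= max_business_days
    )
    return within / len(pairs)


# Hypothetical inquiries (received date, response date), not taken from the dataset.
inquiries = [
    (date(2019, 11, 15), date(2019, 11, 18)),  # Friday to Monday: 1 business day
    (date(2019, 11, 18), date(2019, 11, 18)),  # same day: 0 business days
    (date(2019, 11, 18), date(2019, 11, 20)),  # Monday to Wednesday: 2 business days
    (date(2019, 11, 19), date(2019, 11, 20)),  # Tuesday to Wednesday: 1 business day
]

share = share_within_target(inquiries)
print(share, share >= 0.90)
```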
2024 General Election Mail Ballot Requests Department of State NO FURTHER...
data.pa.gov
Updated Apr 14, 2025
Department of State (2025). 2024 General Election Mail Ballot Requests Department of State NO FURTHER UPDATES [Dataset]. https://data.pa.gov/w/3q5t-ddp8/33ch-zxdi?cur=ZSKjlCOEHml
Available download formats: application/geo+json, kmz, kml, xml, xlsx, csv
Dataset updated
Apr 14, 2025
Dataset provided by
United States Department of State (http://state.gov/)
Authors
Department of State
License
https://www.usa.gov/government-works
Description
This dataset describes the current state of mail ballot requests for the 2024 General Election. It’s a snapshot in time of the current volume of ballot requests across the Commonwealth. The file contains all mail ballot requests except ballot applications that are declined as duplicate. The final version of this data was pulled on 1/24/2025.

This point-in-time transactional data is being published for informational purposes to provide detailed data pertaining to the processing of absentee and mail-in ballots by county election offices. This data is extracted once per day from the Statewide Uniform Registry of Electors (SURE system), and it reflects activity recorded by the counties in the SURE system at the time of the data extraction.

Please note that county election offices will continue to process ballot applications (as applicable), record ballots, reconcile ballot data, and make corrections when necessary, and this will continue through, and even after, Election Day. Administrative practices for recording transactions in the system will vary by county. For example, some counties record individual transactions as they occur, while others record transactions in batches at specific intervals. These activities may result in substantial changes to a county's reported data from one day to the next. County practices also differ on when cancelled ballot data is entered into the database (i.e., before or after the election). Some counties do not enter cancelled ballot data entirely.

Additional notes specific to this dataset:
• Counties can enter cancellation codes without entering a ballot returned date.
• Some cancellation codes are a result of administrative processes, meaning the ballot was never mailed to the voter before it was cancelled (e.g., there was an error when the label was printed).
• Confidential and protected voters are not included in this file.
• Counties can only enter one cancel code per ballot, even if there are multiple errors. Different counties may vary in what code they choose to use when this arises, or they may choose to use the catch-all category of 'CANC - OTHER'.

Type of data included in this file: This data includes all mail ballot applications processed by counties, which includes voters on the permanent mail-in and absentee ballot lists. Multiple rows in this data may correspond to the same voter if they submitted more than one application or had one or more cancelled ballots. A deidentified voter ID has been provided to allow data users to identify when rows correspond to the same voter. This ID is randomized and cannot be used to match to SURE, the Full Voter Export, or previous iterations of the Statewide Mail Ballot File. All application types in this file are considered a type of mail ballot. Some of the applications are considered UOCAVA (Uniformed and Overseas Citizens Absentee Voting Act) or UMOVA (Uniform Military and Overseas Voters Act) ballots. These are listed below:

• CRI - Civilian - Remote/Isolated
• CVO - Civilian Overseas
• F - Federal (Unregistered)
• M - Military
• MRI - Military - Remote/Isolated
• V - Veteran
• BV - Bedridden Veteran
• BVRI - Bedridden Veteran - Remote/Isolated

*We may not have all application types in the file for every election.
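Because several rows may belong to one voter, a first step in working with this file is often to group rows by the deidentified voter ID. The sketch below shows one way to do that. The field names `voter_id` and `app_type` and the sample rows are hypothetical; the actual column names in the published file are not given in this description.

```python
from collections import Counter


def voters_with_multiple_rows(rows, id_field="voter_id"):
    """Map each deidentified voter ID that appears more than once to its row count."""
    counts = Counter(row[id_field] for row in rows)
    return {voter_id: n for voter_id, n in counts.items() if n > 1}


# Hypothetical rows, for illustration only.
rows = [
    {"voter_id": "A1", "app_type": "M"},
    {"voter_id": "B2", "app_type": "CVO"},
    {"voter_id": "A1", "app_type": "M"},
]

print(voters_with_multiple_rows(rows))
```

Counting per voter rather than per row avoids overstating the number of people who requested a mail ballot.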
Price Paid Data
gov.uk
Updated Sep 29, 2025
HM Land Registry (2025). Price Paid Data [Dataset]. https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
Dataset updated
Sep 29, 2025
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
HM Land Registry
Description
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.

Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data

Using or publishing our Price Paid Data

If you use or publish our Price Paid Data, you must add the following attribution statement:

Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.

Price Paid Data is released under the Open Government Licence (OGL), http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. You need to make sure you understand the terms of the OGL before using the data.

Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.

Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:

for personal and/or non-commercial use

to display for the purpose of providing residential property price information services

If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.

Address data

The following fields comprise the address data included in Price Paid Data:

Postcode

PAON Primary Addressable Object Name (typically the house number or name)

SAON Secondary Addressable Object Name – if there is a sub-building, for example, the building is divided into flats, there will be a SAON

Street

Locality

Town/City

District

County
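The address fields listed above can be modelled as a small record type. The sketch below is one possible representation, not the format HM Land Registry uses: the class name, the attribute names, the ordering of parts in the one-line form, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PricePaidAddress:
    """Address fields named in the Price Paid Data description."""
    postcode: str = ""
    paon: str = ""       # Primary Addressable Object Name (house number or name)
    saon: str = ""       # Secondary Addressable Object Name (e.g. a flat)
    street: str = ""
    locality: str = ""
    town_city: str = ""
    district: str = ""
    county: str = ""

    def one_line(self):
        """Join the non-empty fields into a single display string."""
        parts = [
            self.saon, self.paon, self.street, self.locality,
            self.town_city, self.district, self.county, self.postcode,
        ]
        return ", ".join(p for p in parts if p)


# Made-up example values, not taken from the dataset.
address = PricePaidAddress(
    postcode="AB1 2CD",
    paon="12",
    saon="FLAT 3",
    street="EXAMPLE ROAD",
    town_city="EXAMPLETOWN",
)
print(address.one_line())
```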

August 2025 data (current month)

The August 2025 release includes:

the first release of data for August 2025 (transactions received from the first to the last day of the month)

updates to earlier data releases

Standard Price Paid Data (SPPD) and Additional Price Paid Data (APPD) transactions

As we will be adding to the August data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.

We update the data on the 20th working day of each month. You can download the:

http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update-new-version.csv - current month as a CSV file (CSV, 18.5MB)

http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-monthly-update.txt - current month as a text file (TXT, 17.9MB)

Single file

These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.

Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.

The data is updated monthly and the average size of this file is 3.7 GB, you can download:

http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.txt - the complete Price Paid T
Folkomröstningsundersökningen 1957 - Dataset - B2FIND
b2find.eudat.eu
Updated Nov 11, 2024
(2024). Folkomröstningsundersökningen 1957 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/0ad32828-86fb-5075-ba6e-9c07e51eb349
Dataset updated
Nov 11, 2024
Description
Since 1956, an election study has been carried out in conjunction with every parliamentary election in Sweden. Likewise, studies have been carried out in conjunction with the two referenda that have taken place since then. In 1957 a referendum on the general supplementary pension scheme (ATP) took place. The respondents were questioned three times: two interviews were held before the day of the referendum, and a mail survey was sent to them after the referendum. The study contains questions on the general supplementary pension scheme, national basic pension, opinion of the three proposals, sources of information, and newspaper reading. Apart from the questions concerning the referendum, the study examines Swedish national defence by including questions about the war risk and the atomic bomb. Purpose: Explain why people vote as they do and why an election ends in a particular way. Track and follow trends in the Swedish electoral democracy and make comparisons with other countries.
Newspapers-Indian Daily Mail-1946 to 1947
data.gov.sg
Updated Jun 6, 2024
National Library Board (2024). Newspapers-Indian Daily Mail-1946 to 1947 [Dataset]. https://data.gov.sg/datasets/d_434d294555cbb371da63e9770d5b4ca1/view
Dataset updated
Jun 6, 2024
Dataset authored and provided by
National Library Board (http://www.nlb.gov.sg/)
License
https://data.gov.sg/open-data-licence
Time period covered
Feb 2024 - Feb 2025
Area covered
India
Description
Dataset from National Library Board. For more information, visit https://data.gov.sg/datasets/d_434d294555cbb371da63e9770d5b4ca1/view
custom_summarization_dataset
huggingface.co
Updated Sep 16, 2024
Junseong Park (2024). custom_summarization_dataset [Dataset]. https://huggingface.co/datasets/rasauq1122/custom_summarization_dataset
Dataset updated
Sep 16, 2024
Authors
Junseong Park
Description
Dataset Card for Custom Text Dataset

Dataset Name

Custom Text Dataset

Overview

This dataset contains text data for training summarization models. The data is collected from CNN/daily mail.

Composition

Number of records: 100
Fields: text, label

Collection Process

CNN/daily mail

Preprocessing

nothing

How to Use

from datasets import load_dataset
dataset = load_dataset("path_to_dataset")

for example in… See the full description on the dataset page: https://huggingface.co/datasets/rasauq1122/custom_summarization_dataset.
CustomWeather API | Severe Weather Data | Global Severe Weather Advisories...
datarade.ai
.json, .xml, .csv
Updated Oct 22, 2020
CustomWeather (2020). CustomWeather API | Severe Weather Data | Global Severe Weather Advisories For 85,000 Weather Forecast Locations | Storm Data [Dataset]. https://datarade.ai/data-products/global-severe-weather-advisories-customweather
Available download formats: .json, .xml, .csv
Dataset updated
Oct 22, 2020
Dataset authored and provided by
CustomWeather
Area covered
Japan, Sweden, Antarctica, Cambodia, Western Sahara, Cabo Verde, Serbia, Timor-Leste, Pakistan, Korea (Democratic People's Republic of)
Description
Features reports based on forecast severe weather conditions, such as high winds, blizzard conditions, possible severe thunderstorms, hurricane conditions, and heavy snow. Advisories are available for all 85,000 worldwide locations in CustomWeather’s global weather database. The severe weather data advisories are updated four times per day.

The product returns severe weather advisories based on the forecast for the next six or 24 hours or five days broken down into segments of the day (Morning, Afternoon, Evening, Overnight). Global Weather Data.

Custom alerts can be generated for any specific weather criteria, either in the past based on climate data or in the future based on weather forecasts. Weather alerts can also be generated that incorporate both past and future weather data. CustomWeather's custom alerts can be sent out via email to specific user groups, via FTP, or via SMS using carrier email-to-SMS transmission.

This severe weather data represents a part of CustomWeather's trove of historical, real-time, and forecast data sets covering the entire life cycle of weather - past, present, and future.

The Global Severe Weather Advisories product includes information in the following data categories: Environmental Data, Event Data, Geographic Data, Global Weather Data, Insurance Data, Lightning Data, Natural Disasters Data, News Data, Places Data, Precipitation Data, Rainfall Data, Severe Weather Data, Storm Data, Temperature Data, and Wind Data.

The backbone of CustomWeather's forecasting arm is our proven, high-resolution model, the CustomWeather 100 or CW100. The CW100 Model is based on physics, not statistics or airport observations. As a result, it can achieve significantly better accuracy than statistical models, especially for non-airport locations. While other forecast models are designed to forecast the entire atmosphere, the CW100 greatly reduces computational requirements by focusing entirely on conditions near the ground. This reduction of computations allows the model to resolve additional physical processes near the ground that are not resolved by other models. It also allows the CW100 to operate at a much higher resolution, typically 100x finer than standard models and other gridded forecasts.