CNN/Daily Mail is a dataset for text summarization. Human-written abstractive summary bullets from news stories on the CNN and Daily Mail websites were turned into questions (with one of the entities hidden), and the stories serve as the corresponding passages from which the system is expected to answer the fill-in-the-blank question. The authors released the scripts that crawl, extract and generate pairs of passages and questions from these websites.
In all, the corpus has 286,817 training pairs, 13,368 validation pairs and 11,487 test pairs, as defined by their scripts. The source documents in the training set have 766 words spanning 29.74 sentences on average, while the summaries consist of 53 words and 3.72 sentences.
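The cloze-style construction described above (a summary bullet becomes a question by hiding one entity) can be illustrated with a minimal sketch. This is not the authors' released script: the function name `make_cloze_pairs`, the example story, and the hand-supplied entity list are all hypothetical, and entity detection is assumed to have happened elsewhere.

```python
import re

PLACEHOLDER = "@placeholder"  # marker standing in for the hidden entity


def make_cloze_pairs(story, bullet, entities):
    """Return (passage, question, answer) triples, hiding one entity per question."""
    pairs = []
    for entity in entities:
        pattern = r"\b" + re.escape(entity) + r"\b"
        if re.search(pattern, bullet):
            # Hide only the first mention so the question has a single blank.
            question = re.sub(pattern, PLACEHOLDER, bullet, count=1)
            pairs.append((story, question, entity))
    return pairs


# Hypothetical story and bullet, used only to show the shape of the output.
story = "Acme Corp said on Monday that Jane Doe will become its chief executive."
bullet = "Jane Doe named chief executive of Acme Corp"
pairs = make_cloze_pairs(story, bullet, ["Jane Doe", "Acme Corp"])

for _, question, answer in pairs:
    print(question, "->", answer)
```

Each bullet can yield several questions, one per entity it mentions, which is one reason the number of passage/question pairs differs from the number of unique articles.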
This dataset provides Customer Service Satisfaction results from the Annual Community Survey. The survey questions assess satisfaction with overall customer service for individuals who had contacted the city in the past year. For years where there are multiple questions related to overall customer service and treatment, the average of those responses is provided in the summary dataset and the values for each question are provided in the detailed dataset.
For years 2010-2014, respondents were first asked "Have you contacted the city in the past year?". If they answered that they had contacted the city, they were then asked additional questions about their experience. The "number of respondents" field represents the number of people who answered yes to the contact question.
Responses of "don't know" are not included in this dataset, but can be found in the dataset for the entire Community Survey. A survey was not completed for 2015 (99999 indicates no recorded data).
Due to changes in the survey questions, this dataset was last updated in 2017 and may not be updated again. The performance measure dashboard is available at 2.02 Customer Service Satisfaction.
Additional Information
Source: Community Attitude Survey
Contact: Wydale Holmes
Contact E-Mail: Wydale_Holmes@tempe.gov
Data Source Type: Excel and PDF
Preparation Method: Extracted from Annual Community Survey results
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for CNN Dailymail Dataset
Dataset Summary
The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.
Supported Tasks and Leaderboards
'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
The sonic data within the building array consists of 26 days of 30-minute average data from 30 sonic anemometers. The unobstructed tower sonic data has the same structure, but covers the 5 heights of the tower. The data files have 48 columns containing date and time identifiers as well as meteorological turbulence measurements. This dataset is not publicly accessible because: The data were not collected by EPA and are hosted external to the agency. It can be accessed through the following means: The detailed sonic dataset is freely available to others wishing to perform additional analysis; however, it is large and not readily posted. The complete dataset is included in the comprehensive JR II data archive set up by the DHS Science and Technology (S&T) Directorate, Chemical Security Analysis Center (CSAC). To obtain the data, an email request can be sent to JackRabbit@st.dhs.gov. The user can then access the archive on the Homeland Security Information Network (HSIN). Format: The sonic data within the Jack Rabbit II (JRII) mock-urban building array are in 30-minute averaged daily Excel files separated by each sonic anemometer with numerous variables. The unobstructed, raw 10Hz tower data are in .dat files and processed into 30-minute average daily csv files by sonic height. This dataset is associated with the following publication: Pirhalla, M., D. Heist, S. Perry, S. Hanna, T. Mazzola, S.P. Arya, and V. Aneja. Urban Wind Field Analysis from the Jack Rabbit II Special Sonic Anemometer Study. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 243: 14, (2020).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The total number of user mailboxes in Umeå kommun and how many are active each day of the reporting period. A mailbox is considered active if the user sent or read any email.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. The data has been made public and presents a diverse set of email information ranging from internal, marketing emails to spam and fraud attempts.
In the early 2000s, Leslie Kaelbling at MIT purchased the dataset and noted that, though the dataset contained scam emails, it also had several integrity problems. The dataset was updated later, but it remains important to protect privacy in the data when it is used to train a deep neural network model.
Though the Enron Email Dataset contains over 500K emails, one of its problems is the limited availability of labeled fraud emails. Label annotation is done so that a broad umbrella of fraud emails can be detected accurately. Since fraud emails fall into several types, such as Phishing, Financial, Romance, Subscription, and Nigerian Prince scams, multiple heuristics have to be used to label all types of fraudulent emails effectively.
To tackle this problem, heuristics have been used to label the Enron data corpus using email signals, and automated labeling has been performed using simple ML models on other smaller email datasets available online. These fraud annotation techniques are discussed in detail below.
To perform fraud annotation on the Enron dataset, as well as to provide more fraud examples for modeling, two more fraud data sources have been used:
Phishing Email Dataset: https://www.kaggle.com/dsv/6090437
Social Engineering Dataset: http://aclweb.org/aclwiki
To label the Enron email dataset, two signals are used to filter suspicious emails and label them into fraud and non-fraud classes:
Automated ML labeling
Email Signals
The following heuristics are used to annotate labels for the Enron email data using the other two data sources:
Phishing Model Annotation: A high-precision SVM model trained on the Phishing mails dataset, which is used to annotate the Phishing Label on the Enron Dataset.
Social Engineering Model Annotation: A high-precision SVM model trained on the Social Engineering mails dataset, which is used to annotate the Social Engineering Label on the Enron Dataset.
The two ML Annotator models use Term Frequency Inverse Document Frequency (TF-IDF) to embed the input text and make use of SVM models with Gaussian Kernel.
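The two building blocks named above, TF-IDF weighting and a Gaussian (RBF) kernel, can be sketched in plain Python. This is an illustration under stated assumptions, not the annotators' actual pipeline: the whitespace tokenizer, the smoothed IDF formula, the L2 normalization, the `gamma` value, and the three example emails are all assumptions, and the SVM training step itself is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # A common smoothed IDF variant (assumption): ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def gaussian_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2), the similarity an RBF-kernel SVM uses."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical emails, for illustration only.
emails = [
    "verify your account password now",
    "please verify your password immediately",
    "meeting agenda for the quarterly budget review",
]
vecs = tfidf_vectors(emails)
sim_phish = gaussian_kernel(vecs[0], vecs[1])  # two phishing-like texts
sim_other = gaussian_kernel(vecs[0], vecs[2])  # phishing-like vs. routine text
print(sim_phish > sim_other)
```

An SVM with this kernel classifies a new email from its kernel similarity to the support vectors, so emails that share heavily weighted terms with known phishing examples end up on the same side of the decision boundary.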
If either of the models predicted that an email was a fraud, the mail metadata was checked for several email signals. If these heuristics meet the requirements of a high-probability fraud email, we label it as a fraud email.
Email Signal-based heuristics are used to filter and target suspicious emails for fraud labeling specifically. The signals used were:
Person Of Interest: There is a publicly available list of email addresses of employees who were liable for the massive data leak at Enron. These user mailboxes have a higher chance of containing quality fraud emails.
Suspicious Folders: The Enron data is dumped into several folders for every employee. Folders consist of inbox, deleted_items, junk, calendar, etc. A set of folders with a higher chance of containing fraud emails, such as Deleted Items and Junk, was treated as suspicious.
Sender Type: The sender type was categorized as ‘Internal’ and ‘External’ based on their email address.
Low Communication: A threshold of 4 emails based on the table below was used to define Low Communication. A user qualifies as a Low-Comm sender if the number of emails they sent is below this threshold. Mails sent from low-comm senders have been assigned a high probability of being fraud.
Contains Replies and Forwards: If an email contains forwards or replies, a low probability was assigned for it to be a fraud email.
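The signals listed above, combined with the model predictions, can be sketched as a small rule-based labeler. The text does not state how the signals are weighted or combined, so the counting rule (`min_signals`), the treatment of external senders as more suspicious, the `enron.com` domain check, and every field name below are assumptions made for illustration.

```python
from dataclasses import dataclass

LOW_COMM_THRESHOLD = 4  # from the text: fewer emails than this marks a low-comm sender
SUSPICIOUS_FOLDERS = {"deleted_items", "junk"}  # folders named in the text


@dataclass
class Email:
    mailbox_owner: str         # employee whose mailbox holds the email
    folder: str                # folder the email was found in
    sender: str                # sender address
    sender_email_count: int    # number of emails seen from this sender
    is_reply_or_forward: bool  # True if the email contains replies or forwards


def email_signals(email, persons_of_interest, internal_domain="enron.com"):
    """Evaluate each signal; True means the signal points toward fraud."""
    return {
        "person_of_interest": email.mailbox_owner in persons_of_interest,
        "suspicious_folder": email.folder.lower() in SUSPICIOUS_FOLDERS,
        # Assumption: external senders are treated as more suspicious.
        "external_sender": not email.sender.lower().endswith("@" + internal_domain),
        "low_communication": email.sender_email_count < LOW_COMM_THRESHOLD,
        "no_reply_or_forward": not email.is_reply_or_forward,
    }


def label_fraud(email, phishing_pred, social_eng_pred, persons_of_interest, min_signals=3):
    """Label as fraud only if a model flagged the email AND enough signals agree."""
    if not (phishing_pred or social_eng_pred):
        return False
    signals = email_signals(email, persons_of_interest)
    return sum(signals.values()) >= min_signals  # hypothetical combination rule


# Hypothetical mailboxes and emails.
poi = {"owner.a@enron.com"}
suspicious = Email("owner.b@enron.com", "junk", "promo@example.com", 1, False)
routine = Email("owner.a@enron.com", "inbox", "colleague@enron.com", 120, True)

print(label_fraud(suspicious, True, False, poi))
print(label_fraud(routine, True, False, poi))
```

Gating on the model predictions first mirrors the order given in the text: the metadata signals are only consulted for emails that at least one annotator model has already flagged.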
To ensure high-quality labels, the mismatch examples from ML Annotation have been manually inspected for Enron dataset relabeling.
Fraud | Non-Fraud |
---|---|
2327 | 445090 |
Enron Dataset Title: Enron Email Dataset URL: https://www.cs.cmu.edu/~enron/ Publisher: MIT, CMU Author: Leslie Kaelbling, William W. Cohen Year: 2015
Phishing Email Detection Dataset Title: Phishing Email Detection URL: https://www.kaggle.com/dsv/6090437 DOI: 10.34740/KAGGLE/DSV/6090437 Publisher: Kaggle Author: Subhadeep Chakraborty Year: 2023
CLAIR Fraud Email Collection Title: CLAIR collection of fraud email URL: http://aclweb.org/aclwiki Author: Radev, D. Year: 2008
List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending March 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB): https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB): https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB): https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx
Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB): https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional dat
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Medallion Drivers - Active’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/1c75e7ca-9626-4f3b-b18e-39c1efbc7f11 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
PLEASE NOTE: This dataset, which includes all TLC Licensed Drivers who are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv
This is a list of drivers with a current TLC Driver License, which authorizes drivers to operate NYC TLC licensed yellow and green taxicabs and for-hire vehicles (FHVs). This list is accurate as of the date and time shown in the Last Date Updated and Last Time Updated fields. Questions about the contents of this dataset can be sent by email to: licensinginquiries@tlc.nyc.gov.
--- Original source retains full ownership of the source dataset ---
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.
The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.
DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.
Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market leading cyber security, data cleaning and email validation applications supported by comprehensive and accurate data from Datazag. Data updates available on a daily, weekly and monthly basis. API data is updated on a daily basis.
This dataset contains the Project Site Inventories from the Hazard Mitigation Assistance (HMA) subapplications/subgrants from the FEMA Grants Outcomes (FEMA GO) system (FEMA’s new grants management system). FEMA GO started accepting Flood Mitigation Assistance (FMA) and Building Resilient Infrastructure and Communities (BRIC) subapplications in Fiscal Year 2020. FEMA GO is projected to support the Hazard Mitigation Grant Program (HMGP) in Calendar Year 2023. For details on HMA Project Site Inventories not captured in FEMA GO, visit https://www.fema.gov/openfema-data-page/hazard-mitigation-assistance-mitigated-properties-v3.
This dataset contains information on the Project Site Inventories identified in the HMA subapplications/subgrants that have been submitted to or awarded in FEMA GO, as well as amendments made to the awarded subgrants. The Project Site Inventory contains information regarding the Building, Infrastructure/Utility/other, and/or Vacant Land proposed to be mitigated by the subapplication/subgrant. Sensitive information, such as Personally Identifiable Information (PII), has been removed to protect privacy. The information in this dataset has been deemed appropriate for publication to empower public knowledge of mitigation activities and the nature of HMA grant programs. For more information on the HMA grant programs, visit: https://www.fema.gov/grants/mitigation. For more information on FEMA GO, visit: https://www.fema.gov/grants/guidance-tools/fema-go.
This dataset comes from the source system mentioned above and is subject to a small percentage of human error. In some cases, data was not provided by the subapplicant, applicant, and/or entered into FEMA GO. Due to the voluntary nature of the Hazard Mitigation Assistance Programs, not all Project Site Inventory in this dataset will be mitigated. As FEMA GO continues development, additional fields may be added to this dataset to indicate the final status of individual inventory.
This dataset is not intended to be used for any official federal financial reporting.
FEMA's terms and conditions and citation requirements for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page: https://www.fema.gov/about/openfema/terms-conditions.
For answers to Frequently Asked Questions (FAQs) about the OpenFEMA program, API, and publicly available datasets, please visit: https://www.fema.gov/about/openfema/faq.
If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.
The Reminder extension for CKAN enhances data management by providing automated email notifications based on dataset expiry dates and update subscriptions. Designed to work with CKAN versions 2.2 and up, but tested on 2.5.2, this extension offers a straightforward mechanism for keeping users informed about dataset updates and expirations, promoting better data governance and engagement. The extension leverages a daily cron job to check expiry dates and trigger emails.
Key Features:
* Data Expiry Notifications: Sends email notifications when datasets reach their specified expiry date. A daily cronjob process determines when to send these emails. Note that failure of the cronjob will prevent email delivery for that day.
* Dataset Update Subscriptions: Allows users to subscribe to specific datasets to receive notifications upon updates via a subscription form snippet that can be included in dataset templates.
* Unsubscribe Functionality: Includes an unsubscribe link in each notification email, enabling users to easily manage their subscriptions.
* Configuration Settings: Supports at least one recipient for reminder emails via configuration settings in the CKAN config file.
* Bootstrap Styling: Intended for use with Bootstrap 3+ for styling, but may still work with Bootstrap 2 with potential style inconsistencies.
Technical Integration: The Reminder extension integrates into CKAN via plugins, necessitating the addition of reminder to the ckan.plugins setting in the CKAN configuration file. The extension requires database initialization using paster commands to support the subscription functionality. Setting up a daily cronjob is necessary for the automated sending of reminder and notification emails.
Benefits & Impact: By implementing the Reminder extension, CKAN installations can improve data management and user engagement. Automated notifications ensure that stakeholders are aware of dataset expirations and updates, leading to better data governance, and more active user involvement in data ecosystems. This extension provides an easy-to-implement solution for managing data lifecycles and keeping users informed.
Can I change my flight with Delta Airlines without penalty? Each year, more than 60% of Delta flyers need to modify their travel plans. Good news—you can change flights. Call ☎️+1 (877) 443-8285 now for support. Delta eliminated most change fees on Main Cabin and higher fares. Basic Economy, however, usually isn't eligible for free changes—confirm details at ☎️+1 (877) 443-8285 before making any moves.
Changes must be made before departure, preferably at least 24 hours in advance. You can alter dates, times, or even destinations, depending on availability. If there’s a fare difference, you may need to pay the excess. But no change penalty applies to most routes. Call ☎️+1 (877) 443-8285 to see if your booking qualifies.
Delta also allows same-day flight changes for a reduced fee, or sometimes free for Medallion members. Using the Fly Delta app can help, but nothing beats speaking directly with an agent at ☎️+1 (877) 443-8285 for personalized help. They’ll inform you about seat availability and travel class adjustments. Always act quickly to secure the best rebooking options. Don’t guess—contact ☎️+1 (877) 443-8285 to make flight changes smoothly and with confidence. How long does it take to get a confirmation email after booking a flight? Over 80% of travelers receive their Delta confirmation email within minutes of booking. If delayed, contact ☎️+1 (877) 443-8285 immediately for assistance. Normally, emails are sent instantly, but technical issues, spam filters, or incorrect addresses may cause a delay. If your inbox is empty, check your junk or promotions folder, then call ☎️+1 (877) 443-8285 to confirm booking status.
Most confirmation emails contain your e-ticket number, itinerary, and important travel instructions. If booked through a third-party site, delays can extend up to 2 hours. For peace of mind, always verify with Delta directly at ☎️+1 (877) 443-8285. If no email appears after that time, your booking may not have processed correctly.
It’s helpful to have your payment method, name, and travel dates ready when calling. If the flight was booked close to departure time, the confirmation might take longer due to system syncing. Rechecking using your SkyMiles login is another method, but for guaranteed status, ☎️+1 (877) 443-8285 offers the most accurate and updated information. Do not wait until the travel date—confirm everything early. When in doubt, rely on ☎️+1 (877) 443-8285 for all booking concerns and confirmation support.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the author gender dataset (as a comma-delimited .csv file) originally created in association with the paper entitled 'The Impact of Gender on Conference Authorship in Audio Engineering: Analysis Using a New Data Collection Method', but since extended to include conferences up to the end of 2019. The original dataset is available at: https://doi.org/10.5281/zenodo.1249693. Please cite both the paper and the relevant dataset if used. Visualisation is available at: http://tibbakoi.github.io/aesgender.
The dataset was produced using a novel method which used self-identified pronouns, therefore allowing for as many groups as necessary to describe the population.
A list of authors was generated from conference proceedings.
An email was sent to each author to acquire their pronoun.
If no email was available/no response was received, a pronoun was acquired from a biography.
If no biography was available, a pronoun was inferred from traditional gender markers and gender presentation.
If no gender marker/photograph was available, the entry was labelled as 'Information Unavailable'. For brevity, the label 'Unknown' is used in the paper.
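The fallback order described in the steps above can be written as a short priority chain. The original process was carried out manually; the function name `resolve_pronoun`, its arguments, and the source labels it returns are hypothetical and serve only to make the ordering explicit.

```python
from typing import Optional, Tuple

UNAVAILABLE = "Information Unavailable"  # label used in the dataset


def resolve_pronoun(
    email_response: Optional[str],
    biography_pronoun: Optional[str],
    inferred_pronoun: Optional[str],
) -> Tuple[str, str]:
    """Return (pronoun, source), preferring self-identified information."""
    candidates = (
        ("email", email_response),        # self-identified, highest priority
        ("biography", biography_pronoun),  # taken from a published biography
        ("inferred", inferred_pronoun),    # gender markers and presentation
    )
    for source, value in candidates:
        if value:
            return value, source
    return UNAVAILABLE, "none"


print(resolve_pronoun(None, "she", "he"))
print(resolve_pronoun(None, None, None))
```

Returning the source alongside the pronoun keeps track of how much of the dataset rests on self-identification rather than inference.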
The columns in the dataset are as follows:
ID: unique identifier of entry
Pronoun: pronoun of entry
Position (abs): numerical absolute position within author list for entry
Position (relative): relative position within author list for entry (either First, Last, or Middle)
Single/multi-author: whether the publication for that entry has a single author or has multiple authors (single author publications are excluded from author position analysis)
Conference: Full conference name of entry
Topic: Topic of conference of entry, taken from conference name
Year: Year of conference of entry
Type: Type of publication for that entry as listed on the online conference proceedings
Grouped Type: Grouping of publication types for that entry for easier analysis due to inconsistencies in online conference proceedings (groups are: workshop, poster, paper, panel, keynote, invited speaker, invited paper, demo)
Inc. for author pos?: True/False as to whether to include the entry for analysis over author position (included types are: paper, invited paper, poster (all with multiple authors) as these have meaningful author orders)
Inc. for single/multi-author?: True/False as to whether to include the entry for analysis over single/multi author (included types are: paper, invited paper, poster as these have meaningful author orders)
Invited paper status: Grouping of the types to allow statistical analysis over invited vs non-invited types (invited types are: invited speaker, invited paper, keynote, panel. Non-invited types are: poster, paper, demo, workshop)
NB: Some grouping of the data is required as online conference proceedings are not always consistent (Column 10). Some labelling of the data is required to determine which entries to include in certain types of analysis (Columns 11-13).
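Several of the columns above are derived from others, and a minimal sketch of those derivations may help when checking or extending the file. The type groupings are taken from the column descriptions; the function names, the returned strings for invited status, and the handling of single-author entries (reported as "First" here) are assumptions, since the actual cell values in the CSV may be encoded differently.

```python
AUTHOR_ORDER_TYPES = {"paper", "invited paper", "poster"}
INVITED_TYPES = {"invited speaker", "invited paper", "keynote", "panel"}


def relative_position(abs_position, n_authors):
    """Map an absolute author position to First, Last, or Middle."""
    if abs_position == 1:
        return "First"  # assumption: a single author is also reported as First
    if abs_position == n_authors:
        return "Last"
    return "Middle"


def include_for_author_position(grouped_type, n_authors):
    """Author order is only meaningful for multi-author papers and posters."""
    return grouped_type in AUTHOR_ORDER_TYPES and n_authors > 1


def invited_status(grouped_type):
    """Group publication types into invited vs. non-invited."""
    return "invited" if grouped_type in INVITED_TYPES else "non-invited"


print(relative_position(2, 4), include_for_author_position("poster", 4), invited_status("keynote"))
```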
This dataset is distributed under the Creative Commons Attribution 4.0 license in the hope that it will prove useful, but with no warranty, including no implied warranty of merchantability or fitness for a particular purpose.
Dataset curated by: Kat Young and Michael Lovedee-Turner, formerly at the AudioLab, Dept. of Electronic Engineering, University of York. Contact: kathryn.ae.young@gmail.com
To communicate with Expedia, start by dialing ☎️+1 (888) 714-9824, their main customer support number, available for booking help, cancellations, or general inquiries. ☎️+1 (888) 714-9824 is the quickest way to speak directly with a live agent who can provide personalized assistance. You can also reach Expedia through their website’s chat feature, but if you want faster, direct communication, calling ☎️+1 (888) 714-9824 is recommended.
Expedia offers multiple channels for communication. ☎️+1 (888) 714-9824 is their dedicated phone number for reservations, cancellations, or issues with your travel bookings. You can also send messages through your Expedia account’s “Manage Trips” section, where direct messaging with support is sometimes available. For urgent concerns, though, calling ☎️+1 (888) 714-9824 ensures your request is prioritized.
Email communication is another way to reach Expedia, but response times vary. ☎️+1 (888) 714-9824 is the better option if you need immediate clarification or resolution. If you choose email or social media, make sure to include your booking reference and detailed information to speed up the process. For complex issues, calling ☎️+1 (888) 714-9824 directly usually leads to faster outcomes.
Expedia’s mobile app also offers communication options. ☎️+1 (888) 714-9824 can be accessed through the app’s help section, providing call-back services or live chat. The app’s interface lets you manage your bookings and connect with customer service without waiting on hold, but for complex requests, dialing ☎️+1 (888) 714-9824 remains the most effective.
In summary, the best way to communicate to Expedia is by calling ☎️+1 (888) 714-9824, especially for urgent or complicated issues. Use the website chat, email, or app features as secondary options, but always keep ☎️+1 (888) 714-9824 handy for the fastest service.
Is Expedia Customer Service 24 Hours? Yes, Expedia customer service operates 24 hours a day, and you can reach them anytime by calling ☎️+1 (888) 714-9824, their official support hotline. This round-the-clock availability ensures travelers around the world can get help no matter the time zone or urgency. Whether you’re booking a last-minute flight or need assistance during a late-night travel emergency, calling ☎️+1 (888) 714-9824 will connect you to a live representative.
Expedia understands that travel issues can arise at any hour, so ☎️+1 (888) 714-9824 is staffed 24/7 to address concerns such as cancellations, itinerary changes, or lost reservations. This continuous service means you don’t have to wait until business hours to resolve a problem. If you’re overseas, calling ☎️+1 (888) 714-9824 provides access to agents familiar with international travel needs.
While Expedia’s phone service is available 24/7, some online support options like chat or email responses may not be as immediate. ☎️+1 (888) 714-9824 ensures you always have access to a real person, which is crucial in time-sensitive situations. Some complex bookings may require agent intervention, and Expedia’s 24-hour phone support is designed for those moments.
Remember that while Expedia is available 24/7, peak call times can lead to longer wait times. ☎️+1 (888) 714-9824 is best used when you need direct communication, but you might experience brief holds during holidays or major travel seasons. Planning ahead and calling early can help reduce waiting time.
In conclusion, Expedia’s customer service is truly 24 hours a day, and ☎️+1 (888) 714-9824 is the number to call anytime you need help with your travel plans. Whether it’s a midnight emergency or a midday booking question, Expedia’s live support team is ready around the clock.
An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.
The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses, fill out a form for the course, or watch some videos. When these people fill out a form providing their email address or phone number, they are classified as a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X Education is around 30%.
Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if, say, they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most potential leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.
There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc. ) in order to get a higher lead conversion.
X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model wherein you need to assign a lead score to each of the leads such that the customers with a higher lead score have a higher conversion chance and the customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark of the target lead conversion rate to be around 80%.
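The reasoning above, that contacting only high-scoring leads should raise the conversion rate, can be made concrete with a small worked example. The description does not say which model produces the scores, so the probabilities below are made-up stand-ins for any classifier's output, and the 0-100 scaling, the cutoff of 80, and all function names are assumptions.

```python
def lead_scores(probabilities):
    """Scale predicted conversion probabilities to a 0-100 lead score."""
    return [round(100 * p) for p in probabilities]


def conversion_rate(converted_flags):
    """Fraction of leads that converted (0.0 for an empty list)."""
    if not converted_flags:
        return 0.0
    return sum(converted_flags) / len(converted_flags)


def hot_lead_conversion(scores, converted, cutoff):
    """Conversion rate and count among leads scoring at or above the cutoff."""
    hot = [c for s, c in zip(scores, converted) if s >= cutoff]
    return conversion_rate(hot), len(hot)


# Hypothetical predictions and outcomes for ten leads (1 = converted).
probabilities = [0.92, 0.85, 0.81, 0.40, 0.35, 0.30, 0.22, 0.15, 0.10, 0.05]
converted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

scores = lead_scores(probabilities)
overall = conversion_rate(converted)
hot_rate, n_hot = hot_lead_conversion(scores, converted, cutoff=80)
print(overall, hot_rate, n_hot)
```

In this toy data the overall rate matches the roughly 30% baseline, while the rate among leads above the cutoff is higher. Raising the cutoff trades the number of leads contacted against the conversion rate within that group, which is the trade-off behind the 80% target.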
Variables Description
* Prospect ID - A unique ID with which the customer is identified.
* Lead Number - A lead number assigned to each lead procured.
* Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
* Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
* Do Not Email - An indicator variable selected by the customer indicating whether or not they want to be emailed about the course.
* Do Not Call - An indicator variable selected by the customer indicating whether or not they want to be called about the course.
* Converted - The target variable. Indicates whether a lead has been successfully converted or not.
* TotalVisits - The total number of visits made by the customer on the website.
* Total Time Spent on Website - The total time spent by the customer on the website.
* Page Views Per Visit - Average number of pages on the website viewed during the visits.
* Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
* Country - The country of the customer.
* Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization' which means the customer had not selected this option while filling the form.
* How did you hear about X Education - The source from which the customer heard about X Education.
* What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
* What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking this course.
* Search - Indicating whether the customer had seen the ad in any of the listed items.
* Magazine
* Newspaper Article
* X Education Forums
* Newspaper
* Digital Advertisement
* Through Recommendations - Indicates whether the customer came in through recommendations.
* Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
* Tags - Tags assigned to customers indicating the current status of the lead.
* Lead Quality - Indicates the quality of the lead based on the data and the intuition of the employee who has been assigned to the lead.
* Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
* Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
* Lead Profile - A lead level assigned to each customer based on their profile.
* City - The city of the customer.
* Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile
* Asymmetric Profile Index
* Asymmetric Activity Score
* Asymmetric Profile Score
* I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
* a free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
* Last Notable Activity - The last notable activity performed by the student.
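As noted in the Specialization entry above, a level such as 'Select Specialization' means the customer left the dropdown untouched, so it carries no information. A minimal sketch of treating such placeholders as missing values follows. It assumes rows are plain dictionaries keyed by column name and that placeholder levels start with the word 'Select'. That prefix rule is an assumption, since the description confirms the placeholder only for Specialization. The helper name `clean_placeholders` and the sample record are hypothetical.

```python
def clean_placeholders(row: dict, prefix: str = "Select") -> dict:
    """Return a copy of row with placeholder levels replaced by None."""
    cleaned = {}
    for column, value in row.items():
        # Only string values can be dropdown placeholders; numbers pass through.
        is_placeholder = isinstance(value, str) and value.strip().startswith(prefix)
        cleaned[column] = None if is_placeholder else value
    return cleaned


# Made-up record for illustration only.
lead = {
    "Lead Source": "Google",
    "Specialization": "Select Specialization",
    "TotalVisits": 5,
}
cleaned_lead = clean_placeholders(lead)
```

The original record is left unchanged, so the raw and cleaned versions can be compared when deciding how to handle the missing values.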
UpGrad Case Study
Our Price Paid Data includes information on all property sales in England and Wales that are sold for value and are lodged with us for registration.
Get up to date with the permitted use of our Price Paid Data:
check what to consider when using or publishing our Price Paid Data
If you use or publish our Price Paid Data, you must add the following attribution statement:
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Price Paid Data is released under the Open Government Licence (OGL): http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. You need to make sure you understand the terms of the OGL before using the data.
Under the OGL, HM Land Registry permits you to use the Price Paid Data for commercial or non-commercial purposes. However, OGL does not cover the use of third party rights, which we are not authorised to license.
Price Paid Data contains address data processed against Ordnance Survey’s AddressBase Premium product, which incorporates Royal Mail’s PAF® database (Address Data). Royal Mail and Ordnance Survey permit your use of Address Data in the Price Paid Data:
If you want to use the Address Data in any other way, you must contact Royal Mail. Email address.management@royalmail.com.
The following fields comprise the address data included in Price Paid Data:
The May 2025 release includes:
As we will be adding to the April data in future releases, we would not recommend using it in isolation as an indication of market or HM Land Registry activity. When the full dataset is viewed alongside the data we’ve previously published, it adds to the overall picture of market activity.
Your use of Price Paid Data is governed by conditions and by downloading the data you are agreeing to those conditions.
Google Chrome (Chrome 88 onwards) is blocking downloads of our Price Paid Data. Please use another internet browser while we resolve this issue. We apologise for any inconvenience caused.
We update the data on the 20th working day of each month. You can download the:
These include standard and additional price paid data transactions received at HM Land Registry from 1 January 1995 to the most current monthly data.
The data is updated monthly and the average size of this file is 3.7 GB. You can download:
Policies ensuring that research data are available on public archives are increasingly being implemented at the government, funding agency, and journal level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term, and indeed many studies have found that authors are often unable or unwilling to share their data. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested datasets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a dataset being extant fell by 17% per year. In addition, the odds that we could find a working email address for the first, last or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, ...
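To make the 17% figure concrete: a constant yearly fall in odds compounds multiplicatively, so after t years the odds are the starting odds times 0.83^t. The sketch below works that out in Python. The starting odds are a made-up assumption (the abstract does not give a baseline), and the function names are hypothetical.

```python
def odds_after(initial_odds: float, yearly_decline: float, years: int) -> float:
    """Odds after `years`, when odds fall by a fixed fraction each year."""
    return initial_odds * (1.0 - yearly_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1.0 + odds)


# Assumed baseline: odds of 4 (an 80% chance the data are extant) at year 0.
baseline_odds = 4.0
odds_10y = odds_after(baseline_odds, 0.17, 10)
prob_10y = odds_to_probability(odds_10y)
```

Under that assumed baseline, ten years of a 17% yearly decline leaves odds of roughly 0.62, or about a 38% chance that the dataset is still extant. A different baseline shifts the numbers but not the compounding pattern.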
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Strategic Measure_Number and Percentage of instances where people access court services other than in person and outside normal business hours (e.g. phone, mobile application, online, expanded hours) – Municipal Court’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/6b92f18f-a66f-4334-be31-626816fff206 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
The dataset supports measure S.D.4.a of SD23.
The Austin Municipal Court offers services via in person, phone, mail, email, online, in the community, in multiple locations, and during non-traditional hours to make it easier and more convenient for individuals to handle court business. This measure tracks the percentage of customers that utilize court services outside of normal business hours, defined as 8am-5pm Monday-Friday, and how many payments were made by methods other than in person. This measure helps determine how Court services are being used and enables the Court to allocate its resources to best meet the needs of the public. Historically, almost 30% of the operational hours are outside of traditional hours and the average percentage of payments made by mail and online has been over 59%.
View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/c7z3-geii
Data source: electronic case management system and manual tracking of payments received via mail. Calculation: Business hours are manually calculated annually. A query is run from the court’s case management system to calculate how many monetary transactions were posted. S.D.4.a Numerator: Number of payments received by mail is entered manually by the Customer Service unit that processes all incoming mail. S.D.4.a Denominator: Total number of web payments is calculated using a query to calculate a total number of payments with a payment type ‘web’ in the case management system. Measure time period: Annual (Fiscal Year). Automated: No. Date of last description update: 4/10/2020.
--- Original source retains full ownership of the source dataset ---
Licence Ouverte / Open Licence 1.0: https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
License information was derived automatically
CNN/Daily Mail is a dataset for text summarization. Human-written abstractive summary bullets from news stories on the CNN and Daily Mail websites were turned into questions (with one of the entities hidden), with the stories serving as the corresponding passages from which the system is expected to answer the fill-in-the-blank question. The authors released the scripts that crawl, extract and generate pairs of passages and questions from these websites.
In all, the corpus has 286,817 training pairs, 13,368 validation pairs and 11,487 test pairs, as defined by their scripts. The source documents in the training set have 766 words spanning 29.74 sentences on average, while the summaries consist of 53 words and 3.72 sentences.