Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We use the Enron email dataset to build a network of email addresses. It contains 614,586 emails sent between 6 January 1998 and 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, a total of 1,448 days of email records. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6,600 nodes and 50,897 edges.
To build a graph G = (V, E), we use email addresses as the nodes V. Every node v_i has an attribute, a time-varying signal that corresponds to the number of emails sent from that address each day. We draw an edge e_ij between two nodes i and j if there is at least one email exchange between the corresponding addresses.
The 'Count' column in the 'edges.csv' file gives the number of 'From'->'To' email exchanges between the two addresses and can be used as an edge weight.
The file 'nodes.csv' contains a dictionary that is a compressed representation of each node's time series, mapping Day -> Number of emails sent by the address during that day. The total number of days is 1448.
'id-email.csv' is a file containing the actual email addresses.
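As a minimal sketch of the graph construction described above (the two data rows are invented placeholders standing in for the real 'edges.csv', which has 'From', 'To' and 'Count' columns):

```python
# Build a weighted, undirected adjacency map from edges.csv-style data.
import csv
import io

edges_csv = (
    "From,To,Count\n"
    "alice@enron.com,bob@enron.com,5\n"
    "bob@enron.com,carol@enron.com,2\n"
)  # stand-in for open("edges.csv")

graph = {}  # node -> {neighbour: weight}
for row in csv.DictReader(io.StringIO(edges_csv)):
    weight = int(row["Count"])  # 'Count' serves as the edge weight
    graph.setdefault(row["From"], {})[row["To"]] = weight
    graph.setdefault(row["To"], {})[row["From"]] = weight  # undirected edges

# nodes, edges (each undirected edge stored twice, hence the division)
print(len(graph), sum(len(nbrs) for nbrs in graph.values()) // 2)
```

With the full file this reproduces the node and edge counts quoted above; the same structure can be handed to a graph library such as networkx if needed.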
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The total number of user mailboxes in Umeå kommun and how many are active each day of the reporting period. A mailbox is considered active if the user sent or read any email.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for CNN Dailymail Dataset
Dataset Summary
The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.
Supported Tasks and Leaderboards
'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There are lots of really cool datasets getting added to Kaggle every day, and as part of my job I want to help people find them. I’ve been tweeting about datasets on my personal Twitter account, @rctatman, and also releasing a weekly newsletter of interesting datasets.
I wanted to know which method was more effective at getting the word out about new datasets: Twitter or the newsletter?
This dataset contains two .csv files. One has information on the impact of tweets with links to datasets, while the other has information on the impact of the newsletter.
Twitter:
The Twitter .csv has the following information:
Fridata Newsletter:
The Fridata .csv has the following information:
This dataset was collected by the uploader, Rachael Tatman. It is released here under a CC-BY-SA license.
Most organizations today rely on email campaigns for effective communication with users. Email is one of the popular ways to pitch products to users and build trustworthy relationships with them. Email campaigns contain different types of CTA (Call To Action), and their ultimate goal is to maximize the Click Through Rate (CTR): CTR = No. of users who clicked on at least one of the CTAs / No. of emails delivered. This dataset contains details such as body length, subject length, mean paragraph length, day of week, whether the email was sent on a weekend, etc.
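The CTR definition above can be written as a small helper (illustrative only; the zero-delivery guard is an added assumption):

```python
# CTR = users who clicked at least one CTA / emails delivered.
def click_through_rate(users_clicked: int, emails_delivered: int) -> float:
    """Fraction of delivered emails whose recipient clicked at least one CTA."""
    if emails_delivered == 0:  # guard against an empty campaign
        return 0.0
    return users_clicked / emails_delivered

print(click_through_rate(120, 4000))  # → 0.03
```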
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We have developed an application and solution approach (using this dataset) for automatically generating and suggesting short email responses to support queries in a university environment. The proposed solution can be used as a one-tap or one-click way of responding to the various types of queries raised by faculty members and students. The Office of Academic Affairs (OAA), the Office of Student Life (OSL) and the Information Technology Helpdesk (ITD) are support functions within a university that receive hundreds of email messages on a daily basis. Email is still the most frequently used mode of communication with these departments. A large percentage of the emails they receive are frequent, commonly repeated queries or requests for information. Responding to every query by manually typing is a tedious and time-consuming task, and a large percentage of the emails and their responses consist of short messages. For example, the IT support department in our university receives several emails about Wi-Fi not working, or from someone needing help with a projector, or requiring an HDMI cable or remote slide changer. Another example is emails from students requesting that the office of academic affairs add or drop courses, which they cannot do directly. The dataset consists of email messages of the kind generally received by ITD, OAA and OSL at Ashoka University. It also contains intermediate results from the machine learning experiments we conducted.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
CNN/DailyMail non-anonymized summarization dataset.
There are two features:
- article: text of the news article, used as the document to be summarized
- highlights: joined text of highlights with <s> and </s> around each highlight, which is the target summary
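As an illustration of the extractive use case mentioned above, a lead-3 baseline (taking the first three sentences of the article as the summary, a common reference point for this dataset) can be sketched as follows; the example record is invented:

```python
# Lead-3 extractive baseline: first three sentences of the article.
import re

def lead_3(article: str) -> str:
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:3])

record = {  # invented stand-in for one article/highlights pair
    "article": "A storm hit the coast on Monday. Thousands lost power. "
               "Crews worked overnight. Schools reopened on Wednesday.",
    "highlights": "Storm knocks out power for thousands.",
}
print(lead_3(record["article"]))
```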
https://creativecommons.org/publicdomain/zero/1.0/
The dataset is about the telecom industry and records which customers churned the service. It consists of 3,333 observations with 21 variables. The task is to predict which customers are going to churn.
Account.Length: How long the account has been active.
VMail.Message: Number of voice mail messages sent by the customer.
Day.Mins: Time spent on day calls.
Eve.Mins: Time spent on evening calls.
Night.Mins: Time spent on night calls.
Intl.Mins: Time spent on international calls.
Day.Calls: Number of day calls by the customer.
Eve.Calls: Number of evening calls by the customer.
Intl.Calls: Number of international calls.
Night.Calls: Number of night calls by the customer.
Day.Charge: Charges for day calls.
Night.Charge: Charges for night calls.
Eve.Charge: Charges for evening calls.
Intl.Charge: Charges for international calls.
VMail.Plan: Whether the customer has a voice mail plan.
State: State in the area of study.
Phone: Phone number of the customer.
Area.Code: Area code of the customer.
Int.l.Plan: Whether the customer has an international plan.
CustServ.Calls: Number of customer service calls made by the customer.
Churn: Whether the customer churned the telecom service (0 = “Churner”, 1 = “Non-Churner”).
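A minimal churn-classification sketch in the spirit of the prediction task above; the rows are invented toy data (with the real file you would read the 21 columns from CSV and encode categorical fields such as State and the plan flags first):

```python
from sklearn.linear_model import LogisticRegression

# Toy feature columns: Day.Mins, CustServ.Calls, Int.l.Plan (0/1).
X = [
    [180.0, 1, 0],
    [220.5, 4, 1],
    [110.2, 0, 0],
    [265.0, 5, 1],
    [150.7, 1, 0],
    [240.3, 3, 1],
]
# Target uses the card's coding: 0 = "Churner", 1 = "Non-Churner".
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[200.0, 2, 0]]))  # predicted class for a new customer
```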
DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.
The data is enriched with over thirty unique data points, including the mailbox provider for each domain, and uses AI-based predictive analytics to identify elevated-risk domains from both a cyber security and an email-sending reputation perspective.
DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.
Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market-leading cyber security, data cleaning and email validation applications supported by comprehensive and accurate data. Data updates are available on a daily, weekly and monthly basis; API data is updated daily.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains images of various plastic objects commonly found in everyday life. Each image is annotated with bounding boxes around the plastic items, allowing for object detection tasks in computer vision applications. With a diverse range of items such as milk packets, ketchup pouches, pens, plastic bottles, polythene bags, shampoo bottles and pouches, chips packets, cleaning spray bottles, handwash bottles, and more, this dataset offers rich training material for developing object detection models.
The dataset is an extremely challenging set of over 4,000 original plastic-object images captured and crowdsourced from over 1,000 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.
Annotation formats: COCO, YOLO, PASCAL-VOC, TFRecord.
The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai or visit www.datacluster.ai to learn more.
PLEASE NOTE: This dataset, which includes all TLC Licensed Drivers who are in good standing and able to drive, is updated every day in the evening between 4 and 7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully; it should show either today's or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv This is a list of drivers with a current TLC Driver License, which authorizes them to operate NYC TLC licensed yellow and green taxicabs and for-hire vehicles (FHVs). The list is accurate as of the date and time shown in the Last Date Updated and Last Time Updated fields. Questions about the contents of this dataset can be sent by email to: licensinginquiries@tlc.nyc.gov.
https://www.usa.gov/government-works
This dataset describes the current state of mail ballot requests for the 2025 Municipal Primary Election. It’s a snapshot in time of the current volume of ballot requests across the Commonwealth. The file contains all mail ballot requests except ballot applications that are declined as duplicate.
This point-in-time transactional data is being published for informational purposes to provide detailed data pertaining to the processing of absentee and mail-in ballots by county election offices. This data is extracted once per day from the Statewide Uniform Registry of Electors (SURE system), and it reflects activity recorded by the counties in the SURE system at the time of the data extraction.
Please note that county election offices will continue to process ballot applications (as applicable), record ballots, reconcile ballot data, and make corrections when necessary, and this will continue through, and even after, Election Day. Administrative practices for recording transactions in the system will vary by county. For example, some counties record individual transactions as they occur, while others record transactions in batches at specific intervals. These activities may result in substantial changes to a county's reported data from one day to the next. County practices also differ on when cancelled ballot data is entered into the database (i.e., before or after the election). Some counties do not enter cancelled ballot data entirely.
Additional notes specific to this dataset:
• Counties can enter cancellation codes without entering a ballot returned date.
• Some cancellation codes are a result of administrative processes, meaning the ballot was never mailed to the voter before it was cancelled (e.g., there was an error when the label was printed).
• Confidential and protected voters are not included in this file.
• Counties can only enter one cancel code per ballot, even if there are multiple errors. Different counties may vary in which code they choose to use when this arises, or they may use the catch-all category of 'CANC - OTHER'.
Type of data included in this file: This data includes all mail ballot applications processed by counties, including voters on the permanent mail-in and absentee ballot lists. Multiple rows in this data may correspond to the same voter if they submitted more than one application or had one or more cancelled ballots. A deidentified voter ID has been provided to allow data users to identify when rows correspond to the same voter. This ID is randomized and cannot be used to match to SURE, the Full Voter Export, or previous iterations of the Statewide Mail Ballot File. All application types in this file are considered a type of mail ballot. Some of the applications are considered UOCAVA (Uniformed and Overseas Citizens Absentee Voting Act) or UMOVA (Uniform Military and Overseas Voters Act) ballots. These are listed below:
• CRI - Civilian - Remote/Isolated
• CVO - Civilian Overseas
• F - Federal (Unregistered)
• M - Military
• MRI - Military - Remote/Isolated
• V - Veteran
• BV - Bedridden Veteran
• BVRI - Bedridden Veteran - Remote/Isolated
*We may not have all application types in the file for every election.
M2SDNXSLV (or statD_2d_slv_Nx) is a 2-dimensional daily data collection in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of daily statistics, such as the daily mean (or daily minimum and maximum) air temperature at 2 meters, and the maximum precipitation rate during the period. MERRA-2 is the latest version of the global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period from 1980 to present with a latency of ~3 weeks after the end of a month.
Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.
MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changes to tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.
Questions: If you have a question, please read the “MERRA-2 File Specification Document”, “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the “Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods
Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional data relating to in-country and overseas visa applications can be fo
The sonic data within the building array comprise 26 days of 30-minute averaged data from 30 sonic anemometers. The unobstructed tower sonic data cover the same period, but for the 5 heights of the tower. The data files have 48 columns with date and time identifiers as well as meteorological turbulence measurements. This dataset is not publicly accessible because the data were not collected by EPA and are hosted external to the agency. It can be accessed through the following means: the detailed sonic dataset is freely available to others wishing to perform additional analysis; however, it is large and not readily posted. The complete dataset is included in the comprehensive JR II data archive set up by the DHS Science and Technology (S&T) Directorate, Chemical Security Analysis Center (CSAC). To obtain the data, an email request can be sent to JackRabbit@st.dhs.gov. The user can then access the archive on the Homeland Security Information Network (HSIN). Format: the sonic data within the Jack Rabbit II (JRII) mock-urban building array are in 30-minute averaged daily Excel files, separated by sonic anemometer, with numerous variables. The unobstructed, raw 10 Hz tower data are in .dat files and are processed into 30-minute averaged daily csv files by sonic height.
This dataset is associated with the following publication: Pirhalla, M., D. Heist, S. Perry, S. Hanna, T. Mazzola, S.P. Arya, and V. Aneja. Urban Wind Field Analysis from the Jack Rabbit II Special Sonic Anemometer Study. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 243: 14, (2020).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General description
SAPFLUXNET contains a global database of sap flow and environmental data, together with metadata at different levels.
SAPFLUXNET is a harmonised database, compiled from contributions by researchers worldwide. This version (0.1.4) contains more than 200 datasets from all over the world, covering a broad range of bioclimatic conditions.
More information on the coverage can be found here: http://sapfluxnet.creaf.cat/shiny/sfn_progress_dashboard/.
The SAPFLUXNET project has been developed by researchers at CREAF and other institutions (http://sapfluxnet.creaf.cat/#team), coordinated by Rafael Poyatos (CREAF, http://www.creaf.cat/staff/rafael-poyatos-lopez), and funded by two Spanish Young Researchers' Grants (SAPFLUXNET, CGL2014-55883-JIN; DATAFORUSE, RTI2018-095297-J-I00) and an Alexander von Humboldt Research Fellowship for Experienced Researchers.
Variables and units
SAPFLUXNET contains whole-plant sap flow and environmental variables at sub-daily temporal resolution. Both sap flow and environmental time series have accompanying flags in a data frame, one for sap flow and another for environmental variables. These flags store quality issues detected during the quality control process and can be used to add further quality flags.
Metadata contain relevant variables informing about site conditions, stand characteristics, tree and species attributes, sap flow methodology and details on environmental measurements. To learn more about variables, units and data flags please use the functionalities implemented in the sapfluxnetr package (https://github.com/sapfluxnet/sapfluxnetr). In particular, have a look at the package vignettes using R:
# remotes::install_github(
# 'sapfluxnet/sapfluxnetr',
# build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
# )
library(sapfluxnetr)
# to list all vignettes
vignette(package='sapfluxnetr')
# variables and units
vignette('metadata-and-data-units', package='sapfluxnetr')
# data flags
vignette('data-flags', package='sapfluxnetr')
Data formats
SAPFLUXNET data can be found in two formats: 1) RData files belonging to the custom-built 'sfn_data' class and 2) Text files in .csv format. We recommend using the sfn_data objects together with the sapfluxnetr package, although we also provide the text files for convenience. For each dataset, text files are structured in the same way as the slots of sfn_data objects; if working with text files, we recommend that you check the data structure of 'sfn_data' objects in the corresponding vignette.
Working with sfn_data files
To work with SAPFLUXNET data, first they have to be downloaded from Zenodo, maintaining the folder structure. A first level in the folder hierarchy corresponds to file format, either RData files or csv's. A second level corresponds to how sap flow is expressed: per plant, per sapwood area or per leaf area. Please note that interconversions among the magnitudes have been performed whenever possible. Below this level, data have been organised per dataset. In the case of RData files, each dataset is contained in a sfn_data object, which stores all data and metadata in different slots (see the vignette 'sfn-data-classes'). In the case of csv files, each dataset has 9 individual files, corresponding to metadata (5), sap flow and environmental data (2) and their corresponding data flags (2).
After downloading the entire database, the sapfluxnetr package can be used to:
- Work with data from a single site: data access, plotting and time aggregation.
- Select the subset datasets to work with.
- Work with data from multiple sites: data access, plotting and time aggregation.
Please check the package vignettes to learn more about how to work with sfn_data files.
Working with text files
We recommend working with sfn_data objects using R and the sapfluxnetr package; we do not currently provide code to work with text files.
Data issues and reporting
Please report any issue you may find in the database by sending us an email: sapfluxnet@creaf.uab.cat.
Temporary data fixes that have been detected but not yet included in released versions will be published on the SAPFLUXNET main web page ('Known data errors').
Data access, use and citation
This version of the SAPFLUXNET database is open access. We are working on a data paper describing the database, but, before its publication, please cite this Zenodo entry if SAPFLUXNET is used in any publication.
The Reminder extension for CKAN enhances data management by providing automated email notifications based on dataset expiry dates and update subscriptions. Designed to work with CKAN versions 2.2 and up, but tested on 2.5.2, this extension offers a straightforward mechanism for keeping users informed about dataset updates and expirations, promoting better data governance and engagement. The extension leverages a daily cron job to check expiry dates and trigger emails.
Key Features:
- Data Expiry Notifications: Sends email notifications when datasets reach their specified expiry date. A daily cronjob determines when to send these emails; note that failure of the cronjob will prevent email delivery for that day.
- Dataset Update Subscriptions: Allows users to subscribe to specific datasets and receive notifications upon updates, via a subscription form snippet that can be included in dataset templates.
- Unsubscribe Functionality: Includes an unsubscribe link in each notification email, enabling users to easily manage their subscriptions.
- Configuration Settings: Supports at least one recipient for reminder emails via settings in the CKAN config file.
- Bootstrap Styling: Intended for use with Bootstrap 3+, but may still work with Bootstrap 2 with potential style inconsistencies.
Technical Integration: The Reminder extension integrates into CKAN via plugins, necessitating the addition of reminder to the ckan.plugins setting in the CKAN configuration file. The extension requires database initialization using paster commands to support the subscription functionality, and setting up a daily cronjob is necessary for the automated sending of reminder and notification emails.
Benefits & Impact: By implementing the Reminder extension, CKAN installations can improve data management and user engagement.
Automated notifications ensure that stakeholders are aware of dataset expirations and updates, leading to better data governance, and more active user involvement in data ecosystems. This extension provides an easy-to-implement solution for managing data lifecycles and keeping users informed.
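Per the integration notes above, enabling the plugin is a one-line change to the CKAN configuration file; a sketch (the other plugin names shown are illustrative, keep whatever your instance already uses):

```ini
; ckan.ini (sketch): add 'reminder' to the enabled plugins, as described above.
ckan.plugins = stats text_view reminder
```

A daily cron job must also invoke the extension's paster command so the reminder and notification emails actually go out; see the extension's documentation for the exact command.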
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is a collection of aggregated clinical parameters for the participants (such as clinical scores), parameters extracted from the utilized devices (such as average heart rate per day, average gait speed, etc.), and coupled events about them (such as falls, loss of orientation, etc.). It contains information collected during the clinical evaluation of the older people by medical experts. This information represents the clinical status of the older person across different domains, e.g. physical, psychological, cognitive, etc.
The dataset contains several medical features which are used by clinicians to assess the overall state of the older people.
The purpose of the Virtual Patient Model is to assess the overall state of the older people based on their medical parameters, and to find associations between these parameters and frailty status.
A list of the recorded clinical parameters and their description is shown below:
- part_id: The user ID, which should be a 4-digit number
- q_date: The recording timestamp, which follows the “YYYY-MM-DDTHH:mm:ss.fffZ” format (e.g. 14 September 2019 12:23:34.567 is formatted as 2019-09-14T12:23:34.567Z)
- clinical_visit: As several clinical evaluations were performed for each older adult, this number indicates which clinical evaluation these measurements refer to
- fried: Ordinal categorization of frailty level according to Fried operational definition of frailty
- hospitalization_one_year: Number of nonscheduled hospitalizations in the last year
- hospitalization_three_years: Number of nonscheduled hospitalizations in the last three years
- ortho_hypotension: Presence of orthostatic hypotension
- vision: Visual difficulty (qualitative ordinal evaluation)
- audition: Hearing difficulty (qualitative ordinal evaluation)
- weight_loss: Unintentional weight loss >4.5 kg in the past year (categorical answer)
- exhaustion_score: Self-reported exhaustion (categorical answer)
- raise_chair_time: Time in seconds to perform a lower limb strength clinical test
- balance_single: Single foot station (Balance) (categorical answer)
- gait_get_up: Time in seconds to perform the 3-meter Timed Up and Go test
- gait_speed_4m: Speed for a 4-meter straight walk
- gait_optional_binary: Gait optional evaluation (qualitative evaluation by the investigator)
- gait_speed_slower: Slowed walking speed (categorical answer)
- grip_strength_abnormal: Grip strength outside the norms (categorical answer)
- low_physical_activity: Low physical activity (categorical answer)
- falls_one_year: Number of falls in the last year
- fractures_three_years: Number of fractures during the last 3 years
- fried_clinician: Fried’s categorization according to the clinician’s estimation (used when data for answering Fried’s operational frailty definition questionnaire are missing)
- bmi_score: Body Mass Index (in Kg/m²)
- bmi_body_fat: Body Fat (%)
- waist: Waist circumference (in cm)
- lean_body_mass: Lean Body Mass (%)
- screening_score: Mini Nutritional Assessment (MNA) screening score
- cognitive_total_score: Montreal Cognitive Assessment (MoCA) test score
- memory_complain: Memory complaint (categorical answer)
- mmse_total_score: Folstein Mini-Mental State Exam score
- sleep: Reported sleeping problems (qualitative ordinal evaluation)
- depression_total_score: 15-item Geriatric Depression Scale (GDS-15)
- anxiety_perception: Anxiety auto-evaluation (visual analogue scale 0-10)
- living_alone: Living Conditions (categorical answer)
- leisure_out: Leisure activities (number of leisure activities per week)
- leisure_club: Membership of a club (categorical answer)
- social_visits: Number of visits and social interactions per week
- social_calls: Number of telephone calls exchanged per week
- social_phone: Approximate time spent on phone per week
- social_skype: Approximate time spent on videoconference per week
- social_text: Number of written messages (SMS and emails) sent by the participant per week
- house_suitable_participant: Subjective suitability of the housing environment according to participant’s evaluation (categorical answer)
- house_suitable_professional: Subjective suitability of the housing environment according to investigator’s evaluation (categorical answer)
- stairs_number: Number of steps to access house (without possibility to use elevator)
- life_quality: Quality of life self-rating (visual analogue scale 0-10)
- health_rate: Self-rated health status (qualitative ordinal evaluation)
- health_rate_comparison: Self-assessed change since last year (qualitative ordinal evaluation)
- pain_perception: Self-rated pain (visual analogue scale 0-10)
- activity_regular: Regular physical activity (ordinal answer)
- smoking: Smoking (categorical answer)
- alcohol_units: Alcohol Use (average alcohol units consumption per week)
- katz_index: Katz Index of ADL score
- iadl_grade: Instrumental Activities of Daily Living score
- comorbidities_count: Number of comorbidities
- comorbidities_significant_count: Number of comorbidities which significantly affect the person’s functional status
- medication_count: Number of active substances taken on a regular basis
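The q_date format described above is ISO 8601 and can be parsed with the standard library; the sample timestamp is taken from the field description (note that strptime's %z accepts the trailing 'Z' on Python 3.7+):

```python
# Parse the q_date timestamp format "YYYY-MM-DDTHH:mm:ss.fffZ".
from datetime import datetime, timezone

raw = "2019-09-14T12:23:34.567Z"
ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")  # 'Z' -> UTC

print(ts.isoformat(), ts.tzinfo == timezone.utc)
```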
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CEAS-08 Email Phishing Detection Instruction Dataset
This dataset contains instruction-following conversations for email phishing detection, generated from the CEAS-08 email dataset using multiple large language models. It's designed for fine-tuning conversational AI models on cybersecurity tasks.
Dataset Details
Dataset Description
This dataset transforms raw email data into structured instruction-following conversations where an AI security analyst analyzes… See the full description on the dataset page: https://huggingface.co/datasets/luongnv89/phishing-email.
https://www.icpsr.umich.edu/web/ICPSR/studies/8412/terms
The primary purpose of this survey was to develop a description of the United States household mailstream for the United States Postal Service (USPS) and to provide annualized, nationwide estimates of the volume of mail received and sent by households in the United States. To this end, the survey gathered information on the characteristics of every USPS letter and package that was sent or received by each sampled household on every day of a preassigned week in the survey period. Daily accounts of items not handled by the USPS were also gathered, e.g., United Parcel Service, telegrams, long-distance telephone calls, newspapers, magazines, advertisements, free samples, campaign literature, and utility bills. In addition to providing mailstream information, respondents answered questions pertaining to their mail delivery and mailing practices, their knowledge of mail and other means of communications, and their opinions on both the performance of the USPS and on proposed changes in mail service and rates. They also supplied information on any stamp collectors living in their household, the age and sex of the collectors, the kinds of stamps they collected, and their expenditures on United States commemorative stamps and corner stamps from sheets of new USPS issues. The dataset includes data on the location of the household, length of residence in the current dwelling unit, family income, the age of each household member, and the age, sex, race, education, occupation, and employment status of the respondent and the head of household.