47 datasets found
  1. Enron Email Time-Series Network

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst (2020). Enron Email Time-Series Network [Dataset]. http://doi.org/10.5281/zenodo.1342353
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We use the Enron email dataset to build a network of email addresses. It contains 614,586 emails sent over the period from 6 January 1998 until 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which is 1,448 days of email records in total. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6,600 nodes and 50,897 edges.

    To build a graph G = (V, E), we use email addresses as the nodes V. Every node v_i has an attribute that is a time-varying signal corresponding to the number of emails sent from this address during a day. We draw an edge e_ij between two nodes i and j if there is at least one email exchange between the corresponding addresses.

    Column 'Count' in 'edges.csv' file is the number of 'From'->'To' email exchanges between the two addresses. This column can be used as an edge weight.

    The file 'nodes.csv' contains a dictionary that is a compressed representation of time-series. The format of the dictionary is Day->The Number Of Emails Sent By the Address During That Day. The total number of days is 1448.

    'id-email.csv' is a file containing the actual email addresses.
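
    To make the file layout concrete, here is a minimal Python sketch (pandas + networkx) that rebuilds the graph. The 'From', 'To' and 'Count' column names follow the description above; the node-id and activity column names in 'nodes.csv' are assumptions and may need adjusting to the actual files.

    import ast

    import networkx as nx
    import pandas as pd

    edges = pd.read_csv("edges.csv")   # columns assumed: From, To, Count
    nodes = pd.read_csv("nodes.csv")   # columns assumed: id, activity (Day -> number of emails sent)

    # Build the email graph; 'Count' serves as the edge weight, as described above.
    G = nx.from_pandas_edgelist(edges, source="From", target="To",
                                edge_attr="Count", create_using=nx.Graph())

    # Attach each node's daily activity time series (stored as a serialized dictionary).
    for _, row in nodes.iterrows():
        if row["id"] in G:
            G.nodes[row["id"]]["activity"] = ast.literal_eval(str(row["activity"]))

    print(G.number_of_nodes(), G.number_of_edges())   # expected: roughly 6,600 nodes and 50,897 edges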

  2. The total number of mailboxes and number of active mailboxes every day

    • opendataumea.aws-ec2-eu-central-1.opendatasoft.com
    • opendata.umea.se
    • +1 more
    csv, excel, json
    Updated Sep 1, 2025
    + more versions
    Cite
    (2025). The total number of mailboxes and number of active mailboxes every day [Dataset]. https://opendataumea.aws-ec2-eu-central-1.opendatasoft.com/explore/dataset/getmailboxusagemailboxcounts0/calendar/?flg=en
    Explore at:
    Available download formats: json, csv, excel
    Dataset updated
    Sep 1, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The total number of user mailboxes in Umeå kommun and how many are active each day of the reporting period. A mailbox is considered active if the user sent or read any email.

  3. cnn_dailymail

    • huggingface.co
    Updated Aug 28, 2023
    + more versions
    Cite
    Abigail See (2023). cnn_dailymail [Dataset]. https://huggingface.co/datasets/abisee/cnn_dailymail
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2023
    Authors
    Abigail See
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for CNN Dailymail Dataset

      Dataset Summary
    

    The CNN / DailyMail Dataset is an English-language dataset containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and abstractive question answering.

      Supported Tasks and Leaderboards
    

    'summarization': Versions… See the full description on the dataset page: https://huggingface.co/datasets/abisee/cnn_dailymail.

  4. Twitter vs. Newsletter Impact

    • kaggle.com
    Updated Sep 18, 2017
    Cite
    Rachael Tatman (2017). Twitter vs. Newsletter Impact [Dataset]. https://www.kaggle.com/rtatman/twitter-vs-newsletter/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rachael Tatman
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context:

    There are lots of really cool datasets getting added to Kaggle every day, and as part of my job I want to help people find them. I’ve been tweeting about datasets on my personal Twitter account, @rctatman, and also releasing a weekly newsletter of interesting datasets.

    I wanted to know which method was more effective at getting the word out about new datasets: Twitter or the newsletter?

    Content:

    This dataset contains two .csv files. One has information on the impact of tweets with links to datasets, while the other has information on the impact of the newsletter.

    Twitter:

    The Twitter .csv has the following information:

    • month: The month of the tweet (1-12)
    • day: The day of the tweet (1-31)
    • hour: The hour of the tweet (1-24)
    • impressions: The number of impressions the tweet got
    • engagement: The number of total engagements
    • clicks: The number of URL clicks

    Fridata Newsletter:

    The Fridata .csv has the following information:

    • date: The Date the newsletter was sent out
    • month: The Month the newsletter was sent out (1-12)
    • day: The day the newsletter was sent out (1-31)
    • # of dataset links: How many links were in the newsletter
    • recipients: How many people received the email with the newsletter
    • total opens: How many times the newsletter was opened
    • unique opens: How many individuals opened the newsletter
    • total clicks: The total number of clicks on the newsletter
    • unique clicks: (unsure; provided by Tinyletter)
    • notes: notes on the newsletter

    Acknowledgements:

    This dataset was collected by the uploader, Rachael Tatman. It is released here under a CC-BY-SA license.

    Inspiration:

    • Which format receives more views?
    • Which format receives more clicks?
    • Which format receives more clicks per view? (A rough computation sketch follows this list.)
    • What’s the best time of day to send a tweet?
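
    As a rough starting point for these questions, here is a minimal pandas sketch. The file names are hypothetical and the column names are taken from the lists above; both may need adjusting to the actual files.

    import pandas as pd

    tweets = pd.read_csv("twitter.csv")        # hypothetical file name
    newsletter = pd.read_csv("fridata.csv")    # hypothetical file name

    # Compare click-through behaviour across the two channels.
    tweet_ctr = tweets["clicks"].sum() / tweets["impressions"].sum()
    news_ctr = newsletter["total clicks"].sum() / newsletter["total opens"].sum()

    print(f"clicks per tweet impression: {tweet_ctr:.4f}")
    print(f"clicks per newsletter open:  {news_ctr:.4f}")
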
  5. Email CTR Prediction

    • kaggle.com
    Updated Nov 15, 2022
    Cite
    Sk4467 (2022). Email CTR Prediction [Dataset]. https://www.kaggle.com/datasets/sk4467/email-ctr-prediction
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 15, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sk4467
    Description

    Most organizations today rely on email campaigns for effective communication with users. Email is one of the most popular ways to pitch products to users and build trustworthy relationships with them. Email campaigns contain different types of CTA (Call To Action), and their ultimate goal is to maximize the Click Through Rate (CTR): CTR = number of users who clicked on at least one CTA / number of emails delivered. This dataset contains features such as body length, subject length, mean paragraph, day of week, is weekend, etc.
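
    As a minimal illustration of the CTR definition quoted above, in Python (the numbers are made up; no assumptions about the dataset's actual column names):

    def click_through_rate(users_clicking_any_cta: int, emails_delivered: int) -> float:
        """CTR = number of users who clicked at least one CTA / number of emails delivered."""
        return users_clicking_any_cta / emails_delivered

    print(click_through_rate(120, 4000))   # 0.03, i.e. a 3% click-through rate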

  6. Email Dataset for Automatic Response Suggestion within a University

    • figshare.com
    pdf
    Updated Feb 4, 2018
    Cite
    Aditya Singh; Dibyendu Mishra; Sanchit Bansal; Vinayak Agarwal; Anjali Goyal; Ashish Sureka (2018). Email Dataset for Automatic Response Suggestion within a University [Dataset]. http://doi.org/10.6084/m9.figshare.5853057.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 4, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aditya Singh; Dibyendu Mishra; Sanchit Bansal; Vinayak Agarwal; Anjali Goyal; Ashish Sureka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have developed an application and solution approach (using this dataset) for automatically generating and suggesting short email responses to support queries in a university environment. Our proposed solution can be used as a one-tap or one-click solution for responding to various types of queries raised by faculty members and students in a university. The Office of Academic Affairs (OAA), the Office of Student Life (OSL) and the Information Technology Helpdesk (ITD) are support functions within a university which receive hundreds of email messages on a daily basis. Email is still the most frequently used mode of communication by these departments. A large percentage of the emails received by these departments are frequent and commonly used queries or requests for information. Responding to every query by manually typing is a tedious and time-consuming task. Furthermore, a large percentage of emails and their responses consist of short messages. For example, the IT support department in our university receives several emails about Wi-Fi not working, someone needing help with a projector, or someone requiring an HDMI cable or a remote slide changer. Another example is emails from students requesting the Office of Academic Affairs to add and drop courses, which they cannot do directly. The dataset consists of email messages of the kind generally received by ITD, OAA and OSL at Ashoka University. The dataset also contains intermediate results from the machine learning experiments we conducted.

  7. cnn_dailymail

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Dec 18, 2021
    + more versions
    Cite
    ccdv (2021). cnn_dailymail [Dataset]. https://huggingface.co/datasets/ccdv/cnn_dailymail
    Explore at:
    Dataset updated
    Dec 18, 2021
    Authors
    ccdv
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    CNN/DailyMail non-anonymized summarization dataset.

    There are two features:

    • article: text of the news article, used as the document to be summarized
    • highlights: joined text of the highlights, with <s> and </s> around each highlight, which is the target summary
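
    A minimal loading sketch with the Hugging Face datasets library, using the canonical repository cited under entry 3 (abisee/cnn_dailymail); the "3.0.0" config name is an assumption not stated in this listing:

    from datasets import load_dataset

    # Config "3.0.0" is assumed; adjust if the repository exposes different configs.
    ds = load_dataset("abisee/cnn_dailymail", "3.0.0", split="train")
    example = ds[0]
    print(example["article"][:200])   # the document to be summarized
    print(example["highlights"])      # the target summary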

  8. Telecom Churn Analysis Dataset

    • kaggle.com
    Updated Sep 28, 2023
    Cite
    Geet Mukherjee (2023). Telecom Churn Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/geetmukherjee/telecom-churn-analysis-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Geet Mukherjee
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is about the telecom industry and describes customers who churned the service. It consists of 3,333 observations with 21 variables. The task is to predict which customers will churn; a minimal modeling sketch follows the column list below.

    • Account.Length: How long the account has been active.
    • VMail.Message: Number of voice mail messages sent by the customer.
    • Day.Mins: Time spent on day calls.
    • Eve.Mins: Time spent on evening calls.
    • Night.Mins: Time spent on night calls.
    • Intl.Mins: Time spent on international calls.
    • Day.Calls: Number of day calls by the customer.
    • Eve.Calls: Number of evening calls by the customer.
    • Intl.Calls: Number of international calls.
    • Night.Calls: Number of night calls by the customer.
    • Day.Charge: Charges for day calls.
    • Night.Charge: Charges for night calls.
    • Eve.Charge: Charges for evening calls.
    • Intl.Charge: Charges for international calls.
    • VMail.Plan: Whether the customer has a voice mail plan.
    • State: State in the area of study.
    • Phone: Phone number of the customer.
    • Area.Code: Area code of the customer.
    • Int.l.Plan: Whether the customer has an international plan.
    • CustServ.Calls: Number of customer service calls by the customer.
    • Churn: Whether the customer churned the telecom service (0 = “Churner”, 1 = “Non-Churner”).
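
    A minimal modeling sketch in Python (pandas + scikit-learn). The file name is hypothetical and the column names follow the list above; both may need adjusting to the actual file (for example if dots were replaced by underscores):

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("telecom_churn.csv")   # hypothetical file name

    numeric_features = ["Account.Length", "VMail.Message", "Day.Mins", "Eve.Mins",
                        "Night.Mins", "Intl.Mins", "CustServ.Calls"]
    X = df[numeric_features]
    y = df["Churn"]

    # Hold out 20% of the observations and fit a simple baseline classifier.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))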

  9. Global Cyber Risk Data | Email Address Validation | Drive Decisions on...

    • datarade.ai
    .json, .csv
    Updated Nov 2, 2024
    + more versions
    Cite
    Datazag (2024). Global Cyber Risk Data | Email Address Validation | Drive Decisions on Domain Security and Email Deliverability [Dataset]. https://datarade.ai/data-products/datazag-global-cyber-risk-data-email-address-validation-datazag
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Nov 2, 2024
    Dataset authored and provided by
    Datazag
    Area covered
    Romania, Ethiopia, Iceland, Japan, Tajikistan, Sao Tome and Principe, Slovakia, Greece, El Salvador, Ecuador
    Description

    DomainIQ is a comprehensive global Domain Name dataset for organizations that want to build cyber security, data cleaning and email marketing applications. The dataset consists of the DNS records for over 267 million domains, updated daily, representing more than 90% of all public domains in the world.

    The data is enriched by over thirty unique data points, including identifying the mailbox provider for each domain and using AI based predictive analytics to identify elevated risk domains from both a cyber security and email sending reputation perspective.

    DomainIQ from Datazag offers layered intelligence through a highly flexible API and as a dataset, available for both cloud and on-premises applications. Standard formats include CSV, JSON, Parquet, and DuckDB.

    Custom options are available for any other file or database format. With daily updates and constant research from Datazag, organizations can develop their own market-leading cyber security, data cleaning and email validation applications supported by comprehensive and accurate data. Dataset updates are available on a daily, weekly or monthly basis; API data is updated daily.

  10. Plastic Object Detection Dataset

    • kaggle.com
    • gts.ai
    Updated Mar 6, 2024
    Cite
    DataCluster Labs (2024). Plastic Object Detection Dataset [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/plastic-object-detection-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DataCluster Labs
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains images of various plastic objects commonly found in everyday life. Each image is annotated with bounding boxes around the plastic items, allowing for object detection tasks in computer vision applications. With a diverse range of items such as milk packets, ketchup pouches, pens, plastic bottles, polythene bags, shampoo bottles and pouches, chips packets, cleaning spray bottles, handwash bottles, and more, this dataset offers rich training material for developing object detection models.

    This dataset was collected by DataCluster Labs, India. To download the full dataset or to submit a request for your new data collection needs, please send an email to: sales@datacluster.ai

    The dataset is an extremely challenging set of over 4,000 original plastic object images captured and crowdsourced from over 1,000 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at DataCluster Labs.

    Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.

    Dataset Features

    • Dataset size: 4,000+ images
    • Captured by: over 1,000 crowdsource contributors
    • Resolution: 99.9% of images are HD or above (1920x1080 and above)
    • Location: captured across 500+ cities
    • Diversity: various lighting conditions (day, night), varied distances, different viewpoints, etc.
    • Device used: captured using mobile phones in 2020-2022
    • Usage: plastic object detection, recycling and waste management, garbage segregation, trash segregation, etc.

    Available Annotation formats

    COCO, YOLO, PASCAL-VOC, Tf-Record

    The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai Visit www.datacluster.ai to know more.

  11. Medallion Drivers - Active

    • catalog.data.gov
    • data.cityofnewyork.us
    • +6 more
    Updated Sep 20, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). Medallion Drivers - Active [Dataset]. https://catalog.data.gov/dataset/medallion-drivers-active
    Explore at:
    Dataset updated
    Sep 20, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    PLEASE NOTE: This dataset, which includes all TLC Licensed Drivers who are in good standing and able to drive, is updated every day in the evening between 4-7pm. Please check the 'Last Update Date' field to make sure the list has updated successfully. 'Last Update Date' should show either today's or yesterday's date, depending on the time of day. If the list is outdated, please download the most recent list from the link below. http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv

    This is a list of drivers with a current TLC Driver License, which authorizes drivers to operate NYC TLC licensed yellow and green taxicabs and for-hire vehicles (FHVs). This list is accurate as of the date and time shown in the Last Date Updated and Last Time Updated fields. Questions about the contents of this dataset can be sent by email to: licensinginquiries@tlc.nyc.gov.
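
    A minimal freshness check in Python/pandas, using the CSV link and the 'Last Update Date' field named in the note above (the exact column spelling in the file may differ):

    import pandas as pd

    url = "http://www1.nyc.gov/assets/tlc/downloads/datasets/tlc_medallion_drivers_active.csv"
    drivers = pd.read_csv(url)

    # The note above says this should be today's or yesterday's date.
    last_update = pd.to_datetime(drivers["Last Update Date"]).max()
    age_days = (pd.Timestamp.today().normalize() - last_update.normalize()).days
    print(f"{len(drivers)} licensed drivers, last updated {last_update.date()} ({age_days} day(s) ago)")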

  12. 2025 Municipal Primary Election Mail Ballot Requests Department of State NO...

    • data.pa.gov
    Updated Jun 11, 2025
    Cite
    Department of State (2025). 2025 Municipal Primary Election Mail Ballot Requests Department of State NO FURTHER UPDATES [Dataset]. https://data.pa.gov/Government-Efficiency-Citizen-Engagement/2025-Municipal-Primary-Election-Mail-Ballot-Reques/ih4x-yb7a
    Explore at:
    Available download formats: kmz, csv, application/geo+json, xlsx, xml, kml
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    United States Department of State (http://state.gov/)
    Authors
    Department of State
    License

    https://www.usa.gov/government-works

    Description

    This dataset describes the current state of mail ballot requests for the 2025 Municipal Primary Election. It’s a snapshot in time of the current volume of ballot requests across the Commonwealth. The file contains all mail ballot requests except ballot applications that are declined as duplicate.

    This point-in-time transactional data is being published for informational purposes to provide detailed data pertaining to the processing of absentee and mail-in ballots by county election offices. This data is extracted once per day from the Statewide Uniform Registry of Electors (SURE system), and it reflects activity recorded by the counties in the SURE system at the time of the data extraction.

    Please note that county election offices will continue to process ballot applications (as applicable), record ballots, reconcile ballot data, and make corrections when necessary, and this will continue through, and even after, Election Day. Administrative practices for recording transactions in the system will vary by county. For example, some counties record individual transactions as they occur, while others record transactions in batches at specific intervals. These activities may result in substantial changes to a county's reported data from one day to the next. County practices also differ on when cancelled ballot data is entered into the database (i.e., before or after the election). Some counties do not enter cancelled ballot data entirely.

    Additional notes specific to this dataset:

    • Counties can enter cancellation codes without entering a ballot returned date.
    • Some cancellation codes are a result of administrative processes, meaning the ballot was never mailed to the voter before it was cancelled (e.g., there was an error when the label was printed).
    • Confidential and protected voters are not included in this file.
    • Counties can only enter one cancel code per ballot, even if there are multiple errors. Different counties may vary in what code they choose to use when this arises, or they may choose to use the catch-all category of 'CANC - OTHER'.

    Type of data included in this file: This data includes all mail ballot applications processed by counties, which includes voters on the permanent mail-in and absentee ballot lists. Multiple rows in this data may correspond to the same voter if they submitted more than one application or had one or more cancelled ballots. A deidentified voter ID has been provided to allow data users to identify when rows correspond to the same voter. This ID is randomized and cannot be used to match to SURE, the Full Voter Export, or previous iterations of the Statewide Mail Ballot File. All application types in this file are considered a type of mail ballot. Some of the applications are considered UOCAVA (Uniformed and Overseas Citizens Absentee Voting Act) or UMOVA (Uniform Military and Overseas Voters Act) ballots. These are listed below:

    • CRI - Civilian - Remote/Isolated
    • CVO - Civilian Overseas
    • F - Federal (Unregistered)
    • M - Military
    • MRI - Military - Remote/Isolated
    • V - Veteran
    • BV - Bedridden Veteran
    • BVRI - Bedridden Veteran - Remote/Isolated

    *We may not have all application types in the file for every election.

  13. MERRA-2 statD_2d_slv_Nx: 2d,Daily,Aggregated...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). MERRA-2 statD_2d_slv_Nx: 2d,Daily,Aggregated Statistics,Single-Level,Assimilation,Single-Level Diagnostics 0.625 x 0.5 degree V5.12.4 (M2SDNXSLV) at GES DISC - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/merra-2-statd-2d-slv-nx-2ddailyaggregated-statisticssingle-levelassimilationsingle-level-d-fb2ad
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    M2SDNXSLV (or statD_2d_slv_Nx) is a 2-dimensional daily data collection in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of daily statistics, such as daily mean (or daily minimum and maximum) air temperature at 2 meters, and maximum precipitation rate during the period. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.

    Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.

    MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.

    Questions: If you have a question, please read the "MERRA-2 File Specification Document", “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the ”Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).

  14. Immigration system statistics data tables

    • gov.uk
    Updated Aug 21, 2025
    Cite
    Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
    Explore at:
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    GOV.UK
    Authors
    Home Office
    Description

    List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

    If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Accessible file formats

    The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
    If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
    Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Immigration system statistics, year ending June 2025
    Immigration system statistics quarterly release
    Immigration system statistics user guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Passenger arrivals

    Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods

    ‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

    Electronic travel authorisation

    Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
    ETA_D01: Applications for electronic travel authorisations, by nationality
    ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

    Entry clearance visas granted outside the UK

    Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods

    Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
    Vis_D01: Entry clearance visa applications, by nationality and visa type
    Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

    Additional data relating to in country and overseas Visa applications can be fo

  15. JRII-S Dataset

    • datasets.ai
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +1 more
    Updated Aug 6, 2024
    Cite
    U.S. Environmental Protection Agency (2024). JRII-S Dataset [Dataset]. https://datasets.ai/datasets/jrii-s-dataset
    Explore at:
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    U.S. Environmental Protection Agency
    Description

    The sonic data within the building array is composed of 26 days of 30-minute average data from 30 sonic anemometers. The unobstructed tower sonic data is the same, but for the 5 heights of the tower. The data files have 48 columns associated with date and time identifiers as well as meteorological turbulence measurements.

    This dataset is not publicly accessible because the data were not collected by EPA and are hosted external to the agency. It can be accessed through the following means: the detailed sonic dataset is freely available to others wishing to perform additional analysis; however, it is large and not readily posted. The complete dataset is included in the comprehensive JR II data archive set up by the DHS Science and Technology (S&T) Directorate, Chemical Security Analysis Center (CSAC). To obtain the data, an email request can be sent to JackRabbit@st.dhs.gov. The user can then access the archive on the Homeland Security Information Network (HSIN).

    Format: The sonic data within the Jack Rabbit II (JRII) mock-urban building array are in 30-minute averaged daily Excel files, separated by each sonic anemometer, with numerous variables. The unobstructed, raw 10 Hz tower data are in .dat files and processed into 30-minute averaged daily csv files by sonic height.

    This dataset is associated with the following publication: Pirhalla, M., D. Heist, S. Perry, S. Hanna, T. Mazzola, S.P. Arya, and V. Aneja. Urban Wind Field Analysis from the Jack Rabbit II Special Sonic Anemometer Study. ATMOSPHERIC ENVIRONMENT. Elsevier Science Ltd, New York, NY, USA, 243: 14, (2020).

  16. SAPFLUXNET: A global database of sap flow measurements

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 26, 2020
    Cite
    Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta (2020). SAPFLUXNET: A global database of sap flow measurements [Dataset]. http://doi.org/10.5281/zenodo.3697807
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 26, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rafael Poyatos; Víctor Granda; Víctor Flo; Roberto Molowny-Horas; Kathy Steppe; Maurizio Mencuccini; Jordi Martínez-Vilalta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    SAPFLUXNET contains a global database of sap flow and environmental data, together with metadata at different levels.
    SAPFLUXNET is a harmonised database, compiled from contributions from researchers worldwide. This version (0.1.4) contains more than 200 datasets from all over the world, covering a broad range of bioclimatic conditions.
    More information on the coverage can be found here: http://sapfluxnet.creaf.cat/shiny/sfn_progress_dashboard/.


    The SAPFLUXNET project has been developed by researchers at CREAF and other institutions (http://sapfluxnet.creaf.cat/#team), coordinated by Rafael Poyatos (CREAF, http://www.creaf.cat/staff/rafael-poyatos-lopez), and funded by two Spanish Young Researcher's Grants (SAPFLUXNET, CGL2014-55883-JIN; DATAFORUSE, RTI2018-095297-J-I00) and an Alexander von Humboldt Research Fellowship for Experienced Researchers.

    Variables and units

    SAPFLUXNET contains whole-plant sap flow and environmental variables at sub-daily temporal resolution. Both sap flow and environmental time series have accompanying flags in a data frame, one for sap flow and another for environmental
    variables. These flags store quality issues detected during the quality control process and can be used to add further quality flags.

    Metadata contain relevant variables informing about site conditions, stand characteristics, tree and species attributes, sap flow methodology and details on environmental measurements. To learn more about variables, units and data flags please use the functionalities implemented in the sapfluxnetr package (https://github.com/sapfluxnet/sapfluxnetr). In particular, have a look at the package vignettes using R:

    # remotes::install_github(
    #  'sapfluxnet/sapfluxnetr',
    #  build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
    # )
    library(sapfluxnetr)
    # to list all vignettes
    vignette(package='sapfluxnetr')
    # variables and units
    vignette('metadata-and-data-units', package='sapfluxnetr')
    # data flags
    vignette('data-flags', package='sapfluxnetr')

    Data formats

    SAPFLUXNET data can be found in two formats: 1) RData files belonging to the custom-built 'sfn_data' class and 2) Text files in .csv format. We recommend using the sfn_data objects together with the sapfluxnetr package, although we also provide the text files for convenience. For each dataset, text files are structured in the same way as the slots of sfn_data objects; if working with text files, we recommend that you check the data structure of 'sfn_data' objects in the corresponding vignette.

    Working with sfn_data files

    To work with SAPFLUXNET data, first they have to be downloaded from Zenodo, maintaining the folder structure. A first level in the folder hierarchy corresponds to file format, either RData files or csv's. A second level corresponds to how sap flow is expressed: per plant, per sapwood area or per leaf area. Please note that interconversions among the magnitudes have been performed whenever possible. Below this level, data have been organised per dataset. In the case of RData files, each dataset is contained in a sfn_data object, which stores all data and metadata in different slots (see the vignette 'sfn-data-classes'). In the case of csv files, each dataset has 9 individual files, corresponding to metadata (5), sap flow and environmental data (2) and their corresponding data flags (2).

    After downloading the entire database, the sapfluxnetr package can be used to:
    - Work with data from a single site: data access, plotting and time aggregation.
    - Select the subset datasets to work with.
    - Work with data from multiple sites: data access, plotting and time aggregation.

    Please check the following package vignettes to learn more about how to work with sfn_data files:

    Quick guide

    Metadata and data units

    sfn_data classes

    Custom aggregation

    Memory and parallelization

    Working with text files

    We recommend working with sfn_data objects using R and the sapfluxnetr package; we do not currently provide code to work with the text files.

    Data issues and reporting

    Please report any issue you may find in the database by sending us an email: sapfluxnet@creaf.uab.cat.

    Temporary data fixes, detected but not yet included in released versions will be published in SAPFLUXNET main web page ('Known data errors').

    Data access, use and citation

    This version of the SAPFLUXNET database is open access. We are working on a data paper describing the database, but, before its publication, please cite this Zenodo entry if SAPFLUXNET is used in any publication.

  17. ckanext-reminder - Extensions - CKAN Ecosystem Catalog Beta

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-reminder - Extensions - CKAN Ecosystem Catalog Beta [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-reminder
    Explore at:
    Dataset updated
    Jun 4, 2025
    Description

    The Reminder extension for CKAN enhances data management by providing automated email notifications based on dataset expiry dates and update subscriptions. Designed to work with CKAN versions 2.2 and up, but tested on 2.5.2, this extension offers a straightforward mechanism for keeping users informed about dataset updates and expirations, promoting better data governance and engagement. The extension leverages a daily cron job to check expiry dates and trigger emails.

    Key Features:

    • Data Expiry Notifications: Sends email notifications when datasets reach their specified expiry date. A daily cronjob process determines when to send these emails. Note that failure of the cronjob will prevent email delivery for that day.
    • Dataset Update Subscriptions: Allows users to subscribe to specific datasets to receive notifications upon updates, via a subscription form snippet that can be included in dataset templates.
    • Unsubscribe Functionality: Includes an unsubscribe link in each notification email, enabling users to easily manage their subscriptions.
    • Configuration Settings: Supports at least one recipient for reminder emails via configuration settings in the CKAN config file.
    • Bootstrap Styling: Intended for use with Bootstrap 3+ for styling, but may still work with Bootstrap 2 with potential style inconsistencies.

    Technical Integration: The Reminder extension integrates into CKAN via plugins, necessitating the addition of reminder to the ckan.plugins setting in the CKAN configuration file. The extension requires database initialization using paster commands to support the subscription functionality. Setting up a daily cronjob is necessary for the automated sending of reminder and notification emails.

    Benefits & Impact: By implementing the Reminder extension, CKAN installations can improve data management and user engagement. Automated notifications ensure that stakeholders are aware of dataset expirations and updates, leading to better data governance and more active user involvement in data ecosystems. This extension provides an easy-to-implement solution for managing data lifecycles and keeping users informed.

  18. Aggregated Virtual Patient Model Dataset

    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Konstantinos Deltouzos (2020). Aggregated Virtual Patient Model Dataset [Dataset]. http://doi.org/10.5281/zenodo.2670048
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Konstantinos Deltouzos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a collection of aggregated clinical parameters for the participants (such as clinical scores), parameters extracted from the utilized devices (such as average heart rate per day, average gait speed, etc.), and coupled events about them (such as falls, loss of orientation, etc.). It contains information collected by medical experts during the clinical evaluation of the older adults. This information represents the clinical status of the older person across different domains, e.g. physical, psychological, cognitive, etc.

    The dataset contains several medical features which are used by clinicians to assess the overall state of the older people.

    The purpose of the Virtual Patient Model is to assess the overall state of the older people based on their medical parameters, and to find associations between these parameters and frailty status.

    A list of the recorded clinical parameters and their description is shown below:

    - part_id: The user ID, which should be a 4-digit number

    - q_date: The recording timestamp, which follows the “YYYY-MM-DDTHH:mm:ss.fffZ” format (e.g., 14 September 2019 12:23:34.567 is formatted as 2019-09-14T12:23:34.567Z; a minimal parsing sketch follows this list)

    - clinical_visit: As several clinical evaluations were performed for each older adult, this number indicates which clinical evaluation these measurements refer to

    - fried: Ordinal categorization of frailty level according to Fried operational definition of frailty

    - hospitalization_one_year: Number of nonscheduled hospitalizations in the last year

    - hospitalization_three_years: Number of nonscheduled hospitalizations in the last three years

    - ortho_hypotension: Presence of orthostatic hypotension

    - vision: Visual difficulty (qualitative ordinal evaluation)

    - audition: Hearing difficulty (qualitative ordinal evaluation)

    - weight_loss: Unintentional weight loss >4.5 kg in the past year (categorical answer)

    - exhaustion_score: Self-reported exhaustion (categorical answer)

    - raise_chair_time: Time in seconds to perform a lower limb strength clinical test

    - balance_single: Single foot station (Balance) (categorical answer)

    - gait_get_up: Time in seconds to perform the 3meters’ Timed Get Up And Go Test

    - gait_speed_4m: Speed for 4 meters’ straight walk

    - gait_optional_binary: Gait optional evaluation (qualitative evaluation by the investigator)

    - gait_speed_slower: Slowed walking speed (categorical answer)

    - grip_strength_abnormal: Grip strength outside the norms (categorical answer)

    - low_physical_activity: Low physical activity (categorical answer)

    - falls_one_year: Number of falls in the last year

    - fractures_three_years: Number of fractures during the last 3 years

    - fried_clinician: Fried’s categorization according to clinician’s estimation (when missing data for answering the Fried’s operational frailty definition questionnaire)

    - bmi_score: Body Mass Index (in Kg/m²)

    - bmi_body_fat: Body Fat (%)

    - waist: Waist circumference (in cm)

    - lean_body_mass: Lean Body Mass (%)

    - screening_score: Mini Nutritional Assessment (MNA) screening score

    - cognitive_total_score: Montreal Cognitive Assessment (MoCA) test score

    - memory_complain: Memory complain (categorical answer)

    - mmse_total_score: Folstein Mini-Mental State Exam score

    - sleep: Reported sleeping problems (qualitative ordinal evaluation)

    - depression_total_score: 15-item Geriatric Depression Scale (GDS-15)

    - anxiety_perception: Anxiety auto-evaluation (visual analogue scale 0-10)

    - living_alone: Living Conditions (categorical answer)

    - leisure_out: Leisure activities (number of leisure activities per week)

    - leisure_club: Membership of a club (categorical answer)

    - social_visits: Number of visits and social interactions per week

    - social_calls: Number of telephone calls exchanged per week

    - social_phone: Approximate time spent on phone per week

    - social_skype: Approximate time spent on videoconference per week

    - social_text: Number of written messages (SMS and emails) sent by the participant per week

    - house_suitable_participant: Subjective suitability of the housing environment according to participant’s evaluation (categorical answer)

    - house_suitable_professional: Subjective suitability of the housing environment according to investigator’s evaluation (categorical answer)

    - stairs_number: Number of steps to access house (without possibility to use elevator)

    - life_quality: Quality of life self-rating (visual analogue scale 0-10)

    - health_rate: Self-rated health status (qualitative ordinal evaluation)

    - health_rate_comparison: Self-assessed change since last year (qualitative ordinal evaluation)

    - pain_perception: Self-rated pain (visual analogue scale 0-10)

    - activity_regular: Regular physical activity (ordinal answer)

    - smoking: Smoking (categorical answer)

    - alcohol_units: Alcohol Use (average alcohol units consumption per week)

    - katz_index: Katz Index of ADL score

    - iadl_grade: Instrumental Activities of Daily Living score

    - comorbidities_count: Number of comorbidities

    - comorbidities_significant_count: Number of comorbidities which affect significantly the person’s functional status

    - medication_count: Number of active substances taken on a regular basis
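
    A minimal parsing sketch in Python/pandas. The file name is hypothetical; the timestamp format mirrors the "YYYY-MM-DDTHH:mm:ss.fffZ" convention given for q_date, and the selected columns come from the list above:

    import pandas as pd

    # Hypothetical file name; part_id is kept as a zero-padded 4-digit string.
    vpm = pd.read_csv("virtual_patient_model.csv", dtype={"part_id": str})

    # Parse the recording timestamp using the stated format.
    vpm["q_date"] = pd.to_datetime(vpm["q_date"], format="%Y-%m-%dT%H:%M:%S.%fZ")

    # One row per participant and clinical visit, e.g. to track frailty over time.
    per_visit = vpm.sort_values("q_date").groupby(["part_id", "clinical_visit"]).last()
    print(per_visit[["fried", "gait_speed_4m", "grip_strength_abnormal"]].head())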

  19. phishing-email

    • huggingface.co
    Updated Aug 26, 2025
    Cite
    Luong NGUYEN (2025). phishing-email [Dataset]. https://huggingface.co/datasets/luongnv89/phishing-email
    Explore at:
    Dataset updated
    Aug 26, 2025
    Authors
    Luong NGUYEN
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CEAS-08 Email Phishing Detection Instruction Dataset

    This dataset contains instruction-following conversations for email phishing detection, generated from the CEAS-08 email dataset using multiple large language models. It's designed for fine-tuning conversational AI models on cybersecurity tasks.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    This dataset transforms raw email data into structured instruction-following conversations where an AI security analyst analyzes… See the full description on the dataset page: https://huggingface.co/datasets/luongnv89/phishing-email.
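
    A minimal loading sketch with the Hugging Face datasets library, using the repository id from the citation above; the split name is an assumption, so check the dataset card for the actual schema:

    from datasets import load_dataset

    # The "train" split is assumed; the dataset card lists the available splits and fields.
    phishing = load_dataset("luongnv89/phishing-email", split="train")
    print(phishing)      # features and number of rows
    print(phishing[0])   # one instruction-following conversation example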

  20. Household Mailstream Study, 1977

    • icpsr.umich.edu
    ascii, sas, spss
    Updated Jan 18, 2006
    Cite
    Kallick, Maureen (2006). Household Mailstream Study, 1977 [Dataset]. http://doi.org/10.3886/ICPSR08412.v1
    Explore at:
    Available download formats: sas, spss, ascii
    Dataset updated
    Jan 18, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Kallick, Maureen
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/8412/terms

    Time period covered
    Dec 6, 1976 - Dec 31, 1977
    Area covered
    United States
    Description

    The primary purpose of this survey was to develop a description of the United States household mailstream for the United States Postal Service (USPS) and to provide annualized, nationwide estimates of the volume of mail received and sent by households in the United States. To this end, the survey gathered information on the characteristics of every USPS letter and package that was sent or received by each sampled household on every day of a preassigned week in the survey period. Daily accounts of items not handled by the USPS were also gathered, e.g., United Parcel Service, telegrams, long-distance telephone calls, newspapers, magazines, advertisements, free samples, campaign literature, and utility bills. In addition to providing mailstream information, respondents answered questions pertaining to their mail delivery and mailing practices, their knowledge of mail and other means of communications, and their opinions on both the performance of the USPS and on proposed changes in mail service and rates. They also supplied information on any stamp collectors living in their household, the age and sex of the collectors, the kinds of stamps they collected, and their expenditures on United States commemorative stamps and corner stamps from sheets of new USPS issues. The dataset includes data on the location of the household, length of residence in the current dwelling unit, family income, the age of each household member, and the age, sex, race, education, occupation, and employment status of the respondent and the head of household.
