1 dataset found
  1. Webis-Gmane-19

    • webis.de
    3766984
    Updated 2019
    Cite
    Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein (2019). Webis-Gmane-19 [Dataset]. http://doi.org/10.5281/zenodo.3766984
    Dataset updated
    2019
    Dataset provided by
    Bauhaus-Universität Weimar
    The Web Technology & Information Systems Network
    University of Groningen
    University of Kassel, hessian.AI, and ScaDS.AI
    Bauhaus-Universität Weimar and Leipzig University
    Authors
    Janek Bevendorff; Khalid Al-Khatib; Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large-scale corpus of over 153 million fully segmented emails from 14,635 public mailing lists.

    The Webis Gmane Email Corpus 2019 is a dataset of more than 153 million parsed and segmented emails, crawled from gmane.io between February and May 2019 and covering more than 20 years of public mailing lists. The dataset was published as a resource at ACL 2020.

3 scholarly articles cite this dataset (View in Google Scholar)