Cite
Baumann, Michael; Lettmann, Theodor; Stein, Benno (2012). Paderborn Genre Analysis Corpus 2012 (PaGA-12) [Dataset]. http://doi.org/10.5281/zenodo.3250070
Paderborn Genre Analysis Corpus 2012 (PaGA-12)

Available download formats: zip
Dataset updated Jan 1, 2012
Dataset provided by
Bauhaus-Universität Weimar (http://www.uni-weimar.de/)
Paderborn University (http://www.uni-paderborn.de/)
Authors
Baumann, Michael; Lettmann, Theodor; Stein, Benno
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Paderborn Genre Analysis Corpus 2012 (PaGA-12) contains 1,639 HTML documents spanning 26 genres. All documents were collected between 2009-10-18 and 2009-11-20, and each document was manually assigned to exactly one genre. The corpus provides at least 50 documents for each genre.

All HTML documents contain German text only, and framesets have been removed. The corpus is delivered as a MySQL database dump; the database structure is detailed in a README file delivered with the corpus.
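
The counts stated in the description (1,639 documents, 26 genres, at least 50 documents per genre, one genre per document) can serve as a quick sanity check after importing the dump. The following Python sketch is only an illustration: it assumes the genre labels have already been read into a list with one entry per document, and the function name check_corpus and its parameters are hypothetical. The actual table and column names are documented in the corpus README and are not assumed here.

```python
from collections import Counter


def check_corpus(labels, n_docs=1639, n_genres=26, min_per_genre=50):
    """Compare a list of genre labels (one per document) against the
    figures given in the dataset description.

    Returns a list of human-readable problems; an empty list means
    every check passed. All names here are illustrative assumptions.
    """
    counts = Counter(labels)
    problems = []

    # Total number of documents.
    if len(labels) != n_docs:
        problems.append(f"expected {n_docs} documents, found {len(labels)}")

    # Number of distinct genres.
    if len(counts) != n_genres:
        problems.append(f"expected {n_genres} genres, found {len(counts)}")

    # Minimum number of documents per genre.
    for genre, n in sorted(counts.items()):
        if n < min_per_genre:
            problems.append(f"genre {genre!r} has only {n} documents")

    return problems
```

An empty return value indicates that the imported labels agree with the published description; any returned message points to a truncated or incomplete import.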
