2 datasets found
  1.

    PAN14 Author Identification: Verification

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 13, 2023
    Cite
    Verhoeven, Ben (2023). PAN14 Author Identification: Verification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3716032
    Dataset provided by
    Stein, Benno
    Juola, Patrick
    Sanchez-Perez, Miguel A.
    Potthast, Martin
    Verhoeven, Ben
    Stamatatos, Efstathios
    Barrón-Cedeño, Alberto
    Daelemans, Walter
    Description

    We provide you with a training corpus that comprises a set of author verification problems in several languages/genres. Each problem consists of up to five known documents by a single person and exactly one questioned document. All documents within a single problem instance are in the same language, and best efforts were made to ensure that within-problem documents are matched for genre, register, theme, and date of writing. Document lengths vary from a few hundred to a few thousand words.

    More information: Link

  2.

    PAN15 Author Identification: Verification

    • data.niaid.nih.gov
    Updated Nov 30, 2023
    Cite
    Juola, Patrick (2023). PAN15 Author Identification: Verification [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_3737562
    Dataset provided by
    Stein, Benno
    Juola, Patrick
    López-López, Aurelio
    Potthast, Martin
    Stamatatos, Efstathios
    Daelemans, Walter
    Verhoeven, Ben
    Description

    We provide you with a training corpus that comprises a set of author verification problems in several languages/genres. Each problem consists of up to five known documents by a single person and exactly one questioned document. All documents within a single problem instance are in the same language, but their genre and/or topic may differ significantly. Document lengths vary from a few hundred to a few thousand words.

    The documents of each problem are located in a separate folder, the name of which (problem ID) encodes the language of the documents. The following list shows the available sub-corpora, including their language, type (cross-genre or cross-topic), code, and examples of problem IDs:

    Language; Type; Code; Problem IDs
    Dutch; Cross-genre; DU; DU001, DU002, DU003, etc.
    English; Cross-topic; EN; EN001, EN002, EN003, etc.
    Greek; Cross-topic; GR; GR001, GR002, GR003, etc.
    Spanish; Cross-genre; SP; SP001, SP002, SP003, etc.
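
Since the problem ID encodes the language, a folder name can be mapped back to its sub-corpus. The following is a minimal sketch, not part of the dataset itself: the names `SUBCORPORA`, `ProblemInfo`, and `describe_problem` are hypothetical, and the pattern assumes every ID is a two-letter code followed by digits, as in the examples listed.

```python
import re
from typing import NamedTuple

# Sub-corpora as listed in the dataset description: code -> (language, type).
SUBCORPORA = {
    "DU": ("Dutch", "cross-genre"),
    "EN": ("English", "cross-topic"),
    "GR": ("Greek", "cross-topic"),
    "SP": ("Spanish", "cross-genre"),
}


class ProblemInfo(NamedTuple):
    """Hypothetical container for what a problem ID tells us."""
    problem_id: str
    language: str
    corpus_type: str
    number: int


# Assumption: an ID is a two-letter code followed by digits, e.g. "EN001".
_ID_PATTERN = re.compile(r"([A-Z]{2})(\d+)")


def describe_problem(problem_id: str) -> ProblemInfo:
    """Decode a problem ID into its language, sub-corpus type, and number."""
    match = _ID_PATTERN.fullmatch(problem_id)
    if match is None or match.group(1) not in SUBCORPORA:
        raise ValueError(f"unrecognized problem ID: {problem_id!r}")
    code, digits = match.groups()
    language, corpus_type = SUBCORPORA[code]
    return ProblemInfo(problem_id, language, corpus_type, int(digits))
```

Calling `describe_problem("GR002")` yields the Greek cross-topic entry with number 2; an ID with an unlisted code raises `ValueError` rather than guessing.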

    The ground truth for the training corpus is in the file truth.txt, which contains one line per problem giving the problem ID and the correct binary answer (Y means the known and the questioned documents are by the same author; N means they are not). For example:

    EN001 N
    EN002 Y
    EN003 N
    ...
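
A ground-truth file in this format can be read into a dictionary with a few lines of Python. This is a sketch under stated assumptions: the function names `parse_truth` and `load_truth` are hypothetical, fields are assumed to be separated by whitespace, and the `utf-8-sig` encoding is a guess chosen so that a leading byte-order mark, if present, is ignored.

```python
from typing import Dict, Iterable


def parse_truth(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each problem ID to True (Y, same author) or False (N)."""
    truth: Dict[str, bool] = {}
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("Y", "N"):
            raise ValueError(
                f"line {line_number}: expected '<problem ID> <Y|N>', got {line!r}"
            )
        truth[parts[0]] = parts[1] == "Y"
    return truth


def load_truth(path: str, encoding: str = "utf-8-sig") -> Dict[str, bool]:
    """Read a truth.txt file; the encoding is an assumption, not documented."""
    with open(path, encoding=encoding) as handle:
        return parse_truth(handle)
```

For the three example lines above, `parse_truth` returns a dictionary in which only `EN002` maps to `True`. Malformed lines raise an error instead of being skipped, so a format mismatch is noticed early.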
