We provide a corpus comprising a set of cross-domain authorship attribution problems in each of the following 5 languages: English, French, Italian, Polish, and Spanish. Note that we deliberately avoid the term 'training corpus' because the sets of candidate authors of the development and the evaluation corpora do not overlap. Therefore, your approach should not be tailored to the specific candidate authors of the development corpus.
Each problem consists of a set of known fanfics by each candidate author and a set of unknown fanfics, located in separate folders. The file problem-info.json, found in the main folder of each problem, gives the name of the folder of unknown documents and the list of candidate author folder names. The true author of each unknown document is listed in the file ground-truth.json, also found in the main folder of each problem.
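For illustration, here is a minimal sketch of how the metadata of a single problem might be read. The JSON key names used below ("unknown-folder", "candidate-authors", "author-name", "ground_truth", "unknown-text", "true-author") are assumptions made for this example and should be checked against the released files.

```python
import json
from pathlib import Path

def load_problem(problem_dir: str):
    """Load the metadata of a single attribution problem.

    Key names below are illustrative assumptions; verify them against
    the actual problem-info.json and ground-truth.json files.
    """
    problem = Path(problem_dir)

    # problem-info.json: folder of unknown documents + candidate author folders
    with open(problem / "problem-info.json", encoding="utf-8") as f:
        info = json.load(f)
    unknown_folder = info["unknown-folder"]
    candidates = [c["author-name"] for c in info["candidate-authors"]]

    # ground-truth.json: true author of each unknown document
    with open(problem / "ground-truth.json", encoding="utf-8") as f:
        truth = json.load(f)
    answers = {a["unknown-text"]: a["true-author"] for a in truth["ground_truth"]}

    return unknown_folder, candidates, answers
```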
In addition, to handle a collection of such problems, the file collection-info.json includes all relevant information: for each problem it lists its main folder, the language of its documents (either "en", "fr", "it", "pl", or "sp"), and their encoding (always UTF-8).
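A whole collection could then be traversed via collection-info.json. The sketch below assumes that file is a JSON array whose entries carry fields named "problem-name", "language", and "encoding"; these field names are hypothetical and should be confirmed against the released file.

```python
import json
from pathlib import Path

def iterate_collection(collection_dir: str):
    """Yield (problem folder, language, encoding) for every problem listed
    in collection-info.json. Field names are illustrative assumptions."""
    root = Path(collection_dir)
    with open(root / "collection-info.json", encoding="utf-8") as f:
        collection = json.load(f)

    for entry in collection:
        problem_folder = root / entry["problem-name"]  # main folder of the problem
        language = entry["language"]                   # "en", "fr", "it", "pl", or "sp"
        encoding = entry["encoding"]                   # always "UTF-8"
        yield problem_folder, language, encoding
```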
More information: Link