4 datasets found

PAN25 Multi-Author Writing Style Analysis
zenodo.org
zip
Updated Mar 19, 2025
Cite
Eva Zangerle; Maximilian Mayerl; Martin Potthast; Benno Stein (2025). PAN25 Multi-Author Writing Style Analysis [Dataset]. http://doi.org/10.5281/zenodo.15053260
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15053260
Dataset updated
Mar 19, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Eva Zangerle; Maximilian Mayerl; Martin Potthast; Benno Stein
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the dataset for the shared task on Multi-Author Writing Style Analysis PAN@CLEF2025. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.

Task

The goal of the style change detection task is to identify the text positions within a given multi-author document at which the author switches. Hence, a fundamental question is the following: if multiple authors have written a text together, can we find evidence for this fact, and do we have a means to detect variations in the writing style? Answering this question is among the most difficult and most interesting challenges in author identification: style change detection is the only means to detect plagiarism in a document if no comparison texts are given; likewise, style change detection can help to uncover gift authorships, to verify a claimed authorship, or to develop new technology for writing support.

Previous editions of the multi-author writing style analysis task aimed at, for example, detecting whether a document is single- or multi-authored (2018), the actual number of authors within a document (2019), whether there was a style change between two consecutive paragraphs (2020, 2021, 2022), and where the actual style changes were located (2021, 2022). In 2022, style changes also had to be detected at the sentence level. The previously used datasets exhibited high topic diversity, which allowed the participants to leverage topic information as a style change signal. In the 2023 and 2024 editions of the writing style analysis task, special attention was paid to this issue.

We ask participants to solve the following intrinsic style change detection task: for a given text, find all positions of writing style change at the sentence level (i.e., for each pair of consecutive sentences, assess whether there was a style change). The simultaneous change of authorship and topic will be carefully controlled, and we will provide participants with datasets of three difficulty levels:

Easy: The sentences of a document cover a variety of topics, allowing approaches to make use of topic information to detect authorship changes.

Medium: The topical variety in a document is small (though still present), forcing the approaches to focus more on style to effectively solve the detection task.

Hard: All sentences in a document are on the same topic.

All documents are provided in English and may contain an arbitrary number of style changes. However, style changes may only occur between sentences (i.e., a single sentence is always authored by a single author and contains no style changes).
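The labelling scheme described above (one decision per pair of consecutive sentences) can be illustrated with a minimal sketch. It assumes that per-sentence author identifiers are available, which is only the case when ground truth is known; the function name `style_changes` and the example author IDs are hypothetical and are not part of the official task format, which is documented on the task's page.

```python
from typing import List, Sequence


def style_changes(sentence_authors: Sequence[int]) -> List[int]:
    """Return one binary label per pair of consecutive sentences.

    1 means the author differs between sentence i and sentence i + 1;
    0 means the same author wrote both sentences.
    """
    return [
        int(prev != curr)
        for prev, curr in zip(sentence_authors, sentence_authors[1:])
    ]


# Hypothetical document: five sentences written by authors 1, 1, 2, 2, 1.
labels = style_changes([1, 1, 2, 2, 1])
print(labels)  # [0, 1, 0, 1]
```

A document with n sentences therefore yields n - 1 labels, and because each sentence has exactly one author, no label is ever needed inside a sentence.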

Data

To develop and then test your algorithms, three datasets including ground truth information are provided (easy for the easy task, medium for the medium task, and hard for the hard task).

Each dataset is split into three parts:

training set: Contains 70% of the whole dataset and includes ground truth data. Use this set to develop and train your models.

validation set: Contains 15% of the whole dataset and includes ground truth data. Use this set to evaluate and optimize your models.

test set: Contains 15% of the whole dataset; no ground truth data is given. This set is used for evaluation.
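The provided datasets are already split, so no splitting is required to use them. For additional external data, a comparable 70/15/15 partition could be produced along the following lines. This is a sketch under stated assumptions: the function name `split_70_15_15`, the seed, and the document identifiers are hypothetical, and nothing here describes how the organizers created their splits.

```python
import random
from typing import List, Sequence, Tuple


def split_70_15_15(
    items: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items reproducibly and split them 70% / 15% / 15%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, roughly 15%
    return train, validation, test


# Hypothetical document identifiers.
docs = [f"doc-{i}" for i in range(100)]
train, validation, test = split_70_15_15(docs)
print(len(train), len(validation), len(test))  # 70 15 15
```

Fixing the seed keeps the partition stable across runs, so validation results stay comparable while a model is being tuned.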

You are free to use additional external data for training your models. However, we ask you to make the additional data utilized freely available under a suitable license.
Current Turboveg Data Dictionary and Panarctic Species List (PASL) -...
arcticatlas.geobotany.org
Updated Sep 1, 2020
Cite
(2020). Current Turboveg Data Dictionary and Panarctic Species List (PASL) - Datasets - Alaska Arctic Geoecological Atlas [Dataset]. https://arcticatlas.geobotany.org/catalog/dataset/current-turboveg-data-dictionary-and-panarctic-species-list-pasl
Dataset updated
Sep 1, 2020
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Arctic
Description
These are the most recent Data Dictionary (pop-ups) and Panarctic Species List (PASL) zip files for all the vegetation plot data entered into Turboveg for the Alaska AVA. These files are necessary to correctly use the coded data in Turboveg. The Data Dictionary file will be updated when new datasets entered into Turboveg result in additions to coded data such as references, author code, habitat type, surficial geology, etc. Updates to the PASL will occur less frequently. Check the dates in the file names to be certain that you are using the most current files.

Our data model is a set of tables that comprise our relational database. The Excel spreadsheet included in the resources below provides information about each field in our database, such as data type, description, whether it is a required field, whether the information within the field is selected from a pop-up list, and whether the field is a standard within Turboveg or is specific to the AVA.

Using Turboveg:

1) Download the installation file from the official Turboveg webpage, available through the link at the Alaska Arctic Geoecological Atlas portal. This is the general installation file for worldwide users; some adjustments will be needed after installation when using data from AAVA.
2) Open the Turboveg program and restore the most recent Data Dictionary and PASL zipped files by using the function 'Database-Backup/Restore-Restore.' All previous versions of the Data Dictionary files and PASL that are already in the program will be overwritten.
3) Use the Alaska-AVA following the manual for Turboveg for Windows, which is available at http://www.synbiosys.alterra.nl/turboveg/tvwin.pdf
Data from: An essay on the East-India-trade by the author of The essay upon...
llds.ling-phil.ox.ac.uk
Updated Jun 24, 2024
Cite
Charles Davenant (2024). An essay on the East-India-trade by the author of The essay upon wayes and means. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A37163
Dataset updated
Jun 24, 2024
Authors
Charles Davenant
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
(:unav)...........................................
Data from: Saying Thanks and Meaning It: Expressing Gratitude for Social...
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Author, Anonymous 1; Author, Anonymous 2 (2023). Saying Thanks and Meaning It: Expressing Gratitude for Social Gain [Dataset]. http://doi.org/10.7910/DVN/PBMCMZ
Unique identifier
https://doi.org/10.7910/DVN/PBMCMZ
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Author, Anonymous 1; Author, Anonymous 2
Description
People sometimes give thanks as a true expression of their feeling but also sometimes because they know gratitude expression helps to make a certain social impression. That is, some gratitude is expressed because of intrinsic motivations or extrinsic motivations. Such motivations affect the outcomes of behavior. The present work assessed gratitude, trait tendency to manage socially desirable expressions, and well-being across two studies (combined n = 398). Motivations to express gratitude were also measured and impression management goals were manipulated in study 2. Results show that gratitude expression is highest when people want to make a good impression and extrinsic motives to express gratitude can moderate the relationship between gratitude and well-being. Implications for the measurement of gratitude and theoretical understanding of gratitude’s social function are discussed.
PAN25 Multi-Author Writing Style Analysis: 2 scholarly articles cite this dataset.
