4 datasets found
  1. PAN25 Multi-Author Writing Style Analysis

    • zenodo.org
    zip
    Updated Mar 19, 2025
    Cite
    Eva Zangerle; Maximilian Mayerl; Martin Potthast; Benno Stein (2025). PAN25 Multi-Author Writing Style Analysis [Dataset]. http://doi.org/10.5281/zenodo.15053260
    Available download formats
    zip
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eva Zangerle; Maximilian Mayerl; Martin Potthast; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset for the shared task on Multi-Author Writing Style Analysis at PAN@CLEF2025. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.

    Task

    The goal of the style change detection task is to identify the positions within a given multi-author document at which the author switches. A fundamental question is therefore: if multiple authors have written a text together, can we find evidence of this, and do we have a means to detect variations in writing style? Answering this question is one of the most difficult and most interesting challenges in author identification. Style change detection is the only means of detecting plagiarism in a document when no comparison texts are given; likewise, it can help to uncover gift authorships, to verify a claimed authorship, or to develop new technology for writing support.

    Previous editions of the multi-author writing style analysis task aimed at, e.g., detecting whether a document is single- or multi-authored (2018), determining the actual number of authors within a document (2019), detecting whether there was a style change between two consecutive paragraphs (2020, 2021, 2022), and locating the actual style changes (2021, 2022). In 2022, style changes also had to be detected at the sentence level. The previously used datasets exhibited high topic diversity, which allowed participants to leverage topic information as a style change signal. In the 2023 and 2024 editions of the writing style analysis task, special attention was paid to this issue.

    We ask participants to solve the following intrinsic style change detection task: for a given text, find all positions of writing style change at the sentence level (i.e., for each pair of consecutive sentences, assess whether there was a style change). The simultaneous change of authorship and topic will be carefully controlled, and we will provide participants with datasets of three difficulty levels:

    1. Easy: The sentences of a document cover a variety of topics, allowing approaches to make use of topic information to detect authorship changes.
    2. Medium: The topical variety in a document is small (though still present), forcing the approaches to focus more on style to solve the detection task effectively.
    3. Hard: All sentences in a document are on the same topic.
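    To make the sentence-level formulation concrete, here is a minimal sketch of a naive predictor: it turns each sentence into a few shallow style features and flags a change between two consecutive sentences when their feature distance exceeds a threshold. The feature set, the function names `style_features` and `predict_changes`, and the default threshold of 0.5 are illustrative assumptions, not the official PAN baseline (the task's page links to that). Because these features ignore topic entirely, they are at best a starting point for the medium and hard datasets.

```python
import re
from math import sqrt

# Small, illustrative list of English function words used as a style cue
# (an assumption for this sketch, not part of the task definition).
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "on",
                  "is", "it", "that", "this", "i", "you", "we", "but"}


def style_features(sentence: str) -> list[float]:
    """Return a small, roughly [0, 1]-scaled style vector for one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    n_words = max(len(words), 1)
    n_chars = max(len(sentence), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    punct_rate = sum(not c.isalnum() and not c.isspace() for c in sentence) / n_chars
    func_ratio = sum(w.lower() in FUNCTION_WORDS for w in words) / n_words
    upper_rate = sum(c.isupper() for c in sentence) / n_chars
    # Divide average word length by 10 so it is on a similar scale to the rates.
    return [avg_word_len / 10.0, punct_rate, func_ratio, upper_rate]


def predict_changes(sentences: list[str], threshold: float = 0.5) -> list[int]:
    """Predict 1 (style change) or 0 for each pair of consecutive sentences."""
    vectors = [style_features(s) for s in sentences]
    changes = []
    for prev, curr in zip(vectors, vectors[1:]):
        # Euclidean distance between the two sentences' style vectors.
        distance = sqrt(sum((p - c) ** 2 for p, c in zip(prev, curr)))
        changes.append(int(distance > threshold))
    return changes
```

    For example, `predict_changes(["I like it.", "Notwithstanding considerable methodological heterogeneity, results converged."])` returns `[1]`; a document with n sentences always yields n - 1 predictions.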

    All documents are provided in English and may contain an arbitrary number of style changes. However, style changes may only occur between sentences (i.e., a single sentence is always authored by a single author and contains no style changes).
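    Since style changes can only fall between sentences, a document with n sentences has exactly n - 1 candidate positions, and a solution is a binary vector of that length. The sketch below derives such a vector from per-sentence author labels; these labels and the helper names `changes_from_authors` and `count_authors` are hypothetical, and the actual ground-truth format is specified on the task's page.

```python
def changes_from_authors(authors_per_sentence: list[int]) -> list[int]:
    """Map per-sentence author ids to n - 1 binary style-change labels.

    Position i is 1 if sentence i and sentence i + 1 have different authors.
    """
    return [int(a != b) for a, b in zip(authors_per_sentence, authors_per_sentence[1:])]


def count_authors(authors_per_sentence: list[int]) -> int:
    """Number of distinct authors in the document."""
    return len(set(authors_per_sentence))
```

    Here `changes_from_authors([1, 1, 2, 2, 1])` gives `[0, 1, 0, 1]` while `count_authors([1, 1, 2, 2, 1])` gives 2: an author who returns later still produces a change, so the change vector alone does not determine the number of authors.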

    Data

    To develop and then test your algorithms, three datasets including ground truth information are provided (easy for the easy task, medium for the medium task, and hard for the hard task).
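    A minimal loader sketch for one split of one difficulty level follows. It assumes a file layout like that of earlier editions of the task (`problem-<id>.txt` with the text, `truth-problem-<id>.json` with a `changes` list) and additionally assumes one sentence per line; verify both against the task's page. The function name `load_split` and any directory path you pass to it (for instance a hypothetical `easy/train`) are illustrative.

```python
import json
from pathlib import Path
from typing import Optional


def load_split(split_dir: str) -> list[tuple[str, list[str], Optional[list[int]]]]:
    """Load (problem_id, sentences, changes) triples from one split directory.

    ASSUMED layout (check the task's page): problem-<id>.txt holds one
    sentence per line; truth-problem-<id>.json holds {"changes": [...]}.
    The test split ships without truth files, so `changes` is None there.
    """
    root = Path(split_dir)
    problems = []
    for text_path in sorted(root.glob("problem-*.txt")):
        problem_id = text_path.stem  # e.g. "problem-1"
        lines = text_path.read_text(encoding="utf-8").splitlines()
        sentences = [line.strip() for line in lines if line.strip()]
        truth_path = root / f"truth-{problem_id}.json"
        changes = None
        if truth_path.exists():
            changes = json.loads(truth_path.read_text(encoding="utf-8"))["changes"]
        problems.append((problem_id, sentences, changes))
    return problems
```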

    Each dataset is split into three parts:

    1. training set: Contains 70% of the whole dataset and includes ground truth data. Use this set to develop and train your models.
    2. validation set: Contains 15% of the whole dataset and includes ground truth data. Use this set to evaluate and optimize your models.
    3. test set: Contains 15% of the whole dataset; no ground truth data is given. This set is used for evaluation.
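    Since the test set has no ground truth, the validation set is where a model can be scored locally. Earlier editions of this task ranked submissions by macro-averaged F1 over the binary change labels; assuming that still applies (the task's page and the official evaluator are authoritative), here is a dependency-free sketch that pools the labels of all documents. The names `f1_for_label` and `macro_f1` are illustrative.

```python
def f1_for_label(y_true: list[int], y_pred: list[int], label: int) -> float:
    """F1 score for one class; 0.0 when there are no true positives."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(all_true: list[list[int]], all_pred: list[list[int]]) -> float:
    """Macro F1 over classes {0, 1}, pooling change labels of all documents."""
    y_true: list[int] = []
    y_pred: list[int] = []
    for truth, pred in zip(all_true, all_pred):
        if len(truth) != len(pred):
            raise ValueError("prediction length must match ground-truth length")
        y_true.extend(truth)
        y_pred.extend(pred)
    return (f1_for_label(y_true, y_pred, 0) + f1_for_label(y_true, y_pred, 1)) / 2
```

    Averaging the F1 of the "change" and "no change" classes keeps a system from scoring well merely by always predicting the majority class.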

    You are free to use additional external data for training your models. However, we ask that you make any additional data you use freely available under a suitable license.

  2. Current Turboveg Data Dictionary and Panarctic Species List (PASL) -...

    • arcticatlas.geobotany.org
    Updated Sep 1, 2020
    Cite
    (2020). Current Turboveg Data Dictionary and Panarctic Species List (PASL) - Datasets - Alaska Arctic Geoecological Atlas [Dataset]. https://arcticatlas.geobotany.org/catalog/dataset/current-turboveg-data-dictionary-and-panarctic-species-list-pasl
    Dataset updated
    Sep 1, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Arctic
    Description

    These are the most recent Data Dictionary (pop-ups) and Panarctic Species List (PASL) zip files for all the vegetation plot data entered into Turboveg for the Alaska AVA. These files are necessary to use the Turboveg data correctly with regard to coded data. The Data Dictionary file will be updated whenever new datasets entered into Turboveg add to the coded data, such as references, author codes, habitat types, or surficial geology. Updates to the PASL will occur less frequently. Check the dates in the file names to make sure that you are using the most current files.

    Our data model is a set of tables that make up our relational database. The Excel spreadsheet included in the resources below provides information about each field in our database: its data type, a description, whether it is a required field, whether its value is selected from a pop-up list, and whether the field is standard in Turboveg or specific to the AVA.

    Using Turboveg:

    1. Download the installation file from the official Turboveg webpage, available through the link on the Alaska Arctic Geoecological Atlas portal. This is the general installation file for users worldwide; some adjustments will be needed after installing the program when using data from the AAVA.
    2. Open the Turboveg program and restore the most recent Data Dictionary and PASL zip files using the function 'Database-Backup/Restore-Restore'. All previous versions of the Data Dictionary and PASL files already in the program will be overwritten.
    3. Use the Alaska-AVA by following the manual for Turboveg for Windows, available at http://www.synbiosys.alterra.nl/turboveg/tvwin.pdf
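    The advice to check the dates in the file names can be automated. The sketch below assumes, hypothetically, that each name contains a date written as `YYYY-MM-DD` or `YYYYMMDD` (the actual naming on the atlas portal may differ) and picks the file with the latest date; `file_date`, `most_recent`, and the example file names are illustrative.

```python
import re
from datetime import date
from typing import Optional

# HYPOTHETICAL naming convention: a date written as YYYY-MM-DD or YYYYMMDD
# somewhere in the file name. Adapt the pattern to the real file names.
DATE_PATTERN = re.compile(r"(\d{4})-?(\d{2})-?(\d{2})")


def file_date(name: str) -> Optional[date]:
    """Return the first valid calendar date embedded in a file name, if any."""
    for match in DATE_PATTERN.finditer(name):
        year, month, day = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:  # digits that do not form a real calendar date
            continue
    return None


def most_recent(names: list[str]) -> Optional[str]:
    """Return the file name carrying the latest embedded date, or None."""
    dated = [(file_date(name), name) for name in names]
    dated = [(d, name) for d, name in dated if d is not None]
    if not dated:
        return None
    return max(dated)[1]
```

    With these assumptions, `most_recent(["PASL_2019-01-01.zip", "PASL_2020-09-01.zip"])` returns `"PASL_2020-09-01.zip"`; names without a recognizable date are ignored.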

  3. Data from: An essay on the East-India-trade by the author of The essay upon...

    • llds.ling-phil.ox.ac.uk
    Updated Jun 24, 2024
    Cite
    Charles Davenant (2024). An essay on the East-India-trade by the author of The essay upon wayes and means. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A37163
    Dataset updated
    Jun 24, 2024
    Authors
    Charles Davenant
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    (:unav) No description available.

  4. Data from: Saying Thanks and Meaning It: Expressing Gratitude for Social...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Author, Anonymous 1; Author, Anonymous 2 (2023). Saying Thanks and Meaning It: Expressing Gratitude for Social Gain [Dataset]. http://doi.org/10.7910/DVN/PBMCMZ
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Author, Anonymous 1; Author, Anonymous 2
    Description

    People sometimes give thanks as a true expression of their feelings, but sometimes also because they know that expressing gratitude helps to make a certain social impression. That is, some gratitude is expressed out of intrinsic motivation and some out of extrinsic motivation. Such motivations affect the outcomes of behavior. The present work assessed gratitude, the trait tendency to manage socially desirable expressions, and well-being across two studies (combined n = 398). Motivations to express gratitude were also measured, and impression management goals were manipulated in study 2. Results show that gratitude expression is highest when people want to make a good impression, and that extrinsic motives to express gratitude can moderate the relationship between gratitude and well-being. Implications for the measurement of gratitude and for the theoretical understanding of gratitude’s social function are discussed.
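    The moderation finding can be read through the standard interaction model used to test moderation. The equation below is a generic sketch with symbols chosen here for illustration (W for well-being, G for gratitude, M for extrinsic motivation to express gratitude); the description does not state which model the authors actually fitted.

```latex
W_i = \beta_0 + \beta_1 G_i + \beta_2 M_i + \beta_3 \, G_i M_i + \varepsilon_i,
\qquad
\frac{\partial W_i}{\partial G_i} = \beta_1 + \beta_3 M_i
```

    Moderation corresponds to a nonzero interaction coefficient (β3 ≠ 0): the slope relating gratitude to well-being then depends on the level of extrinsic motivation.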


PAN25 Multi-Author Writing Style Analysis

2 scholarly articles cite this dataset.