Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform upwork.com.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Newer Version of this corpus in the 2018 version can be found here: https://doi.org/10.5281/zenodo.1340629
The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform upwork.com.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-CMV-20 dataset comprises all available posts and comments in the ChangeMyView subreddit from the foundation of the subreddit in 2005, until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction, and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors, and derived personal characteristics.Dataset specificationAll files are in bzip2-compressed JSON Lines format.
threads.jsonl: contains all the selected discussion threads from CMVpairs.jsonl: each record contains submission, delta-comment and nondelta-comment and the comments' similarity scoreposts-malleability.jsonl: contains posts for opinion mallebility prediction, in the format provided in the original Reddit Crawl datasetauthor_entity_category.jsonl: each record contains the author and list of Wikipedia entities mentioned by the author in the messages across all subreddits. For each mentioned entity we provide the following data:
[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]
author_liwc.jsonl: personality traits features computed with LIWC for the authors from pairs.jsonl and post_malleability.jsonl datasetsauthor_subreddit.jsonl: for each author statistics of all number of all posts (submissions/comments) across all subreddits are providedauthor_subreddit_category.jsonl: similar to author_subreddit.jsonl, the statistics of all author posts is grouped by top-categories and categories according to snoopsnoo.com
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-Editorials-16 corpus is a novel corpus with 300 news editorials evenly selected from three diverse online news portals: Al Jazeera, Fox News, and The Guardian. The aim of the corpus is to study (1) the mining and classification of fine-grained types of argumentative discourse units and (2) the analysis of argumentation strategies pursued in editorials to achieve persuasion. To this end, each editorial contains manual type annotations of all units that capture the role that a unit plays in the argumentative discourse, such as assumption or statistics. The corpus consists of 14,313 units of six different types, each annotated by three professional annotators from the crowdsourcing platform upwork.com.