Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interface, questionnaires, and collected data for the paper "Investigating Expectations for Voice-based and Conversational Argument Search on the Web".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-CMV-20 dataset comprises all available posts and comments in the ChangeMyView subreddit from the foundation of the subreddit in 2005 until September 2017. From these, we have derived two sub-datasets for the tasks of persuasiveness prediction and opinion malleability prediction. In addition, the corpus comprises historical posts by CMV authors and derived personal characteristics.

Dataset specification

All files are in bzip2-compressed JSON Lines format.
threads.jsonl: contains all the selected discussion threads from CMV.
pairs.jsonl: each record contains a submission, a delta-comment, a nondelta-comment, and the comments' similarity score.
posts-malleability.jsonl: contains posts for opinion malleability prediction, in the format provided in the original Reddit Crawl dataset.
author_entity_category.jsonl: each record contains the author and a list of the Wikipedia entities the author mentioned in messages across all subreddits. For each mentioned entity, the following data is provided:
[title, wikidata_id, wikipedia_page_id, mentioned_entity_title, wikifier_score, subreddit_name, subreddit_id, subreddit_category_name, subreddit_topcategory_name]
author_liwc.jsonl: personality trait features computed with LIWC for the authors from the pairs.jsonl and post_malleability.jsonl datasets.
author_subreddit.jsonl: for each author, statistics on the number of posts (submissions and comments) across all subreddits.
author_subreddit_category.jsonl: similar to author_subreddit.jsonl, but the statistics of each author's posts are grouped by top-categories and categories according to snoopsnoo.com.
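Since all files are bzip2-compressed JSON Lines, they can be streamed record by record without decompressing them to disk first. The following is a minimal sketch using only the Python standard library. The helper name iter_jsonl_bz2 is hypothetical, and the on-disk file name in the usage note (threads.jsonl.bz2) is an assumption; the listing does not state the exact compressed file names or the field names of each record.

```python
import bz2
import json
from typing import Any, Dict, Iterator


def iter_jsonl_bz2(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON record per non-empty line of a bzip2-compressed JSON Lines file.

    Hypothetical helper for illustration; not part of the dataset distribution.
    """
    # bz2.open in text mode decompresses on the fly, so the file is never
    # fully loaded into memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

A possible use, assuming the file is stored locally as threads.jsonl.bz2, would be to take the first record with next(iter_jsonl_bz2("threads.jsonl.bz2")) and inspect its keys before deciding which fields to process.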