Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis-Editorials-18 corpus comprises 1000 news editorials that are annotated in accordance with a new notion for argumentation quality. The notion regards whether an editorial brings readers of opposing beliefs closer together or rather increases the gap between them. In particular, we label each editorial in the corpus as challenging, reinforcing, or no-effect. To account for the political ideology of the target readers, each editorial is labelled by three liberals and three conservatives.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Challenge or Empower: Revisiting Argumentation Quality in a News Editorial Corpus. The Webis-Editorial-Quality-18 corpus is a novel corpus of 1000 news editorials. Its aim is to study a new notion of quality for news editorials. It contains quality assessments of all 1000 editorials, each annotated by three liberals and three conservatives. The annotators also reported free-text reasons for the effects they observed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has 5 summaries, each labeled for overall quality and fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, self-containedness.

The files are organized as follows:

corpus.csv - Contains all the editorials and their acquired summaries
Note: (X = [1,5] for five summaries)
- article_id : Article ID in the corpus
- title : Title of the editorial
- article_text : Plain text of the editorial
- summary_{X}_text : Plain text of the corresponding summary
- thesis_{X}_text : Plain text of the thesis from the corresponding summary
- lead : top 15% of the editorial's segments
- body : segments between lead and conclusion sections
- conclusion : bottom 15% of the editorial's segments
- article_segments : Collection of paragraphs, each further divided into collection of segments containing: { "number": segment order in the editorial, "text" : segment text, "label": ADU type }
- summary_{X}_segments : Collection of summary segments containing: { "number": segment order in the editorial, "text" : segment text, "adu_label": ADU type from the editorial, "summary_label": can be 'thesis' or 'justification' }

quality-groups.csv - Contains the IDs for high(and low)-quality summaries for each quality dimension per editorial
For example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.
The summary texts can be obtained from corpus.csv respectively.
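As a rough illustration of the corpus.csv layout described above, the following Python sketch collects the five summary texts per editorial. It assumes the file is a standard comma-separated CSV whose header row uses the column names listed (article_id, summary_1_text through summary_5_text); the function name read_summaries and the in-memory sample row are hypothetical stand-ins, not part of the corpus.

```python
import csv
import io

SUMMARY_IDS = range(1, 6)  # X = 1..5, one column per summary


def read_summaries(csv_file):
    """Map each article_id to its list of five summary texts.

    Assumes a header row with article_id and summary_{X}_text columns,
    as described in the corpus listing.
    """
    reader = csv.DictReader(csv_file)
    summaries = {}
    for row in reader:
        summaries[row["article_id"]] = [
            row[f"summary_{x}_text"] for x in SUMMARY_IDS
        ]
    return summaries


# Hypothetical in-memory stand-in for corpus.csv (not real corpus data).
header = ["article_id", "title", "article_text"] + [
    f"summary_{x}_text" for x in SUMMARY_IDS
]
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(header)
writer.writerow(
    ["2", "Example title", "Example text"]
    + [f"summary {x}" for x in SUMMARY_IDS]
)
sample.seek(0)

result = read_summaries(sample)
```

With the real file, the same function could be called as read_summaries(open("corpus.csv", newline="", encoding="utf-8")), assuming UTF-8 encoding. The summary IDs listed in quality-groups.csv could then be used to index into each returned list.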
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Webis Abstractive Snippet 2020 (Webis-Snippet-20) comprises four abstractive snippet datasets from ClueWeb09, ClueWeb12, and DMOZ descriptions. More than 10 million
https://elrc-share.eu/terms/publicDomain.html
EN-FR bilingual COVID-19-related corpus acquired from the website (https://www.cdc.gov/) of the Centers for Disease Control and Prevention of the US government (25th April 2020).
https://elrc-share.eu/terms/publicDomain.html
Multilingual (EN, ES, FR, PT, IT, DE, KO, RU, ZH, UK, VI) COVID-19-related corpus acquired from the website (https://www.cdc.gov/) of the Centers for Disease Control and Prevention of the US government (11th August 2020). It contains 51202 TUs in total.
https://elrc-share.eu/terms/openUnderPSI.html
TMX file, 2718 TUs, bilingual German/English, texts from the website of the Federal Ministry of Transport and Digital Infrastructure (BMVI) on transport issues. Original TMX file corrected and stripped.
https://elrc-share.eu/terms/openUnderPSI.html
German-French texts extracted from the website of the Federal Foreign Office Berlin. This includes 11,852 pairs that were translated between October 2013 and the beginning of November 2015 and converted into a .TMX file format.
https://elrc-share.eu/terms/openUnderPSI.html
German-Portuguese texts extracted from the website of the Federal Foreign Office Berlin. This includes 415 pairs that were translated between September 2013 and the beginning of December 2015 and converted into a .TMX file format.
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (https://ribarstvo.agr.hr/)
https://elrc-share.eu/terms/openUnderPSI.html
Bilingual tmx file of German to English translations of the Federal Ministry of the Interior's website and brochures. Topics include terrorism, cyber security, asylum, cultural property, public administration and sport.