11 datasets found
  1. Webis-Editorial-Quality-18

    • webis.de
    • anthology.aicmu.ac.cn
    1340629
    Updated 2018
    Cite
    Roxanne El Baff; Henning Wachsmuth; Khalid Al-Khatib; Benno Stein (2018). Webis-Editorial-Quality-18 [Dataset]. http://doi.org/10.5281/zenodo.1340629
    Available download formats: 1340629
    Dataset updated
    2018
    Dataset provided by
    University of Groningen
    Bauhaus-Universität Weimar
    Leibniz Universität Hannover
    The Web Technology & Information Systems Network
    Deutsches Zentrum für Luft- und Raumfahrt
    Authors
    Roxanne El Baff; Henning Wachsmuth; Khalid Al-Khatib; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis-Editorials-18 corpus comprises 1000 news editorials that are annotated in accordance with a new notion for argumentation quality. The notion regards whether an editorial brings readers of opposing beliefs closer together or rather increases the gap between them. In particular, we label each editorial in the corpus as challenging, reinforcing, or no-effect. To account for the political ideology of the target readers, each editorial is labelled by three liberals and three conservatives.
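
    As a rough illustration of the annotation scheme described above, the sketch below tallies the labels given by each ideology group for a single editorial and reports a majority label per group. The record layout ("ideology" and "label" keys), the sample values, and the majority-vote aggregation are all assumptions made for illustration; the description does not say how the corpus files are structured or how labels are combined.

```python
from collections import Counter

# Hypothetical annotations for one editorial: three liberal and three
# conservative annotators, each choosing one of the three labels.
# The key names are illustrative, not taken from the corpus files.
annotations = [
    {"ideology": "liberal", "label": "challenging"},
    {"ideology": "liberal", "label": "challenging"},
    {"ideology": "liberal", "label": "no-effect"},
    {"ideology": "conservative", "label": "reinforcing"},
    {"ideology": "conservative", "label": "reinforcing"},
    {"ideology": "conservative", "label": "reinforcing"},
]


def majority_label_by_ideology(records):
    """Return the most frequent label per ideology group, or None on a tie."""
    counts = {}
    for record in records:
        counts.setdefault(record["ideology"], Counter())[record["label"]] += 1

    result = {}
    for ideology, counter in counts.items():
        ranked = counter.most_common(2)
        # A tie exists when the two most frequent labels have equal counts.
        tied = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result[ideology] = None if tied else ranked[0][0]
    return result


print(majority_label_by_ideology(annotations))
```

    Keeping the two groups separate mirrors the stated goal of accounting for the readers' political ideology; a three-way split within a group is reported as None rather than resolved arbitrarily.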

  2. Webis-Editorial-Quality-18 corpus

    • live.european-language-grid.eu
    csv
    Updated Apr 30, 2024
    + more versions
    Cite
    (2024). Webis-Editorial-Quality-18 corpus [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7424
    Available download formats: csv
    Dataset updated
    Apr 30, 2024
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Challenge or Empower: Revisiting Argumentation Quality in a News Editorial Corpus. The Webis-Editorial-Quality-18 corpus is a novel corpus of 1000 news editorials. Its aim is to support the study of a new notion of news editorial quality. It contains the quality assessments of the 1000 editorials, each annotated by three liberals and three conservatives. The annotators also reported free-text reasons for the effects they observed.

  3. Webis EditorialSum Corpus 2020

    • live.european-language-grid.eu
    csv
    Updated Oct 19, 2020
    Cite
    (2020). Webis EditorialSum Corpus 2020 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7658
    Available download formats: csv
    Dataset updated
    Oct 19, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis EditorialSum Corpus consists of 1330 manually curated extractive summaries for 266 news editorials spanning three diverse portals: Al-Jazeera, Guardian and Fox News. Each editorial has 5 summaries, each labeled for overall quality and fine-grained properties such as thesis-relevance, persuasiveness, reasonableness, self-containedness.

    The files are organized as follows:

    corpus.csv - Contains all the editorials and their acquired summaries
    Note: (X = [1,5] for five summaries)
    - article_id : Article ID in the corpus
    - title : Title of the editorial
    - article_text : Plain text of the editorial
    - summary_{X}_text : Plain text of the corresponding summary
    - thesis_{X}_text : Plain text of the thesis from the corresponding summary
    - lead : top 15% of the editorial's segments
    - body : segments between lead and conclusion sections
    - conclusion : bottom 15% of the editorial's segments
    - article_segments : Collection of paragraphs, each further divided into collection of segments containing: { "number": segment order in the editorial, "text" : segment text, "label": ADU type }
    - summary_{X}_segments : Collection of summary segments containing: { "number": segment order in the editorial, "text" : segment text, "adu_label": ADU type from the editorial, "summary_label": can be 'thesis' or 'justification' }

    quality-groups.csv - Contains the IDs for high(and low)-quality summaries for each quality dimension per editorial
    For example: article_id 2 has four high_quality summaries (summary_1, summary_2, summary_3, summary_4) and one low_quality summary (summary_5) in terms of overall quality.
    The summary texts can be obtained from corpus.csv respectively.
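
    The column layout described for corpus.csv could be read along the following lines. This is a minimal sketch, not the corpus authors' loader: it assumes the article_segments cell is JSON-encoded (the description does not state the serialisation), and it builds a tiny made-up row in memory instead of opening the real file. The ADU label values "label_a" and "label_b" are placeholders.

```python
import csv
import io
import json

FIELDS = ["article_id", "title", "summary_1_text", "summary_2_text", "article_segments"]

# Made-up paragraphs, each a list of segments with the documented keys.
segments = [
    [{"number": 1, "text": "First segment.", "label": "label_a"},
     {"number": 2, "text": "Second segment.", "label": "label_b"}],
    [{"number": 3, "text": "Third segment.", "label": "label_a"}],
]

# Stand-in for corpus.csv holding a single hypothetical row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow({
    "article_id": "2",
    "title": "Example title",
    "summary_1_text": "First summary.",
    "summary_2_text": "Second summary.",
    "article_segments": json.dumps(segments),  # assumed JSON encoding
})
buffer.seek(0)
rows = list(csv.DictReader(buffer))


def summaries_for(row, count=5):
    """Collect summary_{X}_text for X = 1..count, skipping absent or empty columns."""
    keys = (f"summary_{x}_text" for x in range(1, count + 1))
    return [row[key] for key in keys if row.get(key)]


def segment_labels(row):
    """Flatten article_segments (paragraphs of segments) into a list of ADU labels."""
    paragraphs = json.loads(row["article_segments"])
    return [segment["label"] for paragraph in paragraphs for segment in paragraph]


print(summaries_for(rows[0]))
print(segment_labels(rows[0]))
```

    With the real file, the in-memory buffer would be replaced by an opened corpus.csv; if the segment cells turn out not to be JSON, only segment_labels needs to change.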

  4. Webis Abstractive Snippet Corpus 2020

    • live.european-language-grid.eu
    json
    Updated Aug 19, 2023
    + more versions
    Cite
    (2023). Webis Abstractive Snippet Corpus 2020 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7817
    Available download formats: json
    Dataset updated
    Aug 19, 2023
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Webis Abstractive Snippet 2020 (Webis-Snippet-20) comprises four abstractive snippet datasets from ClueWeb09, ClueWeb12, and DMOZ descriptions. More than 10 million

  5. COVID-19 CDC dataset v1. Bilingual (EN-FR)

    • live.european-language-grid.eu
    tmx
    Updated Apr 25, 2020
    + more versions
    Cite
    (2020). COVID-19 CDC dataset v1. Bilingual (EN-FR) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/21051
    Available download formats: tmx
    Dataset updated
    Apr 25, 2020
    License

    https://elrc-share.eu/terms/publicDomain.html

    Description

    EN-FR bilingual COVID-19-related corpus acquired from the website (https://www.cdc.gov/) of the US government's Centers for Disease Control and Prevention (25th April 2020).

  6. COVID-19 CDC dataset v2. Multilingual (EN, ES, FR, PT, IT, DE, KO, RU, ZH,...

    • live.european-language-grid.eu
    tmx
    Updated Aug 15, 2020
    + more versions
    Cite
    (2020). COVID-19 CDC dataset v2. Multilingual (EN, ES, FR, PT, IT, DE, KO, RU, ZH, UK, VI) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/21340
    Available download formats: tmx
    Dataset updated
    Aug 15, 2020
    License

    https://elrc-share.eu/terms/publicDomain.html

    Description

    Multilingual (EN, ES, FR, PT, IT, DE, KO, RU, ZH, UK, VI) COVID-19-related corpus acquired from the website (https://www.cdc.gov/) of the US government's Centers for Disease Control and Prevention (11th August 2020). It contains 51202 translation units (TUs) in total.
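
    A TU count like the one quoted above can be checked with a few lines of standard-library code, since TMX is an XML format in which each tu element holds one tuv per language. The sketch below parses a small made-up TMX string; the sample sentences and the helper name read_translation_units are illustrative and not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up two-unit TMX document standing in for a downloaded file.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands.</seg></tuv>
      <tuv xml:lang="fr"><seg>Lavez-vous les mains.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Stay home when sick.</seg></tuv>
      <tuv xml:lang="fr"><seg>Restez chez vous si vous êtes malade.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_translation_units(tmx_text):
    """Return one dict per tu element, mapping language code to segment text."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also picks up text nested inside inline markup.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


units = read_translation_units(SAMPLE_TMX)
print(len(units), "TUs")
```

    For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring; very large TMX files may be better served by ET.iterparse to avoid loading everything at once.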

  7. BMVI Website (Processed)

    • live.european-language-grid.eu
    • data.europa.eu
    tmx
    Updated Mar 1, 2018
    Cite
    (2018). BMVI Website (Processed) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/3028
    Available download formats: tmx
    Dataset updated
    Mar 1, 2018
    License

    https://elrc-share.eu/terms/openUnderPSI.html

    Description

    tmx file, 2718 TUs, bilingual German/English, texts from the website of the Federal Ministry of Transport and Digital Infrastructure (BMVI) on transport issues. Original tmx file corrected and stripped

  8. German-French website parallel corpus from the Federal Foreign Office Berlin...

    • live.european-language-grid.eu
    • data.europa.eu
    tmx
    Updated Jan 11, 2022
    + more versions
    Cite
    (2022). German-French website parallel corpus from the Federal Foreign Office Berlin [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/2874
    Available download formats: tmx
    Dataset updated
    Jan 11, 2022
    License

    https://elrc-share.eu/terms/openUnderPSI.html

    Area covered
    French
    Description

    German-French texts extracted from the website of the Federal Foreign Office Berlin. This includes 11,852 pairs that were translated between October 2013 and the beginning of November 2015 and converted into a .TMX file format.

  9. German-Portuguese website parallel corpus from the Federal Foreign Office...

    • live.european-language-grid.eu
    • data.europa.eu
    tmx
    Updated Jan 12, 2022
    + more versions
    Cite
    (2022). German-Portuguese website parallel corpus from the Federal Foreign Office Berlin [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/2875
    Available download formats: tmx
    Dataset updated
    Jan 12, 2022
    License

    https://elrc-share.eu/terms/openUnderPSI.html

    Description

    German-Portuguese texts extracted from the website of the Federal Foreign Office Berlin. This includes 415 pairs that were translated between September 2013 and the beginning of December 2015 and converted into a .TMX file format.

  10. Croatian-English parallel corpus from the website of the Croatian Journal of...

    • live.european-language-grid.eu
    • catalog.elra.info
    • +1 more
    tmx
    Updated Nov 19, 2018
    Cite
    (2018). Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (Processed) [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/3178
    Available download formats: tmx
    Dataset updated
    Nov 19, 2018
    License

    Attribution-NoDerivs 3.0 (CC BY-ND 3.0), https://creativecommons.org/licenses/by-nd/3.0/
    License information was derived automatically

    Description

    Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (https://ribarstvo.agr.hr/)

  11. BMI Brochures and Website 2016

    • live.european-language-grid.eu
    • data.europa.eu
    tmx
    Updated Jan 16, 2022
    + more versions
    Cite
    (2022). BMI Brochures and Website 2016 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/2886
    Available download formats: tmx
    Dataset updated
    Jan 16, 2022
    License

    https://elrc-share.eu/terms/openUnderPSI.html

    Description

    Bilingual tmx file of German to English translations of the Federal Ministry of the Interior's website and brochures. Topics include terrorism, cyber security, asylum, cultural property, public administration and sport.


Cite
Roxanne El Baff; Henning Wachsmuth; Khalid Al-Khatib; Benno Stein (2018). Webis-Editorial-Quality-18 [Dataset]. http://doi.org/10.5281/zenodo.1340629

Webis-Editorial-Quality-18

3 scholarly articles cite this dataset (View in Google Scholar)