4 datasets found
  1. corpus

    Updated Apr 1, 2020
  2. corpus

    Updated 2019
  3. corpus

    Updated Oct 27, 2020
  4. webis-argquality-20

    Updated 2020
Yamen Ajjour; Henning Wachsmuth; Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein (2020). corpus [Dataset].

59 scholarly articles cite this dataset.
Available download formats: zip
Dataset updated: Apr 1, 2020
Dataset provided by: Leipzig University; Paderborn University; Martin-Luther-University Halle-Wittenberg; Bauhaus-Universität Weimar
Yamen Ajjour; Henning Wachsmuth; Johannes Kiesel; Martin Potthast; Matthias Hagen; Benno Stein

Attribution 4.0 (CC BY 4.0)
License information was derived automatically


The corpus comprises 387 740 arguments. They were crawled from the debate portals Debatewise (14 353 arguments), (13 522 arguments), Debatepedia (21 197 arguments), and (338 620 arguments). Moreover, the corpus contains 48 arguments from Canadian Parliament discussions. The arguments were extracted using heuristics designed for each debate portal.
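As a quick consistency check on the figures above, the per-source counts can be summed and compared with the stated total. The sketch below is illustrative only: two portal names are missing from the description, so the keys `unnamed portal A` and `unnamed portal B` are placeholder labels, not names taken from the dataset.

```python
# Per-source argument counts as stated in the description.
# Two portal names are missing there, so placeholder keys are used
# (hypothetical labels, not the portals' real names).
counts = {
    "Debatewise": 14_353,
    "unnamed portal A": 13_522,
    "Debatepedia": 21_197,
    "unnamed portal B": 338_620,
    "Canadian Parliament": 48,
}

# Sum of all sources, to compare against the stated corpus size.
total = sum(counts.values())

# Fraction of the corpus contributed by each source.
shares = {source: n / total for source, n in counts.items()}

print(f"total: {total}")
for source, n in counts.items():
    print(f"{source}: {n} ({shares[source]:.1%})")
```

The sum matches the stated 387 740 arguments, and a single portal accounts for most of the corpus, which is worth keeping in mind when sampling from it.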

These arguments are the ones currently provided through the search engine. Note that the args API does not return the sourceText (which is indexed by an included in this dataset) due to its size.

Cite as Henning Wachsmuth, Martin Potthast, Khalid Al-Khatib, Yamen Ajjour, Jana Puschmann, Jiani Qu, Jonas Dorsch, Viorel Morari, Janek Bevendorff, and Benno Stein. Building an Argument Search Engine for the Web. In 4th Workshop on Argument Mining (ArgMining 2017) at EMNLP, pages 49-59, September 2017. Association for Computational Linguistics.

Cite this dataset as Yamen Ajjour, Henning Wachsmuth, Johannes Kiesel, Martin Potthast, Matthias Hagen, and Benno Stein. Data Acquisition for Argument Search: The corpus. In 42nd German Conference on Artificial Intelligence (KI 2019), September 2019. Springer. When citing the dataset, also give its Zenodo DOI.

The development for is hosted in our Gitlab.

This collection is licensed under the Creative Commons Attribution 4.0 International license. Individual rights to the content still apply.
