100+ datasets found
  1. Webis-Sentences-17

    • webis.de
    Updated Feb 27, 2017
  2. Webis-Simple-Sentences-17 Corpus

    • zenodo.org
    • search.datacite.org
    gz
    Updated Feb 27, 2017
  3. Webis-Mnemonics-17

    • webis.de
    • zenodo.org
    Updated 2017
  4. reddit

    • www.tensorflow.org
    • tensorflow.google.cn
    Updated Oct 9, 2020
  5. ChEMBL

    • www.ebi.ac.uk
    txt, xml +3
  6. Race and the criminal justice system statistics 2018

    • www.gov.uk
    Updated Nov 28, 2019
  7. Women and the criminal justice system 2017

    • www.gov.uk
    Updated Nov 29, 2018
  8. Statistics Canada, Cases in adult criminal court by type of sentence:...

    • geocommons.com
    Updated Jun 25, 2008
  9. Contract History

    • open.canada.ca
    • data.wu.ac.at
    xml, csv
    Updated Nov 21, 2020
  10. Occupancy Permits in the Last 30 Days

    • opendata.dc.gov
    • beta-dcgis.opendata.arcgis.com
    Updated Jun 17, 2015
  11. Bilingual English-Danish parallel corpus from National Museum of Denmark...

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  12. Construction Permits in 2015

    • opendata.dc.gov
    • esri-dc-office.hub.arcgis.com
    Updated Dec 17, 2015
  13. Bilingual English-Danish parallel corpus from Danish Ministry of Foreign...

    • data.europa.eu
    • sprogteknologi.dk
    • +1 more
    Updated Feb 24, 2020
  14. Occupancy Permits in 2013

    • opendata.dc.gov
    Updated Jun 17, 2015
  15. Sentence examples per language and per condition.

    • plos.figshare.com
    xls
    Updated Oct 3, 2017
  16. Shows types of interaction, example of verbs, representative sentences and...

    • plos.figshare.com
    xls
    Updated Feb 18, 2016
  17. Complaints about the supervision of postgraduate students Discussion paper...

    • data.nsw.gov.au
    pdf
    Updated Jun 25, 2019
  18. PAN17 Multi-Author Analysis: Style-Change-Detection

    • zenodo.org
    Updated Sep 11, 2017
  19. WHO Coronavirus disease (COVID-19) situation reports

    • www.who.int
    pdf
  20. Impression differences from photograph only and photograph with negotiation...

    • plos.figshare.com
    xls
    Updated Apr 6, 2018
  21. Bilingual English-Icelandic parallel corpus from Nordisk eTax website

    • data.europa.eu
    Updated Feb 24, 2020
  22. Health and Retirement Study (HRS)

    • dataverse.harvard.edu
    Updated May 30, 2013
  23. Data from: A new method for analyzing sustainability performance of global...

    • data.mendeley.com
    • search.datacite.org
    Updated Aug 17, 2019
  24. Ambulance Services

    • digital.nhs.uk
    zip, xlsx, pdf
    Updated Jun 17, 2015
  25. Major Drainage Basin Set

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    xml, tsv +4
    Updated Nov 17, 2020
  26. Bilingual English-Danish parallel corpus from Odense Municipality website

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  27. Occupancy Permits in 2015

    • opendata.dc.gov
    Updated Dec 17, 2015
  28. Bilingual English-Danish parallel corpus from Danish Maritime Authority...

    • data.europa.eu
    • sprogressource.digst.govcloud.dk
    • +2 more
    Updated Feb 24, 2020
  29. Bilingual English-Danish parallel corpus from Visit Vejle website

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  30. Bilingual English-Danish parallel corpus from The Viking Ship Museum website...

    • data.europa.eu
    Updated Feb 24, 2020
  31. Global mobile data traffic 2017-2022

    • www.statista.com
    Updated Feb 28, 2020
  32. Bilingual English-Danish parallel corpus from The Danish Environmental...

    • data.europa.eu
    Updated Feb 24, 2020
  33. Bilingual English-Danish parallel corpus from The Danish Gambling Authority...

    • data.europa.eu
    • sprogressource.digst.govcloud.dk
    • +2 more
    Updated Feb 24, 2020
  34. Bilingual English-Danish parallel corpus from Danish FSA website

    • data.europa.eu
    Updated Feb 24, 2020
  35. Bilingual English-Danish parallel corpus from The Danish Medicines Agency...

    • data.europa.eu
    Updated Feb 24, 2020
  36. Soil Survey Geographic Database (SSURGO) Soil Drainage Class

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    tsv +5
    Updated Nov 17, 2020
  37. Bilingual English-Lithuanian parallel corpus from the Bank of Lithuania...

    • data.europa.eu
    Updated Feb 24, 2020
  38. Example of the open coding procedure, sentence by sentence.

    • plos.figshare.com
    xls
    Updated Feb 1, 2017
  39. Soil Survey Geographic Database (SSURGO) Soil Potential Ratings for...

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    tsv, xml +4
    Updated Nov 17, 2020
  40. A Canadian French Emotional Speech Dataset

    • zenodo.org
    zip
    Updated Apr 17, 2018
  41. Overview of the sentences as used in this study.

    • figshare.com
    xls
    Updated Oct 31, 2016
  42. Pre-trained Word Vectors for Spanish

    • www.kaggle.com
    zip
    Updated Aug 9, 2017
  43. Bilingual English-Danish parallel corpus from VisitDenmark - The official...

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  44. Bilingual hr-en parallel corpus from Croatian National Bank website...

    • data.europa.eu
    • www.europeandataportal.eu
    Updated Feb 24, 2020
  45. Data for: Language Models, Surprisal and Fantasy in Slavic...

    • data.mendeley.com
    • search.datacite.org
    Updated Aug 29, 2018
  46. Surficial Aquifer Texture

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    tsv +5
    Updated Nov 17, 2020
  47. Story Map Basic

    • data.cityofdenton.com
    • data-salemva.opendata.arcgis.com
    html, esri rest
    Updated Dec 10, 2019
  48. Analyses of RTs and error rates: correct vs. incorrect sentence flankers.

    • plos.figshare.com
    xls
    Updated Mar 10, 2017
  49. Drainage Basin Set

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    xml +5
    Updated Nov 17, 2020
  50. MnIReport2015 Counties

    • opendata.utah.gov
    • hub.arcgis.com
    application/rssxml +5
    Updated Oct 16, 2020
  51. EU Open Data Portal - API

    • data.europa.eu
    • www.europeandataportal.eu
    • +1 more
    Updated Jun 17, 2020
  52. Data from: Connecticut Surficial Stratified Drift

    • data.ct.gov
    • ct-deep-gis-open-data-website-ctdeep.hub.arcgis.com
    xml +5
    Updated Nov 17, 2020
  53. MnIReport2018 Counties

    • dwre-utahdnr.opendata.arcgis.com
    Updated Feb 12, 2020
  54. MnIReport2018 CulinaryWaterSuppliers

    • dwre-utahdnr.opendata.arcgis.com
    Updated Feb 12, 2020
  55. MnIReport2018 Basins

    • hub.arcgis.com
    Updated Feb 12, 2020
  56. Bilingual English-Danish parallel corpus from The Agency for Culture and...

    • data.europa.eu
    • sprogressource.digst.govcloud.dk
    • +2 more
    Updated Feb 24, 2020
  57. MnIReport2018 Statewide

    • hub.arcgis.com
    Updated Feb 12, 2020
  58. Impression differences from photograph only and photograph with negotiation...

    • figshare.com
    xls
    Updated Apr 6, 2018
  59. MnIReport2016 Counties

    • hub.arcgis.com
    Updated Jan 15, 2019
  60. 2018 Municipal and Industrial Water Use

    • dwre-utahdnr.opendata.arcgis.com
    Updated Feb 12, 2020
  61. MnIReport2016 CulinaryWaterSuppliers

    • hub.arcgis.com
    Updated Jan 15, 2019
  62. MnIReport2016 Statewide

    • hub.arcgis.com
    Updated Jan 15, 2019
  63. MnIReport2016 Basins

    • hub.arcgis.com
    • opendata.utah.gov
    Updated Jan 15, 2019
  64. 2016 Municipal and Industrial Water Use Databases

    • geodata.freshfromflorida.com
    • hub.arcgis.com
    Updated Jan 15, 2019
  65. Romanian Parliament Transcripts 1996-2018 (Processed)

    • data.europa.eu
    Updated Feb 24, 2020
  66. MnIReport2017 CulinaryWaterSuppliers

    • opendata.utah.gov
    • dwre-utahdnr.opendata.arcgis.com
    application/rssxml +5
    Updated Jan 2, 2020
  67. MnIReport2017 Statewide

    • opendata.utah.gov
    • hub.arcgis.com
    application/rssxml +5
    Updated Jan 2, 2020
  68. Number of examples from each category in each test for training and testing,...

    • plos.figshare.com
    xls
    Updated Jul 8, 2019
  69. Common Ownership Data: Scraped SEC form 13F filings for 1999-2017

    • dataverse.harvard.edu
    txt +4
    Updated Aug 17, 2020
  70. Simulated examples for Section 2.2.

    • plos.figshare.com
    xls
    Updated Mar 27, 2019
  71. Bilingual English-Danish parallel corpus from Danish Ministry of Transport,...

    • data.europa.eu
    • www.europeandataportal.eu
    Updated Feb 24, 2020
  72. Bilingual English-Danish parallel corpus from The Geological Survey of...

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  73. Bilingual English-Danish parallel corpus from The Danish Nature Agency...

    • data.europa.eu
    • sprogteknologi.dk
    • +1 more
    Updated Feb 24, 2020
  74. Data from: Oxidation of Tricarbonylmolybdacarborane. 1. First Examples of...

    • acs.figshare.com
    txt
    Updated Aug 17, 2016
  75. Bilingual English-Danish parallel corpus from Danmarks Statistik website

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  76. Bilingual hr-en parallel corpus from the National and University Library in...

    • data.europa.eu
    Updated Feb 24, 2020
  77. Bilingual hr-en parallel corpus from the Journal of the Croatian Association...

    • data.europa.eu
    Updated Feb 24, 2020
  78. Bilingual English-Danish parallel corpus from Danish Ministry of Higher...

    • data.europa.eu
    Updated Feb 24, 2020
  79. City Annexations

    • hub.arcgis.com
    Updated Nov 17, 2015
  80. Bilingual English-Lithuanian parallel corpus from Seimas of the Republic of...

    • data.europa.eu
    Updated Feb 24, 2020
  81. Bilingual English-Lithuanian parallel corpus from the Ministry of National...

    • data.europa.eu
    Updated Feb 24, 2020
  82. Basic negotiation sentences.

    • figshare.com
    xls
    Updated Apr 6, 2018
  83. Potential Groundwater Dependent Ecosystems for West Gippsland Catchment...

    • data.gov.au
    • data.wu.ac.at
    zip
    Updated Nov 19, 2019
  84. Bilingual English-Danish parallel corpus from Aarhus 2017 - European Capital...

    • data.europa.eu
    • sprogteknologi.dk
    Updated Feb 24, 2020
  85. Specification of the toy examples.

    • plos.figshare.com
    xls
    Updated Feb 17, 2016
  86. Default chunking result for one sentence.

    • figshare.com
    xls
    Updated Jun 24, 2017
  87. Occupancy Permits in 2015

    • hub.arcgis.com
    Updated Dec 17, 2015
  88. FIFA 18 Complete Player Dataset

    • www.kaggle.com
    • wb.n3ncloud.co.kr
    zip
    Updated Oct 30, 2017
  89. Two examples that correspond with Fig 2.

    • plos.figshare.com
    xls
    Updated May 31, 2017
  90. Four examples (a-d) adapted from example (2) in Table 8.

    • plos.figshare.com
    xls
    Updated Jul 18, 2019
  91. Significant Ecological Area (SEA)

    • egis-lacounty.hub.arcgis.com
    • geohub.lacity.org
    Updated Jan 17, 2019
  92. Data from: Supplementary Material for 'Relevance-based Interactive...

    • pub.uni-bielefeld.de
    • search.datacite.org
    Updated 06.02.2019
  93. Examples of control, minor violation, and major violation scenarios and...

    • plos.figshare.com
    xls
    Updated Sep 4, 2019
  94. DBpedia

    • datahub.io
    • datahub.ckan.io
    • +1 more
    api/sparql, rdf, html +6
    Updated Jun 20, 2017
  95. Examples of expected, unexpected and control questions that require a YES or...

    • plos.figshare.com
    xls
    Updated May 19, 2017
  96. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...

    • www.openicpsr.org
    • search.datacite.org
    • +1 more
    Updated Oct 20, 2020
  97. Story Map Series

    • data.cityofdenton.com
    • hub.arcgis.com
    html, esri rest
    Updated Dec 10, 2019
  98. Examples of vague descriptions across different classes.

    • plos.figshare.com
    xls
    Updated Oct 16, 2018
  99. Examples using the Lagrangian relaxation based heuristic.

    • plos.figshare.com
    xls
    Updated Jul 11, 2017
  100. Simulated examples for Section 2.1.

    • figshare.com
    xls
    Updated Mar 27, 2019
Cite
Kiesel, Johannes; Stein, Benno; Lucks, Stefan (2017) Webis-Sentences-17. [Dataset] http://doi.org/10.5281/zenodo.205950

Webis-Sentences-17

Dataset updated Feb 27, 2017
Dataset provided by
Bauhaus-Universität Weimar (https://www.uni-weimar.de/)
The Web Technology & Information Systems Network
Authors
Kiesel, Johannes; Stein, Benno; Lucks, Stefan
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Webis-Sentences-17 corpus is a collection of 3,369,618,811 sentences extracted from the ClueWeb12 web crawl. It is designed to support statistical analyses of human-written sentences. More details on the sentence extraction can be found in the associated publication. The Webis-Simple-Sentences-17 corpus contains 471,085,690 English sentences from the Webis-Sentences-17 corpus. These sentences were sampled to match the complexity of sentences that people make up as memory aids for remembering passwords, with complexity measured in syllables per word. Both corpora are split into training and test sets as used in the associated publication: the test set is extracted from part 00 of ClueWeb12, and the training set from the remaining parts.
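
As a rough illustration of the syllables-per-word measure mentioned above, the sketch below estimates sentence complexity in Python. It is only a sketch under stated assumptions: the description does not say how syllables were counted, so the vowel-group heuristic and the function names `count_syllables` and `syllables_per_word` are hypothetical and not taken from the associated publication.

```python
import re

# Assumption: a syllable is approximated by a run of consecutive vowels.
# This is a common English heuristic, not the corpus authors' method.
VOWEL_GROUP = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Estimate syllables as the number of vowel groups, with a minimum of 1."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def syllables_per_word(sentence: str) -> float:
    """Return the mean estimated syllable count per word (0.0 if no words)."""
    words = WORD.findall(sentence)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)


if __name__ == "__main__":
    for s in ["The cat sat", "Banana bread is delicious"]:
        print(f"{syllables_per_word(s):.2f}  {s}")
```

A sampler in this spirit might keep sentences whose score falls within a target range, but any such threshold would have to come from the publication rather than from this sketch.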
