2 datasets found
  1. PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12)

    • explore.openaire.eu
    • live.european-language-grid.eu
    Updated Sep 20, 2012
    Cite
    Maik Anderka; Benno Stein; Michael Völske (2012). PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12) [Dataset]. http://doi.org/10.5281/zenodo.3250134
    Dataset updated
    Sep 20, 2012
    Authors
    Maik Anderka; Benno Stein; Michael Völske
    Description

    The PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12) provides human-labeled English Wikipedia articles that contain specific quality flaws. The corpus comprises 1,592,226 articles extracted from the English Wikipedia snapshot of January 4, 2012. A subset of 208,228 articles is labeled with ten specific quality flaws, which are listed in the following table. The labeling is based on human-defined cleanup tags. The remaining 1,383,998 articles have not been tagged with any cleanup tag.

    References

    Maik Anderka and Benno Stein. Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia. In Pamela Forner, Jussi Karlgren, and Christa Womser-Hacker, editors, Working Notes Papers of the CLEF 2012 Evaluation Labs, September 2012. ISBN 978-88-904810-3-1. ISSN 2038-4963.

  2. PAN-WQF-12

    • webis.de
    Updated 2012
    Cite
    Maik Anderka; Benno Stein (2012). PAN-WQF-12 [Dataset]. http://doi.org/10.5281/zenodo.3250135
    Dataset updated
    2012
    Dataset provided by
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Diebold Nixdorf
    Authors
    Maik Anderka; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PAN Wikipedia Quality Flaw Corpus 2012, PAN-WQF-12, provides human-labeled English Wikipedia articles that contain specific quality flaws.


PAN Wikipedia Quality Flaw Corpus 2012 (PAN-WQF-12)

26 scholarly articles cite this dataset.