100+ datasets found
  1. WikipediaUpdated

    • huggingface.co
    Updated May 4, 2023
    + more versions
  2. wikipedia

    • tensorflow.org
    • huggingface.co
    Updated Aug 9, 2019
  3. Wiki-en Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jul 25, 2019
  4. Data from: Wikipedia Citations: A comprehensive dataset of citations with...

    • zenodo.org
    zip
    Updated Nov 12, 2020
  5. ner-wikipedia-dataset

    • huggingface.co
    Updated Jul 25, 2023
  6. wiki40b

    • tensorflow.org
    • opendatalab.com
    • +1 more
    Updated Aug 30, 2023
  7. French Wikipedia Dataset

    • paperswithcode.com
    • opendatalab.com
  8. wikipedia-tr

    • huggingface.co
  9. wikipedia-22-12-simple-embeddings

    • huggingface.co
    • opendatalab.com
    Updated Mar 29, 2023
    + more versions
  10. Wiki Squirrel Dataset

    • paperswithcode.com
    Updated Jun 18, 2023
    + more versions
  11. Wikidata-Disamb Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 10, 2021
  12. Plaintext Wikipedia dump 2018

    • lindat.mff.cuni.cz
    • live.european-language-grid.eu
    Updated Feb 25, 2018
  13. Wikipedia Edits

    • kaggle.com
    Updated Aug 20, 2017
  14. English Wikipedia Quality Asssessment Dataset

    • figshare.com
    application/bzip2
    Updated May 31, 2023
  15. Wikipedia Talk Labels: Personal Attacks

    • figshare.com
    txt
    Updated Feb 22, 2017
    + more versions
  16. Wikipedia Article Topics for All Languages (based on article outlinks)

    • figshare.com
    bz2
    Updated Jul 20, 2021
  17. wikipedia_toxicity_subtypes

    • tensorflow.org
    Updated Dec 6, 2022
  18. Persian Wikipedia Dataset

    • kaggle.com
    zip
    Updated Aug 18, 2020
  19. Data from: Wiki-Reliability: A Large Scale Dataset for Content Reliability...

    • figshare.com
    txt
    Updated Mar 14, 2021
  20. Wikipedia Knowledge Graph dataset

    • zenodo.org
    pdf, tsv
    Updated Nov 17, 2022
    + more versions
Cite
jojo jenkins (2023). WikipediaUpdated [Dataset]. https://huggingface.co/datasets/luciferxf/WikipediaUpdated

WikipediaUpdated

luciferxf/WikipediaUpdated

Dataset updated: May 4, 2023
Authors: jojo jenkins
Description

A Wikipedia dataset containing cleaned articles in all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/), with one split per language. Each example contains the content of one full Wikipedia article, cleaned to strip markdown and unwanted sections (references, etc.).
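
To make the cleaning step in the description concrete, here is a minimal sketch of what stripping markup and unwanted sections from one article might look like. It is not the dataset's actual pipeline, which the description does not give. The function name clean_article, the list of unwanted section titles, and the regular expressions are all assumptions made for illustration, and the sketch handles only a small subset of wiki markup (headings, ref tags, internal links, bold/italic quotes).

```python
import re

# Assumption: the dataset's real list of unwanted sections is not stated;
# these titles are hypothetical examples.
UNWANTED_SECTIONS = {"references", "external links", "see also"}

# Matches wiki headings such as "== History ==" or "=== Notes ===".
HEADING_RE = re.compile(r"^==+\s*(.*?)\s*==+\s*$")


def clean_article(wikitext: str) -> str:
    """Return the article text with unwanted sections and simple markup removed."""
    kept = []
    skipping = False
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            title = match.group(1)
            # Simplification: skipping ends at the next heading of any level.
            skipping = title.lower() in UNWANTED_SECTIONS
            if not skipping:
                kept.append(title)  # keep the heading text without the "==" markers
            continue
        if not skipping:
            kept.append(line)

    text = "\n".join(kept)
    text = re.sub(r"<ref[^>]*/>", "", text)                            # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)   # inline refs
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)      # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                                  # bold/italic quote runs
    return text.strip()


sample = (
    "'''Paris''' is the capital of [[France]].<ref>Some source</ref>\n"
    "== History ==\n"
    "Founded long ago.\n"
    "== References ==\n"
    "* A book\n"
)
print(clean_article(sample))
```

For the sample above, the References section is dropped and the remaining text keeps one line per original line, with the History heading reduced to its plain title. A production pipeline would more likely use a dedicated wikitext parser than regular expressions.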
