7 datasets found
  1. h

    WikipediaUpdated

    • huggingface.co
    Updated May 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    jojo jenkins (2023). WikipediaUpdated [Dataset]. https://huggingface.co/datasets/luciferxf/WikipediaUpdated
    Explore at:
    Dataset updated
    May 4, 2023
    Authors
    jojo jenkins
    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  2. h

    eu_wikipedias

    • huggingface.co
    Updated Jan 26, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Hall (2023). eu_wikipedias [Dataset]. https://huggingface.co/datasets/dlwh/eu_wikipedias
    Explore at:
    Dataset updated
    Jan 26, 2023
    Authors
    David Hall
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  3. h

    wikipedia_zh

    • huggingface.co
    Updated Feb 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    WXY (2023). wikipedia_zh [Dataset]. https://huggingface.co/datasets/dirtycomputer/wikipedia_zh
    Explore at:
    Dataset updated
    Feb 21, 2023
    Authors
    WXY
    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  4. h

    wikipedia_markdown

    • huggingface.co
    Updated Jan 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tomaž Savodnik (2025). wikipedia_markdown [Dataset]. https://huggingface.co/datasets/zidsi/wikipedia_markdown
    Explore at:
    Dataset updated
    Jan 20, 2025
    Authors
    Tomaž Savodnik
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  5. h

    wikipedia

    • huggingface.co
    • tensorflow.org
    Updated Feb 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Online Language Modelling (2023). wikipedia [Dataset]. https://huggingface.co/datasets/olm/wikipedia
    Explore at:
    Dataset updated
    Feb 21, 2023
    Dataset authored and provided by
    Online Language Modelling
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  6. O

    Wikipedia

    • opendatalab.com
    zip
    Updated Apr 9, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2019). Wikipedia [Dataset]. https://opendatalab.com/OpenDataLab/Wikipedia
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 9, 2019
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  7. h

    banglawiki

    • huggingface.co
    Updated Jun 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KhulnaSoft, Ltd. (2025). banglawiki [Dataset]. https://huggingface.co/datasets/khulnasoft/banglawiki
    Explore at:
    Dataset updated
    Jun 24, 2025
    Authors
    KhulnaSoft, Ltd.
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

  8. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
jojo jenkins (2023). WikipediaUpdated [Dataset]. https://huggingface.co/datasets/luciferxf/WikipediaUpdated

WikipediaUpdated

luciferxf/WikipediaUpdated

Explore at:
Dataset updated
May 4, 2023
Authors
jojo jenkins
Description

Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).

Search
Clear search
Close search
Google apps
Main menu