7 datasets found

WikipediaUpdated
huggingface.co
Updated May 4, 2023
Cite
jojo jenkins (2023). WikipediaUpdated [Dataset]. https://huggingface.co/datasets/luciferxf/WikipediaUpdated
Authors
jojo jenkins
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
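The cleaning step described above can be illustrated with a small sketch. It is not the pipeline these datasets actually use: it assumes the raw input is wikitext-style markup, and the list of unwanted section titles and the `clean_article` helper are hypothetical names chosen for illustration.

```python
import re

# Hypothetical list of section titles to drop; the source only says
# "references, etc.", so the exact set here is an assumption.
UNWANTED_SECTIONS = {"references", "external links", "see also"}

# Matches wikitext headings such as "== History ==" or "=== Notes ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def clean_article(wikitext: str) -> str:
    """Return plain text with simple markup and unwanted sections removed."""
    kept = []
    skip_level = None  # heading depth of the section currently being dropped
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2).strip()
            if skip_level is not None and level > skip_level:
                continue  # subsection of a dropped section
            if title.lower() in UNWANTED_SECTIONS:
                skip_level = level
                continue
            skip_level = None
            kept.append(title)  # keep the heading text without "=" markers
            continue
        if skip_level is not None:
            continue
        line = re.sub(r"'{2,5}", "", line)  # bold and italic quote markers
        # [[target|label]] becomes label, [[target]] becomes target
        line = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", line)
        kept.append(line)
    return "\n".join(kept).strip()


sample = (
    "'''Python''' is a language created by [[Guido van Rossum]].\n"
    "== History ==\n"
    "It was released in 1991.\n"
    "== References ==\n"
    "* Some citation\n"
    "=== Notes ===\n"
    "* A note\n"
)
cleaned = clean_article(sample)
print(cleaned)
```

A production pipeline would more likely rely on a dedicated wikitext parser than on regular expressions, which do not handle templates, tables, or nested links.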
eu_wikipedias
huggingface.co
Updated Jan 26, 2023
Cite
David Hall (2023). eu_wikipedias [Dataset]. https://huggingface.co/datasets/dlwh/eu_wikipedias
Authors
David Hall
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
wikipedia_zh
huggingface.co
Updated Feb 21, 2023
Cite
WXY (2023). wikipedia_zh [Dataset]. https://huggingface.co/datasets/dirtycomputer/wikipedia_zh
Authors
WXY
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
wikipedia_markdown
huggingface.co
Updated Jan 20, 2025
Cite
Tomaž Savodnik (2025). wikipedia_markdown [Dataset]. https://huggingface.co/datasets/zidsi/wikipedia_markdown
Authors
Tomaž Savodnik
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
wikipedia
huggingface.co
tensorflow.org
Updated Feb 21, 2023
Cite
Online Language Modelling (2023). wikipedia [Dataset]. https://huggingface.co/datasets/olm/wikipedia
Dataset authored and provided by
Online Language Modelling
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
Wikipedia
opendatalab.com
zip
Updated Apr 9, 2019
Cite
(2019). Wikipedia [Dataset]. https://opendatalab.com/OpenDataLab/Wikipedia
Available download formats: zip
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).
banglawiki
huggingface.co
Updated Jun 24, 2025
Cite
KhulnaSoft, Ltd. (2025). banglawiki [Dataset]. https://huggingface.co/datasets/khulnasoft/banglawiki
Authors
KhulnaSoft, Ltd.
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Wikipedia dataset containing cleaned articles of all languages. The datasets are built from the Wikipedia dump (https://dumps.wikimedia.org/) with one split per language. Each example contains the content of one full Wikipedia article with cleaning to strip markdown and unwanted sections (references, etc.).