The most viewed English-language article on Wikipedia in 2024 was "Deaths in 2024", with a total of 44.4 million views. Political topics also dominated the list, with articles related to the 2024 U.S. presidential election and key political figures like Kamala Harris and Donald Trump ranking among the top ten most viewed pages.

Wikipedia's language diversity

As of December 2024, the English Wikipedia subdomain contained approximately 6.91 million articles, making it the largest in terms of content and registered active users. Interestingly, the Cebuano language ranked second with around 6.11 million entries, although many of these articles are reportedly generated by bots. German and French followed as the next most populous European-language subdomains, each with over 18,000 active users. Compared to the rest of the internet, as of January 2024, English was the primary language for over 52 percent of websites worldwide, far outpacing Spanish at 5.5 percent and German at 4.8 percent.

Global traffic to Wikipedia.org

Hosted by the Wikimedia Foundation, Wikipedia.org saw around 4.4 billion unique global visits in March 2024, a slight decrease from 4.6 billion visitors in January. In addition, as of January 2024, Wikipedia ranked amongst the top ten websites with the most referring subnets worldwide.
As of March 2020, the most visited Wikipedia page in the United States was "2020 Democratic Party presidential primaries", with * million visits during the month. The second-most visited page was "2019–20 coronavirus pandemic", with *** million visits. A significant portion of the most visited Wikipedia pages in March 2020 related to the global coronavirus pandemic.
As of December 2023, the English subdomain of Wikipedia had around 6.91 million articles published, making it the largest subdomain of the website by number of entries and registered active users. Cebuano, the only Asian language among the top 10, had the second-most articles on the portal, amassing around 6.11 million entries. German and French ranked third and fourth, with over 2.9 million and 2.6 million entries, respectively. However, while most Wikipedia articles in English and other European languages are written by humans, entries in Cebuano are reportedly mostly generated by bots.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
It contains the text of an article, all the images from that article, and metadata such as image titles and descriptions. From Wikipedia we selected featured articles, which are only a small subset of all available ones, because they are manually reviewed and protected from edits; they thus represent the best quality that human editors on Wikipedia can offer.
You can find more details in the thesis "Image Recommendation for Wikipedia Articles".
The high-level structure of the dataset is as follows:
.
+-- page1
| +-- text.json
| +-- img
| +-- meta.json
+-- page2
| +-- text.json
| +-- img
| +-- meta.json
:
+-- pageN
| +-- text.json
| +-- img
| +-- meta.json
| label | description |
|---|---|
| pageN | the title of the N-th Wikipedia page; the directory contains all information about the page |
| text.json | text of the page saved as JSON. Please refer to the details of the JSON schema below. |
| meta.json | a collection of all images of the page. Please refer to the details of the JSON schema below. |
| imageN | the N-th image of an article, saved in jpg format with the width of each image set to 600px. The name of the image is the md5 hashcode of the original image title. |
Below you see an example of how data is stored:
{
  "title": "Naval Battle of Guadalcanal",
  "id": 405411,
  "url": "https://en.wikipedia.org/wiki/Naval_Battle_of_Guadalcanal",
  "html": "...",
  "wikitext": "... The '''Naval Battle of Guadalcanal''', sometimes referred to as ..."
}
| key | description |
|---|---|
| title | page title |
| id | unique page id |
| url | url of the page on Wikipedia |
| html | HTML content of the article |
| wikitext | wikitext content of the article |
Please note that the html and wikitext properties represent the same information in different formats, so choose whichever is easier to parse in your circumstances.
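As an illustration, here is a minimal sketch of loading a page's text.json and parsing the wikitext variant; the mwparserfromhell library is one possible parser, not something this dataset prescribes:

```python
# Minimal sketch: read one page's text.json (layout as shown above) and
# strip the wikitext markup to plain text.
import json
import mwparserfromhell

with open("page1/text.json") as f:
    page = json.load(f)

wikicode = mwparserfromhell.parse(page["wikitext"])
print(page["title"])
print(wikicode.strip_code()[:200])  # plain-text preview
```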
{
  "img_meta": [
    {
      "filename": "702105f83a2aa0d2a89447be6b61c624.jpg",
      "title": "IronbottomSound.jpg",
      "parsed_title": "ironbottom sound",
      "url": "https://en.wikipedia.org/wiki/File%3AIronbottomSound.jpg",
      "is_icon": false,
      "on_commons": true,
      "description": "A U.S. destroyer steams up what later became known as ...",
      "caption": "Ironbottom Sound. The majority of the warship surface ...",
      "headings": ["Naval Battle of Guadalcanal", "First Naval Battle of Guadalcanal", ...],
      "features": ["4.8618264", "0.49436468", "7.0841103", "2.7377882", "2.1305492", ...]
    },
    ...
  ]
}
| key | description |
|---|---|
| filename | unique image id, the md5 hashcode of the original image title |
| title | image title retrieved from Commons, if applicable |
| parsed_title | image title split into words, i.e. "helloWorld.jpg" -> "hello world" |
| url | url of the image on Wikipedia |
| is_icon | true if the image is an icon, e.g. a category icon. We assume an image is an icon if no preview loads on Wikipedia after clicking on it |
| on_commons | true if the image is available from the Wikimedia Commons dataset |
| description | description of the image parsed from its Wikimedia Commons page, if available |
| caption | caption of the image parsed from the Wikipedia article, if available |
| headings | list of all nested headings of the location where the image is placed in the Wikipedia article. The first element is the top-most heading |
| features | output of the 5th convolutional layer of a ResNet152 trained on the ImageNet dataset. That output, of shape (19, 24, 2048), is max-pooled to shape (2048,). Features were extracted from the original images downloaded in jpeg format with a fixed width of 600px. Practically, it is a list of floats with len = 2048 (see the sketch after this table) |
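For reference, the features could be reproduced roughly as follows. This is a minimal sketch assuming torchvision's ImageNet-pretrained ResNet152 and standard ImageNet preprocessing; the dataset does not ship the exact extractor, so the original pipeline may differ in details:

```python
# Sketch of the feature extraction described above: conv5 output of
# ResNet152, max-pooled over the spatial dimensions to a (2048,) vector.
import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # up to conv5
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(path):
    """Return a (2048,) vector: conv5 feature map max-pooled over space."""
    img = Image.open(path).convert("RGB")    # dataset images are 600px wide
    x = preprocess(img).unsqueeze(0)         # (1, 3, H, W)
    with torch.no_grad():
        fmap = backbone(x)                   # (1, 2048, H/32, W/32)
    return fmap.amax(dim=(2, 3)).squeeze(0)  # max-pool -> (2048,)
```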
Data was collected by fetching featured articles' text and image content with the pywikibot library and then parsing additional metadata out of the corresponding HTML pages from Wikipedia and Commons.
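A minimal sketch of the kind of fetching involved, using pywikibot as the description mentions (the exact collection script is not included here):

```python
# Fetch a page's wikitext and the file URLs of the images it uses.
import pywikibot

site = pywikibot.Site("en", "wikipedia")
page = pywikibot.Page(site, "Naval Battle of Guadalcanal")

wikitext = page.text                # raw wikitext of the article
for image in page.imagelinks():     # FilePage objects used by the page
    print(image.title(), image.get_file_url())
```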
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Wikipedia, the world's largest encyclopedia, is a crowdsourced open knowledge project and website with millions of individual web pages. This dataset is a snapshot of the titles of every article on Wikipedia as of September 20, 2017.
This dataset is a simple newline (\n) delimited list of article titles. No distinction is made between redirects (like "Schwarzenegger") and actual article pages (like "Arnold Schwarzenegger").
This dataset was created by scraping Special:AllPages on Wikipedia. It was originally shared here.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of 40,664,485 citations extracted from the English Wikipedia February 2023 dump (https://dumps.wikimedia.org/enwiki/20230220/).
Version 1: en_citations.zip is a dataset of extracted citations
Version 2: en_final.zip is the same dataset with classified citations augmented with identifiers
The fields are as follows:
The source code to extract citations can be found here: https://github.com/albatros13/wikicite.
The code is a fork of the earlier project on Wikipedia citation extraction: https://github.com/Harshdeep1996/cite-classifications-wiki.
In March 2024, close to 4.4 billion unique global visitors had visited Wikipedia.org, slightly down from 4.6 billion visitors in January of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many digital libraries recommend literature to their users considering the similarity between a query document and their repository. However, they often fail to distinguish what is the relationship that makes two documents alike. In this paper, we model the problem of finding the relationship between two documents as a pairwise document classification task. To find the semantic relation between documents, we apply a series of techniques, such as GloVe, Paragraph-Vectors, BERT, and XLNet under different configurations (e.g., sequence length, vector concatenation scheme), including a Siamese architecture for the Transformer-based systems. We perform our experiments on a newly proposed dataset of 32,168 Wikipedia article pairs and Wikidata properties that define the semantic document relations. Our results show vanilla BERT as the best performing system, with an F1-score of 0.93, which we manually examined to better understand its applicability to other domains. Our findings suggest that classifying semantic relations between documents is a solvable task, and they motivate the development of recommender systems based on the evaluated techniques. The discussions in this paper serve as first steps in the exploration of documents through SPARQL-like queries such that one could find documents that are similar in one aspect but dissimilar in another.
Additional information can be found on GitHub.
The following data is supplemental to the experiments described in our research paper. The data consists of:
This package consists of the Dataset part.
Dataset
The Wikipedia article corpus is available in enwiki-20191101-pages-articles.weighted.10k.jsonl.bz2. The original data were downloaded as an XML dump, and the corresponding articles were extracted as plain text with gensim.scripts.segment_wiki. The archive contains only articles that are available in the training or test data.
The actual dataset is provided as used in the stratified k-fold with k=4, in train_testdata_4folds.tar.gz:
├── 1
│   ├── test.csv
│   └── train.csv
├── 2
│   ├── test.csv
│   └── train.csv
├── 3
│   ├── test.csv
│   └── train.csv
└── 4
    ├── test.csv
    └── train.csv

4 directories, 8 files
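A minimal sketch of iterating over the four folds, assuming the archive has been unpacked into a local train_testdata_4folds/ directory (the path is hypothetical):

```python
# Read each fold's pre-built train/test split.
import pandas as pd

for fold in (1, 2, 3, 4):
    train = pd.read_csv(f"train_testdata_4folds/{fold}/train.csv")
    test = pd.read_csv(f"train_testdata_4folds/{fold}/test.csv")
    print(f"fold {fold}: {len(train)} train rows, {len(test)} test rows")
```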
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
== wikiproject_to_template.halfak_20191202.yaml ==

The mapping of the canonical names of WikiProjects to all the templates that might be used to tag an article with this WikiProject, as used for generating this dump. For instance, the line 'WikiProject Trade: ["WikiProject Trade", "WikiProject trade", "Wptrade"]' indicates that WikiProject Trade (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Trade) is associated with the following templates:
* https://en.wikipedia.org/wiki/Template:WikiProject_Trade
* https://en.wikipedia.org/wiki/Template:WikiProject_trade
* https://en.wikipedia.org/wiki/Template:Wptrade

== wikiproject_taxonomy.halfak_20191202.yaml ==

A proposed mapping of WikiProjects to higher-level categories. This mapping has not been applied to the JSON dump contained here. It is based on the WikiProjects' canonical names.

== gather_wikiprojects_per_article.py ==

Old Python script that built the JSON dump described below for English Wikipedia based on wikitext/wikidata dumps (slow and more prone to errors).

== gather_wikiprojects_per_article_pageassessments.py ==

New Python script to build the JSON dump described below that uses the PageAssessments MediaWiki table in MariaDB, and so is much faster and can handle languages beyond English much more easily.

== labeled_wiki_with_topics_metadata.json.bz2 ==

Each line of this bzipped JSON file corresponds with a Wikipedia article in that language (currently Arabic, English, French, Hungarian, Turkish). The intended usage of this JSON file is to build topic classification models for Wikipedia articles. While the English file has good coverage because a more or less complete mapping exists between WikiProjects and topics, the other languages are much more sparse in their labels because they do not cover any WikiProjects in that language that don't have English equivalents (per Wikidata). The other languages are probably best used for supplementation of the English labels or as a separate test set that might have a different topic distribution.

The following properties are recorded:
* title: Wikipedia article title in that language
* article_revid: Most recent revision ID associated with the article for which a WikiProject assessment was made (might not be the current revision ID)
* talk_pid: Page ID corresponding with the talk page for the Wikipedia article
* talk_revid: Most recent revision ID associated with the talk page for which a WikiProject assessment was made (might not be the current revision ID)
* wp_templates: List of WikiProject templates from the page_assessments table
* qid: Wikidata ID corresponding to the Wikipedia article
* sitelinks: Based on Wikidata, the other languages in which this article exists and the corresponding page IDs
* topics: Topic labels associated with the article based on its WikiProject templates and the WikiProject label mapping (wikiproject_taxonomy)

This version is based on the 24 May 2020 page_assessments tables and the 4 May 2020 Wikidata item_page_link table. Articles with no associated WikiProject templates are not included. Of note in comparison to previous versions of this file, the revision IDs are now the revision IDs that were most recently assessed by a WikiProject, not the current versions of the page. The sitelinks are now given as page IDs, which are more stable and less prone to encoding issues etc.
The WikiProject templates are now pulled via the MediaWiki page_assessments table and so are in a different format than the templates that were extracted from the raw talk pages. For example, here is the line for Agatha Christie from the English JSON file:

{'title': 'Agatha_Christie',
 'article_revid': 958377791,
 'talk_pid': 1001,
 'talk_revid': 958103309,
 'wp_templates': ["Women", "Women's History", "Women writers", "Biography", "Novels/Crime task force", "Novels", "Biography/science and academia work group", "Biography/arts and entertainment work group", "Devon", "Archaeology/Women in archaeology task force", "Archaeology"],
 'qid': 'Q35064',
 'sitelinks': {'afwiki': 19274, 'amwiki': 47582, 'anwiki': 115127, 'arwiki': 12886, ..., 'enwiki': 984, ..., 'zhwiki': 10983, 'zh_min_nanwiki': 21828, 'zh_yuewiki': 131652}}
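A minimal sketch of consuming the dump, assuming each line parses as JSON as the file name suggests:

```python
# Stream the bzipped JSON-lines dump and collect topic labels per article.
import bz2
import json

with bz2.open("labeled_wiki_with_topics_metadata.json.bz2", "rt") as f:
    for line in f:
        article = json.loads(line)
        print(article["title"], article.get("topics", []))
```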
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce

Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world's most popular websites for user-generated content, solidifying Alphabet's and Meta's leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?

Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet's engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal to license Reddit content for training large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Its 19 million articles (over 3.6 million in English) have been written collaboratively by volunteers around the world, and almost all of its articles can be edited by anyone with access to the site. As of July 2011, there were editions of Wikipedia in 282 languages. Wikipedia was launched in 2001 by Jimmy Wales and Larry Sanger and has become the largest and most popular general reference work on the Internet, ranking around seventh among all websites on Alexa and having 365 million readers. The name Wikipedia was coined by Larry Sanger and is a combination of wiki (a technology for creating collaborative websites, from the Hawaiian word wiki, meaning quick) and encyclopedia. Wikipedia's departure from the expert-driven style of encyclopedia building and the large presence of unacademic content have been noted several times. Some have noted the importance of Wikipedia not only as an encyclopedic reference but also as a frequently updated news resource because of how quickly articles about recent events appear. Although the policies of Wikipedia strongly espouse verifiability and a neutral point of view, critics of Wikipedia accuse it of systemic bias and inconsistencies (including undue weight given to popular culture), and allege that it favors consensus over credentials in its editorial processes. Its reliability and accuracy have also been questioned. A 2005 investigation in Nature showed that the science articles they compared came close to the level of accuracy of Encyclopedia Britannica and had a similar rate of serious errors.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We report only the models that performed better (see Table 3 for the best models). For each model, we report the page name, the shortest-path distance DI between the page and the corresponding "Influenza" page, and the Pearson correlation coefficient (PCC) measured against the influenza incidence. We also report the corresponding page in the English Wikipedia in parentheses. We used the value NE to specify when a page has no English equivalent. The value DI > 3 indicates that the page is more than three hops away from the "Influenza" page.
Wikipedia Generation is a dataset for generating Wikipedia articles from the references at the end of each Wikipedia page and from the top 10 search results for the Wikipedia topic.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Presents Figures S1, S2, S3 in the SI file, showing comparisons between probability distributions over activity fields and languages for the top 30 and 100 persons for EN, IT, NK respectively; Tables S1, S2, ..., S27 in the SI file show the top 30 persons by PageRank, CheiRank, and 2DRank for all 9 Wikipedia editions. All names are given in English. Supplementary methods, tables, ranking lists, and figures are available at http://www.quantware.ups-tlse.fr/QWLIB/wikiculturenetwork/; data sets of the 9 hyperlink networks are available at [29] by direct request addressed to S. Vigna. (PDF)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of a list of the top 10 persons by PageRank for the English Wikipedia, with their fields of activity and native languages.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Wikipedia Category Granularity (WikiGrain)" data consists of three files that contain information about articles of the English-language version of Wikipedia (https://en.wikipedia.org).
The data has been generated from the database dump dated 20 October 2016, provided by the Wikimedia Foundation and licensed under the GNU Free Documentation License (GFDL) and the Creative Commons Attribution-Share-Alike 3.0 License.
WikiGrain provides information on all 5,006,601 Wikipedia articles (that is, pages in Namespace 0 that are not redirects) that are assigned to at least one category.
The WikiGrain Data is analyzed in the paper
Jürgen Lerner and Alessandro Lomi: Knowledge categorization affects popularity and quality of Wikipedia articles. PLoS ONE, 13(1):e0190674, 2018.
===============================================================
Individual files (tables in comma-separated-values-format):
---------------------------------------------------------------
* article_info.csv contains the following variables:
- "id"
(integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.
- "granularity"
(decimal) The granularity of an article A is defined to be the average (mean) granularity of the categories of A, where the granularity of a category C is the shortest-path distance in the parent-child subcategory network from the root category (Category:Articles) to C. Higher granularity values indicate articles whose topics are less general, narrower, more specific. (A worked example follows this list.)
- "is.FA"
(boolean) True ('1') if the article is a featured article; false ('0') else.
- "is.FA.or.GA"
(boolean) True ('1') if the article is a featured article or a good article; false ('0') else.
- "is.top.importance"
(boolean) True ('1') if the article is listed as a top importance article by at least one WikiProject; false ('0') else.
- "number.of.revisions"
(integer) Number of times a new version of the article has been uploaded.
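To make the granularity definition concrete, here is a small worked example with hypothetical category depths:

```python
# Hypothetical example: an article in two categories whose shortest-path
# distances from Category:Articles are 4 and 6.
def granularity(category_depths):
    """Mean shortest-path distance of an article's categories from the root."""
    return sum(category_depths) / len(category_depths)

print(granularity([4, 6]))  # 5.0 -> a moderately specific topic
```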
---------------------------------------------------------------
* article_to_tlc.csv
is a list of links from articles to the closest top-level categories (TLC) they are contained in. We say that an article A is a member of a TLC C if A is in a category that is a descendant of C and the distance from C to A (measured by the number of parent-child category links) is minimal over all TLC. An article can thus be a member of several TLC.
The file contains the following variables:
- "id"
(integer) Unique identifier for articles; identical with the page_id in the Wikipedia database.
- "id.of.tlc"
(integer) Unique identifier for TLC in which the article is contained; identical with the page_id in the Wikipedia database.
- "title.of.tlc"
(string) Title of the TLC in which the article is contained.
---------------------------------------------------------------
* article_info_normalized.csv
contains more variables associated with articles than article_info.csv. All variables, except "id" and "is.FA", are normalized to a standard deviation equal to one. Variables whose names have the prefix "log1p." have been transformed by the mapping x --> log(1+x) to make distributions that are skewed to the right 'more normal'.
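A minimal sketch of that transformation, using hypothetical raw values:

```python
# Apply x -> log(1+x) to right-skewed counts, then scale to unit standard
# deviation, as described for the variables below.
import numpy as np

raw = np.array([120.0, 4500.0, 98000.0])  # e.g. article lengths in bytes
log1p = np.log1p(raw)                     # dampens the right skew
normalized = log1p / log1p.std()          # standard deviation equal to one
print(normalized)
```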
The file contains the following variables:
- "id"
Article id.
- "is.FA"
Boolean indicator for whether the article is featured.
- "log1p.length"
Length measured by the number of bytes.
- "age"
Age measured by the time since the first edit.
- "log1p.number.of.edits"
Number of times a new version of the article has been uploaded.
- "log1p.number.of.reverts"
Number of times a revision has been reverted to a previous one.
- "log1p.number.of.contributors"
Number of unique contributors to the article.
- "number.of.characters.per.word"
Average number of characters per word (one component of 'reading complexity').
- "number.of.words.per.sentence"
Average number of words per sentence (second component of 'reading complexity').
- "number.of.level.1.sections"
Number of first level sections in the article.
- "number.of.level.2.sections"
Number of second level sections in the article.
- "number.of.categories"
Number of categories the article is in.
- "log1p.average.size.of.categories"
Average size of the categories the article is in.
- "log1p.number.of.intra.wiki.links"
Number of links to pages in the English-language version of Wikipedia.
- "log1p.number.of.external.references"
Number of external references given in the article.
- "log1p.number.of.images"
Number of images in the article.
- "log1p.number.of.templates"
Number of templates that the article uses.
- "log1p.number.of.inter.language.links"
Number of links to articles in different language editions of Wikipedia.
- "granularity"
As in article_info.csv (but normalized to standard deviation one).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About

This dataset contains counts of (referer, article) pairs extracted from the request logs of English Wikipedia. When a client requests a resource by following a link or performing a search, the URI of the webpage that linked to the resource is included in the request in an HTTP header called the "referer". This data captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.

Data Preparation

- The dataset only includes requests to articles in the main namespace of the desktop version of English Wikipedia (see https://en.wikipedia.org/wiki/Wikipedia:Namespace)
- Requests to MediaWiki redirects are excluded
- Spider traffic was excluded using the ua-parser library (https://github.com/tobie/ua-parser)
- Referers were mapped to a fixed set of values corresponding to internal traffic or external traffic from one of the top 5 global traffic sources of English Wikipedia, based on this scheme:
  - an article in the main namespace of English Wikipedia -> the article title
  - any Wikipedia page that is not in the main namespace of English Wikipedia -> 'other-wikipedia'
  - an empty referer -> 'other-empty'
  - a page from any other Wikimedia project -> 'other-internal'
  - Google -> 'other-google'
  - Yahoo -> 'other-yahoo'
  - Bing -> 'other-bing'
  - Facebook -> 'other-facebook'
  - Twitter -> 'other-twitter'
  - anything else -> 'other'
  For the exact mapping see https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql#L30-L48
- (referer, article) pairs with 10 or fewer observations were removed from the dataset

Note: When a user requests a page through the search bar, the page the user searched from is listed as a referer. Hence, the data contains (referer, article) pairs for which the referer does not contain a link to the article. For an example, consider the (Wikipedia, Chris_Kyle) pair. Users went to the 'Wikipedia' article to search for Chris Kyle within English Wikipedia.

Applications

This data can be used for various purposes:
- determining the most frequent links people click on for a given article
- determining the most common links people followed to an article
- determining how much of the total traffic to an article clicked on a link in that article
- generating a Markov chain over English Wikipedia

Format

- prev_id: if the referer does not correspond to an article in the main namespace of English Wikipedia, this value will be empty. Otherwise, it contains the unique MediaWiki page ID of the article corresponding to the referer, i.e. the previous article the client was on
- curr_id: the unique MediaWiki page ID of the article the client requested
- n: the number of occurrences of the (referer, article) pair
- prev_title: the result of mapping the referer URL to the fixed set of values described above
- curr_title: the title of the article the client requested
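As an example of working with the pairs, here is a minimal sketch; the file name and tab-separated layout are assumptions, so adjust them to the actual release:

```python
# Find the most common referers leading to one article.
import pandas as pd

cols = ["prev_id", "curr_id", "n", "prev_title", "curr_title"]
df = pd.read_csv("2015_01_clickstream.tsv", sep="\t", names=cols, header=0)

top = (df[df["curr_title"] == "Chris_Kyle"]   # article from the note above
       .sort_values("n", ascending=False)
       .head(10)[["prev_title", "n"]])
print(top)
```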
License

All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

Source code

https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql (MIT license)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compiles educational activities in higher education that incorporate Wikipedia as a pedagogical tool between 2003 and 2024. Each activity includes detailed information such as:
Wikipedia version used
Activity title
Description of the activity
Related discipline and course code (if applicable)
Dates of implementation (start and end)
City and country where the activity took place
Educational level and university
National Council for Scientific and Technological Development (CNPq) knowledge area classification
Names and usernames of supporting staff and responsible professors
Partnerships with other institutions or initiatives
Links to relevant outputs on Wikipedia, Wikimedia Commons, and external sites
Its goal is to organize these experiences to facilitate comparative analysis, identify best practices, and support the development of new educational projects involving open knowledge and active learning strategies.
The mapping table available on this page is the result of an extensive two-year research effort, involving the analysis of over 20,000 educational projects related to the use of Wikipedia in higher education. Despite the systematic effort and methodological rigor applied, the volume, diversity, and limitations in accessing consistent data ultimately compromised the final consolidation of the table, especially after the research funding ended. Therefore, we recommend that the table be consulted with caution and critical thinking, bearing in mind that some information may be incomplete or inaccurate. A broader contextualization of the results, as well as reflections on the challenges faced, can be found in the project's final report: https://w.wiki/DeBS