42 datasets found

Wikipedia Pageviews Fields
windsor.ai
json
Updated Jun 1, 2024
Cite
Windsor.ai (2024). Wikipedia Pageviews Fields [Dataset]. https://windsor.ai/data-field/wikipedia_pageviews/
Available download formats: json
Dataset updated
Jun 1, 2024
Dataset provided by
Windsor.ai
Variables measured
Today, Source, top.day, top.year, top.month, top.access, Data Source, top.project, top.articles, per-article.agent, and 6 more
Description
Auto-generated structured data of Wikipedia Pageviews from table Fields
Wikipedia English: number of page views 2023, by country
statista.com
Updated Dec 13, 2023
Cite
Statista (2023). Wikipedia English: number of page views 2023, by country [Dataset]. https://www.statista.com/statistics/1428253/wikipedia-english-page-views-country/
Dataset updated
Dec 13, 2023
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Nov 2023
Area covered
Worldwide
Description
In November 2023, the English version of Wikipedia received over 3 billion page views originating from the United States across all platforms. The United Kingdom generated the second-most page views for the subdomain, with 809.9 million views, followed by India, with 773.2 million views.
English Wikipedia pageviews by second
figshare.com
huggingface.co
+1more
application/gzip
Updated Jan 19, 2016
Cite
Os Keyes (2016). English Wikipedia pageviews by second [Dataset]. http://doi.org/10.6084/m9.figshare.1394684.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.1394684.v1
Dataset updated
Jan 19, 2016
Dataset provided by
figshare
Authors
Os Keyes
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains a count of pageviews to the English-language Wikipedia from 2015-03-16T00:00:00 to 2015-04-25T15:59:59, grouped by timestamp (down to a one-second resolution level) and site (mobile or desktop). The smallest number of events in a group is 645; because of this, we are confident there should not be privacy implications of releasing this data.
Wikipedia: most viewed articles in 2024
statista.com
Updated Dec 4, 2024
Cite
Statista (2024). Wikipedia: most viewed articles in 2024 [Dataset]. https://www.statista.com/statistics/1358978/wikipedia-most-viewed-articles-by-number-of-views/
Dataset updated
Dec 4, 2024
Dataset authored and provided by
Statista http://statista.com/
Time period covered
2024
Area covered
Worldwide
Description
The most viewed English-language article on Wikipedia in 2024 was Deaths in 2024, with a total of 44.4 million views. Political topics also dominated the list, with articles related to the 2024 U.S. presidential election and key political figures like Kamala Harris and Donald Trump ranking among the top ten most viewed pages.

Wikipedia's language diversity

As of December 2024, the English Wikipedia subdomain contained approximately 6.91 million articles, making it the largest in terms of content and registered active users. The Cebuano language ranked second with around 6.11 million entries, although many of these articles are reportedly generated by bots. German and French followed as the next most populous European language subdomains, each with over 18,000 active users. Compared to the rest of the internet, as of January 2024, English was the primary language for over 52 percent of websites worldwide, far outpacing Spanish at 5.5 percent and German at 4.8 percent.

Global traffic to Wikipedia.org

Hosted by the Wikimedia Foundation, Wikipedia.org saw around 4.4 billion unique global visits in March 2024, a slight decrease from 4.6 billion visitors in January. In addition, as of January 2024, Wikipedia ranked among the top ten websites with the most referring subnets worldwide.
Google Trends And Wikipedia Page Views
explore.openaire.eu
zenodo.org
Updated Jun 25, 2015
Cite
Mitsuo Yoshida (2015). Google Trends And Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
Unique identifier
https://doi.org/10.5281/zenodo.14539
Dataset updated
Jun 25, 2015
Authors
Mitsuo Yoshida
Description
Abstract (our paper)

The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.

Data

personal-name.txt.gz: The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.

personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz: The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.

Publication

This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015. http://dx.doi.org/10.1145/2786451.2786495 http://arxiv.org/abs/1509.02218 (author-created version)

Note

The raw data of Wikipedia page views is available at the following page: http://dumps.wikimedia.org/other/pagecounts-raw/

References

Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Analysis for Search Trend Prediction. Proceedings of the Annual Conference of Japanese Society for Artificial Intelligence (in Japanese). vol.29, no.2I1-1, pp.1-4, 2015.
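The abstract above reports high correlations between search frequency and Wikipedia page views. As a rough illustration of what such a comparison could look like, the sketch below computes a Pearson correlation between two aligned series. The choice of Pearson correlation is an assumption (the listing does not say which measure the authors used), and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for one keyword over the same dates (not from the dataset).
search_trend = [10.0, 20.0, 30.0, 40.0]
page_views = [120.0, 240.0, 360.0, 480.0]
r = pearson(search_trend, page_views)
print(round(r, 6))
```

In practice the two series would first need to be aligned on the date column (the fifth column of the per-source files described above).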
Data from: Wikipedia Page Views of Japanese Comic
data.niaid.nih.gov
Updated Jan 24, 2020
Cite
Yoshida, Mitsuo (2020). Wikipedia Page Views of Japanese Comic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_60886
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Yoshida, Mitsuo
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract (our paper)

This paper investigates page views and interlanguage links on Wikipedia for the analysis of Japanese comics. It is based on a preliminary investigation and obtained three results, but the analysis is not yet sufficient for the results to be used directly in market research. I am looking for research collaborators in order to conduct a more detailed analysis.

Data

Publication

This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida. Preliminary Investigation for Japanese Comic Analysis using Wikipedia. Proceedings of the Fifth Asian Conference on Information Systems (ACIS 2016). pp.229-230, 2016.
Most visited Wikipedia pages in the U.S. 2020, by visits
statista.com
Updated Apr 28, 2022
Cite
Statista (2022). Most visited Wikipedia pages in the U.S. 2020, by visits [Dataset]. https://www.statista.com/statistics/1115251/most-visited-wikipedia-pages-usa/
Dataset updated
Apr 28, 2022
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Mar 2020
Area covered
United States
Description
As of March 2020, the most visited Wikipedia page in the United States was "2020 Democratic party presidential primaries" with 2 million visits during the month. The second-most visited page was "2019-20 coronavirus pandemic" with 1.8 million visits. A significant portion of the top visited Wikipedia pages in March are related to the global coronavirus pandemic.
Total global visitor traffic to Wikipedia.org 2024
statista.com
ai-chatbox.pro
Updated Nov 11, 2024
Cite
Statista (2024). Total global visitor traffic to Wikipedia.org 2024 [Dataset]. https://www.statista.com/statistics/1259907/wikipedia-website-traffic/
Dataset updated
Nov 11, 2024
Dataset authored and provided by
Statista http://statista.com/
Time period covered
Oct 2023 - Mar 2024
Area covered
Worldwide
Description
In March 2024, close to 4.4 billion unique global visitors had visited Wikipedia.org, slightly down from 4.4 billion visitors since August of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.
Wikipedia Web Traffic 2018-19
kaggle.com
Updated Apr 12, 2021
Cite
san_bt (2021). Wikipedia Web Traffic 2018-19 [Dataset]. https://www.kaggle.com/datasets/sandeshbhat/wikipedia-web-traffic-201819/versions/1
Dataset updated
Apr 12, 2021
Dataset provided by
Kaggle
Authors
san_bt
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Time Series: A time series is a set of observations recorded at regular intervals of time. Time series can be beneficial in many fields, such as stock market prediction and weather forecasting. Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend, or seasonal variation).

Web traffic: The amount of data sent and received by visitors to a website. Sites monitor the incoming and outgoing traffic to see which parts or pages of their site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country.

Content

Contains Page Views for 60k Wikipedia articles in 8 different languages taken on a daily basis for 2 years.

[Image: DSLC, https://i.ibb.co/h1JCgpY/DSLC.png]

A Data Science Life Cycle can be used to create a project. Forecasting can be done for any interval, provided a sufficient dataset is available. Refer to the GitHub link in the tasks to view the forecast done using ARIMA and Prophet. Feel free to contribute. Several other models, including neural networks, can be used to improve the results.

Acknowledgements

Credits :
1. Wikipedia 2. Google
wikipedia-20250620
huggingface.co
Updated Jul 3, 2025
+ more versions
Cite
NeuML (2025). wikipedia-20250620 [Dataset]. https://huggingface.co/datasets/NeuML/wikipedia-20250620
Dataset updated
Jul 3, 2025
Dataset authored and provided by
NeuML
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Dataset Card for Wikipedia English June 2025

Dataset created using this repo with a June 2025 Wikipedia snapshot. This repo also has a precomputed pageviews database. This database has the aggregated number of views for each page in Wikipedia. This file is built using the Wikipedia Pageview complete dumps
Wikipedia pagecounts sorted by page (year 2014)
figshare.com
txt
Updated Feb 15, 2016
Cite
Alessio Bogon; Cristian Consonni; Alberto Montresor (2016). Wikipedia pagecounts sorted by page (year 2014) [Dataset]. http://doi.org/10.6084/m9.figshare.2085643.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.2085643.v1
Dataset updated
Feb 15, 2016
Dataset provided by
figshare
Authors
Alessio Bogon; Cristian Consonni; Alberto Montresor
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the page view statistics for all the Wikimedia projects in the year 2014, ordered by (project, page, timestamp). It has been generated starting from Wikimedia's pagecounts-raw [1] dataset.

The CSV uses spaces as the delimiter, without any form of escaping because it is not needed. It has 5 columns:
* project: the project name
* page: the page requested, url-escaped
* timestamp: the timestamp of the hour (format: "%Y%m%d-%H%M%S")
* count: the number of times the page has been requested (in that hour)
* bytes: the number of bytes transferred (in that hour)

You can download the full dataset via torrent [2].

Further information about this dataset is available at: http://disi.unitn.it/~consonni/datasets/wikipedia-pagecounts-sorted-by-page-year-2014/

[1] https://dumps.wikimedia.org/other/pagecounts-raw/
[2] http://disi.unitn.it/~consonni/datasets/wikipedia-pagecounts-sorted-by-page-year-2014/#download
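The five-column, space-delimited layout described above could be parsed along the following lines. This is a minimal sketch, not the authors' tooling: the `PageCount` type, the `parse_line` helper, and the sample lines are hypothetical, and decoding the url-escaped page name as UTF-8 is an assumption.

```python
from datetime import datetime
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    project: str
    page: str                # page name after URL-decoding (assumed UTF-8)
    timestamp: datetime      # start of the hour
    count: int               # requests in that hour
    bytes_transferred: int   # bytes transferred in that hour


def parse_line(line: str) -> PageCount:
    # Exactly five space-separated fields; the page is url-escaped,
    # so it cannot itself contain a space.
    project, page, stamp, count, nbytes = line.rstrip("\n").split(" ")
    return PageCount(
        project=project,
        page=unquote(page),
        timestamp=datetime.strptime(stamp, "%Y%m%d-%H%M%S"),
        count=int(count),
        bytes_transferred=int(nbytes),
    )


# Hypothetical sample line, made up for illustration.
record = parse_line("en Main_Page 20140101-000000 42 123456")
print(record)
```

Because the file is sorted by (project, page, timestamp), all rows for one page are contiguous, so per-page totals can be accumulated in a single streaming pass without loading the file into memory.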
Data from: English Wikipedia - Species Pages
gbif.org
Updated Aug 23, 2022
+ more versions
Cite
Markus Döring; Markus Döring (2022). English Wikipedia - Species Pages [Dataset]. http://doi.org/10.15468/c3kkgh
Unique identifier
https://doi.org/10.15468/c3kkgh
Dataset updated
Aug 23, 2022
Dataset provided by
Wikimedia Foundation http://www.wikimedia.org/
Global Biodiversity Information Facility https://www.gbif.org/
Authors
Markus Döring; Markus Döring
Description
Species pages extracted from the English Wikipedia article XML dump from 2022-08-02. Multimedia, vernacular names and textual descriptions are extracted, but only pages with a taxobox or speciesbox template are recognized.

See https://github.com/mdoering/wikipedia-dwca for details.
Statistics summarizing the view and edit history of selected Wikipedia...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Adam M. Wilson; Gene E. Likens (2023). Statistics summarizing the view and edit history of selected Wikipedia articles. [Dataset]. http://doi.org/10.1371/journal.pone.0134454.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0134454.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Adam M. Wilson; Gene E. Likens
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
a “Mean Daily Page Views” from http://toolserver.org/~emw/wikistats/ were only available after 2008-01-01 and include programmatic page requests.
b include data from 2003-06-12 (when the most recent article, Heliocentrism, originated) through 2012-07-31, when this analysis was run.
c Mean daily edit count excludes successive edits by the same user (n = 23,156).
d Mean count of words changed (inserted, deleted, or changed, n = 8,525).
Due to the heavily right-skewed distributions, geometric means and standard deviations are shown. The number of observations (n) is constant for mean edits per day because all days were included, while only days with at least one edit were used to calculate the mean words changed.

Statistics summarizing the view and edit history of selected Wikipedia articles.
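For readers unfamiliar with the geometric mean and geometric standard deviation mentioned in the table notes, the sketch below shows one common way to compute them, by averaging in log space. Whether the authors used the sample (n - 1) or population form of the variance is not stated, so the sample form here is an assumption, and the input values are made up.

```python
import math


def geometric_mean_sd(values):
    """Return (geometric mean, geometric standard deviation).

    All values must be strictly positive, since the statistics are
    computed on their natural logarithms.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = sum(logs) / n
    # Sample variance of the logs (n - 1 in the denominator; an assumption).
    var_log = sum((x - mean_log) ** 2 for x in logs) / (n - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


# Hypothetical right-skewed daily counts, for illustration only.
gm, gsd = geometric_mean_sd([1.0, 10.0, 100.0])
print(gm, gsd)
```

The geometric standard deviation is a multiplicative factor: roughly, typical values lie between the geometric mean divided by it and the geometric mean multiplied by it.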
Selection of English Wikipedia pages (CNs) regarding topics with a direct...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt (2023). Selection of English Wikipedia pages (CNs) regarding topics with a direct relation to the emerging Hadoop (Big Data) market. [Dataset]. http://doi.org/10.1371/journal.pone.0141892.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0141892.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Apache Hadoop is the central software project, beside Apache SOLR, and Apache Lucene (SW, software). Companies which offer Hadoop distributions and Hadoop based solutions are the central companies in the scope of the study (HV, hardware vendors). Other companies started very early with Hadoop related projects as early adopters (EA). Global players (GP) are affected by this emerging market, its opportunities and the new competitors (NC). Some new but highly relevant companies like Talend or LucidWorks have been selected because of their obvious commitment to the open source ideas. Widely adopted technologies with a relation to the selected research topic are represented by the group TEC.
No. of page visits to the R page on Wikipedia
kaggle.com
Updated Feb 28, 2021
Cite
Gaurav Dutta (2021). No. of page visits to the R page on Wikipedia [Dataset]. https://www.kaggle.com/gauravduttakiit/no-of-page-visits-to-the-r-page-on-wikipedia/discussion
Dataset updated
Feb 28, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
Gaurav Dutta
Description
Dataset

This dataset was created by Gaurav Dutta

Contents
WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles
figshare.com
bz2
Updated May 30, 2023
Cite
Wiki Rank (2023). WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles [Dataset]. http://doi.org/10.6084/m9.figshare.8231273.v2
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.8231273.v2
Dataset updated
May 30, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Wiki Rank
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset includes a list of over 39 million Wikipedia articles in 55 languages with quality scores by WikiRank (https://wikirank.net). Quality scores of articles are based on Wikipedia dumps from May 2019. Popularity and Authors' Interest are based on activity in April 2019.

License

All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

Format

• page_id -- The identifier of the Wikipedia article (int), e.g. 4519301
• page_name -- The title of the Wikipedia article (utf-8), e.g. General relativity
• wikirank_quality -- quality score for the Wikipedia article on a scale of 0-100 (as of May 1, 2019)
• poularity -- median of the daily number of page views of the Wikipedia article during April 2019
• authors_interest -- number of authors of the Wikipedia article during April 2019
Kaggle Wikipedia Web Traffic Daily Dataset (without Missing Values)
data.niaid.nih.gov
explore.openaire.eu
+1more
Updated Apr 1, 2021
+ more versions
Cite
Bergmeir, Christoph (2021). Kaggle Wikipedia Web Traffic Daily Dataset (without Missing Values) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3892918
Dataset updated
Apr 1, 2021
Dataset provided by
Hyndman, Rob
Bergmeir, Christoph
Montero-Manso, Pablo
Godahewa, Rakshitha
Webb, Geoff
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.

The original dataset contains missing values. They have been simply replaced by zeros.
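The zero-filling step described above amounts to something like the following sketch. The function name and the representation of missing values (`None` or NaN) are assumptions for illustration; the listing does not say how the original files encode missing entries.

```python
import math


def fill_missing_with_zero(series):
    """Return a copy of the series with missing values replaced by 0.0.

    A value counts as missing here if it is None or a float NaN
    (an assumption about how gaps are represented).
    """
    filled = []
    for value in series:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            filled.append(0.0)
        else:
            filled.append(float(value))
    return filled


# Hypothetical daily hits for one page, with two gaps.
daily_hits = [3, None, float("nan"), 7]
print(fill_missing_with_zero(daily_hits))
```

Note that after this replacement a day with no recorded data is indistinguishable from a day with genuinely zero traffic, which can matter when fitting forecasting models to the series.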
Wikipedia page visits during the COVID-19 pandemic
ssh.datastations.nl
csv
Updated May 16, 2024
Cite
Maurice Vergeer; Maurice Vergeer (2024). Wikipedia page visits during the COVID-19 pandemic [Dataset]. http://doi.org/10.17026/SS/1VTO3P
Available download formats: csv (20418548)
Unique identifier
https://doi.org/10.17026/SS/1VTO3P
Dataset updated
May 16, 2024
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
Maurice Vergeer; Maurice Vergeer
License
https://doi.org/10.17026/fp39-0x58
Time period covered
Jan 3, 2020 - Mar 31, 2021
Description
Daily visits to Wikipedia articles about COVID-19
COVID-19 Pandemic Wikipedia Readership
figshare.com
txt
Updated May 31, 2023
Cite
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin (2023). COVID-19 Pandemic Wikipedia Readership [Dataset]. http://doi.org/10.6084/m9.figshare.14548032.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14548032.v3
Dataset updated
May 31, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data release includes two Wikipedia datasets related to the readership of the project during the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country; the second dataset is one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, more specifically from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.

Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

Privacy considerations

While this data is considered valuable for the insight that it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken to not inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.

The Wikimedia Foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 pandemic an exemption was made to store reader data about the pandemic with additional privacy protections:
- exclude the page views from users engaged in an edit session
- exclude reader data from specific countries (with a few exceptions)
- the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country for that day. The randomization and filters occur on a daily cadence with all timestamps in UTC.
- exclude power users - i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and also in theory would help reduce the chance of a single user heavily skewing the data.
- exclude readership from users of the iOS and Android Wikipedia apps.

In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on readership data per country, and the COVID-19 privacy protections in particular, see this phabricator.

To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

Datasets

The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation publishes already, with the addition of the country-specific counts.

COVID-19 pageviews

The file covid_pageviews.tsv contains:
- pageview counts for COVID-19 related pages, aggregated by week and country
- k-anonymity threshold of 100
- example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times from readers in Belgium
- as a control bucket, we include pageview counts to all pages aggregated by week and country. Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.

The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.

Covid reader session bigrams

The file covid_session_bigrams.tsv contains:
- number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only
- aggregated by month and country
- k-anonymity threshold of 100
- example: In March of 2020, there were 1000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile

The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.

Contact

Please reach out to research-feedback@wikimedia.org for any questions.
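The k-anonymity threshold and flooring described in the privacy section of this record can be illustrated with a short sketch. This is not the release's actual pipeline: the function name, the key layout, and the raw counts are hypothetical, and only the two stated rules (drop counts below 100, floor the rest to a multiple of 100) are taken from the description.

```python
K_THRESHOLD = 100  # minimum views per (page, country, period) to be released


def release_counts(raw_counts, k=K_THRESHOLD):
    """Apply a k-anonymity threshold, then floor counts to a multiple of k.

    raw_counts maps an aggregation key to a raw view count.
    Keys with fewer than k views are dropped entirely.
    """
    released = {}
    for key, views in raw_counts.items():
        if views < k:
            continue  # below the threshold: not published at all
        released[key] = (views // k) * k  # e.g. 11742 -> 11700
    return released


# Hypothetical raw counts keyed by (project, page, country, ISO week).
raw = {
    ("fr.wikipedia", "Pandémie_de_Covid-19_en_Italie", "BE", "2020-W13"): 11742,
    ("fr.wikipedia", "Some_rare_page", "BE", "2020-W13"): 99,
}
released = release_counts(raw)
print(released)
```

One consequence of flooring is that a published value of 11700 only bounds the raw sampled count to the range 11700 to 11799, which is part of why the description recommends reading the counts as comparable trends rather than totals.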
Wikimedia Desktop View
figshare.com
bz2
Updated Feb 2, 2022
Cite
Aaron Halfaker (2022). Wikimedia Desktop View [Dataset]. http://doi.org/10.6084/m9.figshare.1254630.v1
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.1254630.v1
Dataset updated
Feb 2, 2022
Dataset provided by
figshare
Authors
Aaron Halfaker
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Each row represents a page view to the desktop version of Wikimedia's sites (e.g. Wikipedia, Wiktionary, etc.). This dataset was gathered using a snippet of JavaScript that used a cookie to randomly sample readers at a rate of 1/1000. In total, 7.12 million page view requests by 1.18 million users were recorded.