42 datasets found
  1. Wikipedia Pageviews Fields

    • windsor.ai
    json
    Updated Jun 1, 2024
    Cite
    Windsor.ai (2024). Wikipedia Pageviews Fields [Dataset]. https://windsor.ai/data-field/wikipedia_pageviews/
    Available download formats: json
    Dataset updated
    Jun 1, 2024
    Dataset provided by
    Windsor.ai
    Variables measured
    Today, Source, top.day, top.year, top.month, top.access, Data Source, top.project, top.articles, per-article.agent, and 6 more
    Description

    Auto-generated structured data of Wikipedia Pageviews from table Fields

  2. Wikipedia English: number of page views 2023, by country

    • statista.com
    Updated Dec 13, 2023
    Cite
    Statista (2023). Wikipedia English: number of page views 2023, by country [Dataset]. https://www.statista.com/statistics/1428253/wikipedia-english-page-views-country/
    Dataset updated
    Dec 13, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2023
    Area covered
    Worldwide
    Description

    In November 2023, the English version of Wikipedia received over 3 billion page views originating from the United States across all platforms. The United Kingdom generated the second-most page views for the subdomain, with 809.9 million views, followed by India, with 773.2 million views.

  3. English Wikipedia pageviews by second

    • figshare.com
    • huggingface.co
    • +1 more
    application/gzip
    Updated Jan 19, 2016
    Cite
    Os Keyes (2016). English Wikipedia pageviews by second [Dataset]. http://doi.org/10.6084/m9.figshare.1394684.v1
    Available download formats: application/gzip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Os Keyes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains a count of pageviews to the English-language Wikipedia from 2015-03-16T00:00:00 to 2015-04-25T15:59:59, grouped by timestamp (down to a one-second resolution level) and site (mobile or desktop). The smallest number of events in a group is 645; because of this, we are confident there should not be privacy implications of releasing this data.

  4. Wikipedia: most viewed articles in 2024

    • statista.com
    Updated Dec 4, 2024
    Cite
    Statista (2024). Wikipedia: most viewed articles in 2024 [Dataset]. https://www.statista.com/statistics/1358978/wikipedia-most-viewed-articles-by-number-of-views/
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most viewed English-language article on Wikipedia in 2024 was Deaths in 2024, with a total of 44.4 million views. Political topics also dominated the list, with articles related to the 2024 U.S. presidential election and key political figures like Kamala Harris and Donald Trump ranking among the top ten most viewed pages.

    Wikipedia's language diversity

    As of December 2024, the English Wikipedia subdomain contained approximately 6.91 million articles, making it the largest in terms of content and registered active users. Interestingly, the Cebuano language ranked second with around 6.11 million entries, although many of these articles are reportedly generated by bots. German and French followed as the next most populous European language subdomains, each with over 18,000 active users. Compared to the rest of the internet, as of January 2024, English was the primary language for over 52 percent of websites worldwide, far outpacing Spanish at 5.5 percent and German at 4.8 percent.

    Global traffic to Wikipedia.org

    Hosted by the Wikimedia Foundation, Wikipedia.org saw around 4.4 billion unique global visits in March 2024, a slight decrease from 4.6 billion visitors in January. In addition, as of January 2024, Wikipedia ranked amongst the top ten websites with the most referring subnets worldwide.

  5. Google Trends And Wikipedia Page Views

    • explore.openaire.eu
    • zenodo.org
    Updated Jun 25, 2015
    Cite
    Mitsuo Yoshida (2015). Google Trends And Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
    Dataset updated
    Jun 25, 2015
    Authors
    Mitsuo Yoshida
    Description

    Abstract (our paper)

    The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.

    Data

    personal-name.txt.gz: The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.

    personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz: The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.

    Publication

    This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015. http://dx.doi.org/10.1145/2786451.2786495 http://arxiv.org/abs/1509.02218 (author-created version)

    Note

    The raw data of Wikipedia page views is available on the following page: http://dumps.wikimedia.org/other/pagecounts-raw/

    References

    • Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
    • Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Analysis for Search Trend Prediction. Proceedings of the Annual Conference of Japanese Society for Artificial Intelligence (in Japanese). vol.29, no.2I1-1, pp.1-4, 2015.
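
The six-column layout described for the two time-series files can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: the description does not state the delimiter, so tab separation is an assumption, and the `TrendRow` type, the `read_trend_rows` helper, and the sample row are all hypothetical.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class TrendRow(NamedTuple):
    """One record in the six-column layout described above (names are hypothetical)."""
    period: str
    source: str        # "Google" or "Wikipedia" according to the description
    article_id: str
    keyword: str
    date: str
    value: float       # search-trend value or page-view count


def read_trend_rows(handle: TextIO, delimiter: str = "\t") -> Iterator[TrendRow]:
    """Yield rows from an open text handle; the tab delimiter is an assumption."""
    reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 6:
            continue  # skip lines that do not match the described layout
        period, source, article_id, keyword, date, value = fields
        yield TrendRow(period, source, article_id, keyword, date, float(value))


# Made-up sample row, only to show the shape of the data.
sample = io.StringIO("2008-2014\tWikipedia\t12345\texample name\t2014-01-01\t42\n")
rows = list(read_trend_rows(sample))
```

For the real `.txt.gz` files, the handle could come from `gzip.open(path, "rt", encoding="utf-8")`, again assuming UTF-8 text.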

  6. Data from: Wikipedia Page Views of Japanese Comic

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Yoshida, Mitsuo (2020). Wikipedia Page Views of Japanese Comic [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_60886
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Yoshida, Mitsuo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract (our paper)

    This paper investigates the page view and interlanguage link at Wikipedia for Japanese comic analysis. This paper is based on a preliminary investigation, and obtained three results, but the analysis is insufficient to use the results for a market research immediately. I am looking for research collaborators in order to conduct a more detailed analysis.

    Data

    Publication

    This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida. Preliminary Investigation for Japanese Comic Analysis using Wikipedia. Proceedings of the Fifth Asian Conference on Information Systems (ACIS 2016). pp.229-230, 2016.

  7. Most visited Wikipedia pages in the U.S. 2020, by visits

    • statista.com
    Updated Apr 28, 2022
    Cite
    Statista (2022). Most visited Wikipedia pages in the U.S. 2020, by visits [Dataset]. https://www.statista.com/statistics/1115251/most-visited-wikipedia-pages-usa/
    Dataset updated
    Apr 28, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2020
    Area covered
    United States
    Description

    As of March 2020, the most visited Wikipedia page in the United States was "2020 Democratic party presidential primaries" with 2 million visits during the month. The second-most visited page was "2019-20 coronavirus pandemic" with 1.8 million visits. A significant portion of the top visited Wikipedia pages in March are related to the global coronavirus pandemic.

  8. Total global visitor traffic to Wikipedia.org 2024

    • statista.com
    • ai-chatbox.pro
    Updated Nov 11, 2024
    Cite
    Statista (2024). Total global visitor traffic to Wikipedia.org 2024 [Dataset]. https://www.statista.com/statistics/1259907/wikipedia-website-traffic/
    Dataset updated
    Nov 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, close to 4.4 billion unique global visitors visited Wikipedia.org, slightly down from 4.6 billion visitors in January of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.

  9. Wikipedia Web Traffic 2018-19

    • kaggle.com
    Updated Apr 12, 2021
    Cite
    san_bt (2021). Wikipedia Web Traffic 2018-19 [Dataset]. https://www.kaggle.com/datasets/sandeshbhat/wikipedia-web-traffic-201819/versions/1
    Dataset updated
    Apr 12, 2021
    Dataset provided by
    Kaggle
    Authors
    san_bt
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    • Time series: A time series is a set of observations recorded at regular intervals of time. Time series can be beneficial in many fields, such as stock market prediction and weather forecasting. Time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend, or seasonal variation).

    • Web traffic: The amount of data sent and received by visitors to a website. Sites monitor the incoming and outgoing traffic to see which parts or pages of their site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country.

    Content

    Contains Page Views for 60k Wikipedia articles in 8 different languages taken on a daily basis for 2 years.

    [Image: DSLC, https://i.ibb.co/h1JCgpY/DSLC.png]

    A Data Science Life Cycle can be used to create a project. Forecasting can be done for any interval provided a sufficient dataset is available. Refer to the GitHub link in the tasks to view the forecast done using ARIMA and Prophet. Feel free to contribute further. Several other models, including a neural network, can be used to improve the results manyfold.

    Acknowledgements

    Credits :
    1. Wikipedia 2. Google

  10. wikipedia-20250620

    • huggingface.co
    Updated Jul 3, 2025
    + more versions
    Cite
    NeuML (2025). wikipedia-20250620 [Dataset]. https://huggingface.co/datasets/NeuML/wikipedia-20250620
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    NeuML
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for Wikipedia English June 2025

    Dataset created using this repo with a June 2025 Wikipedia snapshot. This repo also has a precomputed pageviews database. This database has the aggregated number of views for each page in Wikipedia. This file is built using the Wikipedia Pageview complete dumps

  11. Wikipedia pagecounts sorted by page (year 2014)

    • figshare.com
    txt
    Updated Feb 15, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alessio Bogon; Cristian Consonni; Alberto Montresor (2016). Wikipedia pagecounts sorted by page (year 2014) [Dataset]. http://doi.org/10.6084/m9.figshare.2085643.v1
    Available download formats: txt
    Dataset updated
    Feb 15, 2016
    Dataset provided by
    figshare
    Authors
    Alessio Bogon; Cristian Consonni; Alberto Montresor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the page view statistics for all the Wikimedia projects in the year 2014, ordered by (project, page, timestamp). It has been generated starting from Wikimedia's pagecounts-raw[1] dataset.

    The CSV uses spaces as delimiter, without any form of escaping because it is not needed. It has 5 columns:

    • project: the project name
    • page: the page requested, url-escaped
    • timestamp: the timestamp of the hour (format: "%Y%m%d-%H%M%S")
    • count: the number of times the page has been requested (in that hour)
    • bytes: the number of bytes transferred (in that hour)

    You can download the full dataset via torrent[2].

    Further information about this dataset is available at: http://disi.unitn.it/~consonni/datasets/wikipedia-pagecounts-sorted-by-page-year-2014/

    [1] https://dumps.wikimedia.org/other/pagecounts-raw/
    [2] http://disi.unitn.it/~consonni/datasets/wikipedia-pagecounts-sorted-by-page-year-2014/#download
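
The five-column, space-delimited layout described above maps directly onto a small parser. The sketch below is illustrative only: the `PageCount` type, the `parse_line` helper, and the sample line are hypothetical, while the timestamp format string is the one quoted in the description.

```python
from datetime import datetime
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    """One hourly record (field names follow the column list above)."""
    project: str
    page: str              # decoded from its url-escaped form
    timestamp: datetime
    count: int
    bytes_transferred: int


def parse_line(line: str) -> PageCount:
    """Split one space-delimited line into its five columns."""
    project, page, stamp, count, nbytes = line.rstrip("\n").split(" ")
    return PageCount(
        project=project,
        page=unquote(page),
        timestamp=datetime.strptime(stamp, "%Y%m%d-%H%M%S"),
        count=int(count),
        bytes_transferred=int(nbytes),
    )


# Made-up sample line, only to show the shape of a record.
record = parse_line("en Main_Page 20140101-000000 42 123456")
```

Splitting on a single space is sufficient here because the description states that page names are url-escaped, so they should not contain literal spaces.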

  12. Data from: English Wikipedia - Species Pages

    • gbif.org
    Updated Aug 23, 2022
    + more versions
    Cite
    Markus Döring (2022). English Wikipedia - Species Pages [Dataset]. http://doi.org/10.15468/c3kkgh
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    Wikimedia Foundation (http://www.wikimedia.org/)
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Authors
    Markus Döring
    Description

    Species pages extracted from the English Wikipedia article XML dump from 2022-08-02. Multimedia, vernacular names and textual descriptions are extracted, but only pages with a taxobox or speciesbox template are recognized.

    See https://github.com/mdoering/wikipedia-dwca for details.

  13. Statistics summarizing the view and edit history of selected Wikipedia...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam M. Wilson; Gene E. Likens (2023). Statistics summarizing the view and edit history of selected Wikipedia articles. [Dataset]. http://doi.org/10.1371/journal.pone.0134454.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Adam M. Wilson; Gene E. Likens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    a. "Mean Daily Page Views" from http://toolserver.org/~emw/wikistats/ were only available after 2008-01-01 and include programmatic page requests.
    b. Include data from 2003-06-12 (when the most recent article, Heliocentrism, originated) through 2012-07-31, when this analysis was run.
    c. Mean daily edit count excludes successive edits by the same user (n = 23,156).
    d. Mean count of words changed (inserted, deleted, or changed, n = 8,525).

    Due to the heavily right-skewed distributions, geometric means and standard deviations are shown. The number of observations (n) is constant for mean edits per day because all days were included, while only days with at least one edit were used to calculate the mean words changed.

    Statistics summarizing the view and edit history of selected Wikipedia articles.
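
For readers unfamiliar with geometric summary statistics, the following sketch shows one common way to compute a geometric mean and geometric standard deviation for right-skewed positive data. It is not taken from the paper: the `geometric_mean_sd` helper is hypothetical, and using the sample (n - 1) variance of the logarithms is an assumption about the convention.

```python
import math
from typing import Sequence, Tuple


def geometric_mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Return (geometric mean, geometric SD) of strictly positive values.

    Both are computed on the log scale and exponentiated back; the
    sample (n - 1) variance of the logs is an assumed convention.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    var_log = sum((x - mean_log) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(mean_log), math.exp(math.sqrt(var_log))


# Toy values spanning two orders of magnitude.
gm, gsd = geometric_mean_sd([1.0, 10.0, 100.0])
```

The geometric SD is a multiplicative factor rather than an additive spread, which is why it suits counts such as daily page views or words changed.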

  14. Selection of English Wikipedia pages (CNs) regarding topics with a direct...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt (2023). Selection of English Wikipedia pages (CNs) regarding topics with a direct relation to the emerging Hadoop (Big Data) market. [Dataset]. http://doi.org/10.1371/journal.pone.0141892.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Apache Hadoop is the central software project, beside Apache SOLR, and Apache Lucene (SW, software). Companies which offer Hadoop distributions and Hadoop based solutions are the central companies in the scope of the study (HV, hardware vendors). Other companies started very early with Hadoop related projects as early adopters (EA). Global players (GP) are affected by this emerging market, its opportunities and the new competitors (NC). Some new but highly relevant companies like Talend or LucidWorks have been selected because of their obvious commitment to the open source ideas. Widely adopted technologies with a relation to the selected research topic are represented by the group TEC.

  15. No. of page visits to the R page on Wikipedia

    • kaggle.com
    Updated Feb 28, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gaurav Dutta (2021). No. of page visits to the R page on Wikipedia [Dataset]. https://www.kaggle.com/gauravduttakiit/no-of-page-visits-to-the-r-page-on-wikipedia/discussion
    Dataset updated
    Feb 28, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gaurav Dutta
    Description

    Dataset

    This dataset was created by Gaurav Dutta

    Contents

  16. WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles

    • figshare.com
    bz2
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wiki Rank (2023). WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles [Dataset]. http://doi.org/10.6084/m9.figshare.8231273.v2
    Available download formats: bz2
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wiki Rank
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes a list of over 39 million Wikipedia articles in 55 languages with quality scores by WikiRank (https://wikirank.net). Quality scores of articles are based on Wikipedia dumps from May 2019. Popularity and Authors' Interest are based on activity in April 2019.

    License

    All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

    Format

    • page_id -- The identifier of the Wikipedia article (int), e.g. 4519301
    • page_name -- The title of the Wikipedia article (utf-8), e.g. General relativity
    • wikirank_quality -- quality score for the Wikipedia article on a scale of 0-100 (as of May 1, 2019)
    • poularity -- median of the daily number of page views of the Wikipedia article during April 2019
    • authors_interest -- number of authors of the Wikipedia article during April 2019

  17. Kaggle Wikipedia Web Traffic Daily Dataset (without Missing Values)

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Apr 1, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bergmeir, Christoph (2021). Kaggle Wikipedia Web Traffic Daily Dataset (without Missing Values) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3892918
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Hyndman, Rob
    Bergmeir, Christoph
    Montero-Manso, Pablo
    Godahewa, Rakshitha
    Webb, Geoff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.

    The original dataset contains missing values. They have been simply replaced by zeros.
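
The zero-replacement step described above can be illustrated in a few lines. This is a sketch of the general idea, not the dataset authors' code: the `fill_missing_with_zero` helper is hypothetical, and treating both `None` and NaN as "missing" is an assumption.

```python
import math
from typing import List, Optional, Sequence


def fill_missing_with_zero(series: Sequence[Optional[float]]) -> List[float]:
    """Return a copy of the series with missing observations set to 0.0.

    Both None and NaN are treated as missing (an assumption).
    """
    cleaned = []
    for value in series:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            cleaned.append(0.0)
        else:
            cleaned.append(float(value))
    return cleaned


# Toy daily series with two gaps.
filled = fill_missing_with_zero([3, None, float("nan"), 7])
```

Note that zero-filling makes a day with no recorded data indistinguishable from a day with no traffic, which can matter when fitting forecasting models.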

  18. Wikipedia page visits during the COVID-19 pandemic

    • ssh.datastations.nl
    csv
    Updated May 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maurice Vergeer (2024). Wikipedia page visits during the COVID-19 pandemic [Dataset]. http://doi.org/10.17026/SS/1VTO3P
    Available download formats: csv (20418548)
    Dataset updated
    May 16, 2024
    Dataset provided by
    DANS Data Station Social Sciences and Humanities
    Authors
    Maurice Vergeer
    License

    https://doi.org/10.17026/fp39-0x58

    Time period covered
    Jan 3, 2020 - Mar 31, 2021
    Description

    Daily visits to Wikipedia articles about COVID-19

  19. COVID-19 Pandemic Wikipedia Readership

    • figshare.com
    txt
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin (2023). COVID-19 Pandemic Wikipedia Readership [Dataset]. http://doi.org/10.6084/m9.figshare.14548032.v3
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This data release includes two Wikipedia datasets related to the readership of the project during the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country; the second dataset is one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, more specifically from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.

    Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

    Privacy considerations

    While this data is considered valuable for the insight that it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken to not inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.

    The Wikimedia Foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 pandemic an exemption was made to store reader data about the pandemic with additional privacy protections:

    - exclude the page views from users engaged in an edit session
    - exclude reader data from specific countries (with a few exceptions)
    - the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country for that day. The randomization and filters occur on a daily cadence with all timestamps in UTC.
    - exclude power users, i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and also in theory would help reduce the chance of a single user heavily skewing the data.
    - exclude readership from users of the iOS and Android Wikipedia apps

    In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on readership data per country, and the COVID-19 privacy protections in particular, see this phabricator.

    To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

    Datasets

    The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation publishes already, with the addition of the country-specific counts.

    COVID-19 pageviews

    The file covid_pageviews.tsv contains:

    - pageview counts for COVID-19 related pages, aggregated by week and country
    - k-anonymity threshold of 100
    - example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times from readers in Belgium
    - as a control bucket, we include pageview counts to all pages aggregated by week and country. Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.

    The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.

    Covid reader session bigrams

    The file covid_session_bigrams.tsv contains:

    - number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only
    - aggregated by month and country
    - k-anonymity threshold of 100
    - example: In March of 2020, there were 1000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile

    The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.

    Contact

    Please reach out to research-feedback@wikimedia.org for any questions.
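
The k-anonymity threshold and flooring rule described above (drop aggregated counts under 100, floor the rest to a multiple of 100) can be sketched as follows. This is an illustration of the stated rule, not the Wikimedia release code: the `release_counts` helper, its parameter names, the key layout, and the raw counts are all hypothetical.

```python
from typing import Dict, Hashable


def release_counts(
    raw_counts: Dict[Hashable, int],
    threshold: int = 100,   # k-anonymity threshold stated in the description
    bucket: int = 100,      # counts are floored to a multiple of this value
) -> Dict[Hashable, int]:
    """Drop groups below the threshold and floor the remaining counts."""
    released = {}
    for key, count in raw_counts.items():
        if count < threshold:
            continue  # too few views to publish for this (page, country, week)
        released[key] = (count // bucket) * bucket
    return released


# Hypothetical raw counts keyed by (project, page, country, week).
raw = {
    ("fr.wikipedia", "Pandémie_de_Covid-19_en_Italie", "BE", "2020-W13"): 11789,
    ("fr.wikipedia", "Some_rarely_viewed_page", "BE", "2020-W13"): 99,
}
published = release_counts(raw)
```

With these made-up inputs the first group is published as 11700, matching the form of the example in the description, and the second group is withheld entirely.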

  20. Wikimedia Desktop View

    • figshare.com
    bz2
    Updated Feb 2, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aaron Halfaker (2022). Wikimedia Desktop View [Dataset]. http://doi.org/10.6084/m9.figshare.1254630.v1
    Available download formats: bz2
    Dataset updated
    Feb 2, 2022
    Dataset provided by
    figshare
    Authors
    Aaron Halfaker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each row represents a page view to the Desktop version of Wikimedia's sites (e.g. wikipedia, wiktionary, etc.) This dataset was gathered through the use of a snippet of Javascript that used a cookie to randomly sample readers at 1/1000. 7.12 million page view requests by 1.18 million users were recorded.
