29 datasets found

Wikipedia Pageviews Fields
windsor.ai
json
Updated Jun 1, 2024
Cite
Windsor.ai (2024). Wikipedia Pageviews Fields [Dataset]. https://windsor.ai/data-field/wikipedia_pageviews/
Available download formats: json
Dataset updated
Jun 1, 2024
Dataset provided by
Windsor.ai
Variables measured
Today, Source, top.day, top.year, top.month, top.access, Data Source, top.project, top.articles, per-article.agent, and 6 more
Description
Auto-generated structured data of Wikipedia Pageviews from table Fields
English Wikipedia pageviews by second
figshare.com
huggingface.co
+1 more
application/gzip
Updated Jan 19, 2016
Cite
Os Keyes (2016). English Wikipedia pageviews by second [Dataset]. http://doi.org/10.6084/m9.figshare.1394684.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.1394684.v1
Dataset updated
Jan 19, 2016
Dataset provided by
figshare
Authors
Os Keyes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains a count of pageviews to the English-language Wikipedia from 2015-03-16T00:00:00 to 2015-04-25T15:59:59, grouped by timestamp (down to a one-second resolution level) and site (mobile or desktop). The smallest number of events in a group is 645; because of this, we are confident there should not be privacy implications of releasing this data.
Google Trends and Wikipedia Page Views
zenodo.org
explore.openaire.eu
application/gzip
Updated Jan 24, 2020
Cite
Mitsuo Yoshida (2020). Google Trends and Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.14539
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mitsuo Yoshida
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract (our paper)

The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.

Data

personal-name.txt.gz:
The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.

personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
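The six-column layout described above could be read along the following lines. This is only a sketch: the tab delimiter, the UTF-8 encoding, the `TrendRow` and `read_trend_rows` names, and the sample row are all assumptions made for illustration and are not taken from the dataset.

```python
import csv
import gzip
import io
from typing import Iterator, NamedTuple


class TrendRow(NamedTuple):
    period: str      # column 1: period to be collected
    source: str      # column 2: source (Google or Wikipedia)
    article_id: str  # column 3: Wikipedia article id
    keyword: str     # column 4: search keyword
    date: str        # column 5: date
    value: float     # column 6: search trend or page view value


def read_trend_rows(stream) -> Iterator[TrendRow]:
    """Yield one TrendRow per line. Tab-separated columns are an assumption."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 6:
            continue  # skip lines that do not match the described layout
        yield TrendRow(*fields[:5], float(fields[5]))


# Made-up row, compressed in memory so the example needs no download.
sample = "2008-2014\tWikipedia\t12345\texample keyword\t2008-01-01\t42\n"
packed = gzip.compress(sample.encode("utf-8"))

with gzip.open(io.BytesIO(packed), "rt", encoding="utf-8", newline="") as fh:
    rows = list(read_trend_rows(fh))
```

To read a real file, the in-memory buffer would be replaced by the path to the downloaded `.txt.gz` file, after checking the actual delimiter.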

Publication

This data set was created for our study. If you make use of this data set, please cite:
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
http://dx.doi.org/10.1145/2786451.2786495
http://arxiv.org/abs/1509.02218 (author-created version)

Note

The raw data of Wikipedia page views is available in the following page.
http://dumps.wikimedia.org/other/pagecounts-raw/
Wikipedia English: number of page views 2023, by country
statista.com
Updated Dec 13, 2023
Cite
Statista (2023). Wikipedia English: number of page views 2023, by country [Dataset]. https://www.statista.com/statistics/1428253/wikipedia-english-page-views-country/
Dataset updated
Dec 13, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Nov 2023
Area covered
Worldwide
Description
In November 2023, the English version of Wikipedia received over 3 billion page views originating from the United States across all platforms. The United Kingdom generated the second-most page views for the subdomain, with 809.9 million views, followed by India, with 773.2 million views.
Yearly pageviews of English Wikipedia articles with potential links to green...
data.niaid.nih.gov
Updated Nov 16, 2020
Cite
Leva, Federico (2020). Yearly pageviews of English Wikipedia articles with potential links to green open access scholarly articles [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3783467
Dataset updated
Nov 16, 2020
Dataset authored and provided by
Leva, Federico
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Number of visits in 2019 for a sample of 23462 English Wikipedia articles which contain references to academic sources which have a green open access copy available but not yet used. The consultation statistics were retrieved from the Wikimedia pageviews API using the Python client (script also included). The sample was selected among articles which in April 2020 had at least one citation of an academic paper (using the "cite journal" template) for which OAbot (through Unpaywall data) had found a green open access URL to add (gratis open access, not necessarily libre open access). Data shows that the top 1 % most visited articles received 30 % of the visits: over 500 million in the year, corresponding to 1 million potential citation link clicks to distribute across all references assuming a 0.2 % click-through rate per Piccardi et al. (2020).
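The closing estimate in this description can be checked with a line of arithmetic. The sketch below uses only the two figures quoted above (500 million visits and a 0.2 % click-through rate); the variable names are illustrative.

```python
# Figures quoted in the description above.
top_visits = 500_000_000         # "over 500 million" visits in the year
click_through_per_thousand = 2   # 0.2 % click-through rate, i.e. 2 per 1,000

# Integer arithmetic avoids floating-point rounding in the check.
potential_clicks = top_visits * click_through_per_thousand // 1000
print(f"{potential_clicks:,}")   # prints 1,000,000
```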
Wikipedia Page Views of Japanese Comic
zenodo.org
data.niaid.nih.gov
application/gzip, bin
Updated Jan 24, 2020
Cite
Mitsuo Yoshida (2020). Wikipedia Page Views of Japanese Comic [Dataset]. http://doi.org/10.5281/zenodo.60886
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.60886
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mitsuo Yoshida
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Abstract (our paper)

This paper investigates page views and interlanguage links on Wikipedia as a basis for analysing Japanese comics. It is a preliminary investigation that obtained three results, but the analysis is not yet sufficient for the results to be used directly in market research. I am looking for research collaborators in order to conduct a more detailed analysis.

Data

Publication

This data set was created for our study. If you make use of this data set, please cite:
Mitsuo Yoshida. Preliminary Investigation for Japanese Comic Analysis using Wikipedia. Proceedings of the Fifth Asian Conference on Information Systems (ACIS 2016). pp.229-230, 2016.
wikipedia-20250620
huggingface.co
Updated Jul 3, 2025
+ more versions
Cite
NeuML (2025). wikipedia-20250620 [Dataset]. https://huggingface.co/datasets/NeuML/wikipedia-20250620
Dataset updated
Jul 3, 2025
Dataset authored and provided by
NeuML
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Dataset Card for Wikipedia English June 2025

Dataset created using this repo with a June 2025 Wikipedia snapshot. This repo also has a precomputed pageviews database. This database has the aggregated number of views for each page in Wikipedia. This file is built using the Wikipedia Pageview complete dumps
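The "aggregated number of views for each page" can be pictured as a per-title sum. The sketch below is not the repo's build script: it assumes the dump has already been parsed into hypothetical `(title, views)` pairs, and the records shown are made up.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_views(records: Iterable[Tuple[str, int]]) -> Counter:
    """Sum view counts per page title across all input records."""
    totals: Counter = Counter()
    for title, views in records:
        totals[title] += views
    return totals


# Made-up records; a real run would stream them from the parsed dump files.
records = [("Page_A", 10), ("Page_B", 3), ("Page_A", 5)]
totals = aggregate_views(records)
```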
wikipedia-20240901
huggingface.co
Updated Sep 1, 2024
+ more versions
Cite
NeuML (2024). wikipedia-20240901 [Dataset]. https://huggingface.co/datasets/NeuML/wikipedia-20240901
Dataset updated
Sep 1, 2024
Dataset authored and provided by
NeuML
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Dataset Card for Wikipedia English September 2024

Dataset created using this repo with a September 2024 Wikipedia snapshot. This repo also has a precomputed pageviews database. This database has the aggregated number of views for each page in Wikipedia. This file is built using the Wikipedia Pageview complete dumps
Wikipedia Web Traffic 2018-19
kaggle.com
Updated Apr 12, 2021
Cite
san_bt (2021). Wikipedia Web Traffic 2018-19 [Dataset]. https://www.kaggle.com/datasets/sandeshbhat/wikipedia-web-traffic-201819/versions/1
Dataset updated
Apr 12, 2021
Dataset provided by
Kaggle
Authors
san_bt
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Time series: A time series is a set of observations recorded at regular intervals of time. Time series analysis can be beneficial in many fields, such as stock market prediction and weather forecasting. It accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation).

Web traffic: The amount of data sent and received by visitors to a website. Sites monitor the incoming and outgoing traffic to see which parts or pages of their site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country.

Content

Contains Page Views for 60k Wikipedia articles in 8 different languages taken on a daily basis for 2 years.

[Image: DSLC (https://i.ibb.co/h1JCgpY/DSLC.png)]

A Data Science Life Cycle can be used to create a project. Forecasting can be done for any interval, provided a sufficient dataset is available. Refer to the GitHub link in the tasks to view the forecast done using ARIMA and Prophet. Feel free to contribute further. Several other models, including neural networks, could be used to improve the results many times over.

Acknowledgements

Credits :
1. Wikipedia 2. Google
COVID-19 Pandemic Wikipedia Readership
figshare.com
txt
Updated May 31, 2023
Cite
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin (2023). COVID-19 Pandemic Wikipedia Readership [Dataset]. http://doi.org/10.6084/m9.figshare.14548032.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14548032.v3
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Isaac Johnson; Leila Zia; Joseph Allemandou; Marcel Ruiz Forns; Nuria Ruiz; Fabian Kaelin
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data release includes two Wikipedia datasets on the project's readership during the early COVID-19 pandemic period. The first dataset is COVID-19 article page views by country; the second is one-hop navigation where one of the two pages is COVID-19 related. The data covers roughly the first six months of the pandemic, more specifically from January 1st 2020 to June 30th 2020. For more background on the pandemic in those months, see English Wikipedia's Timeline of the COVID-19 pandemic.

Wikipedia articles are considered COVID-19 related according to the methodology described here; the list of COVID-19 articles used for the released datasets is available in covid_articles.tsv. For simplicity and transparency, the same list of articles from 20 April 2020 was used for the entire dataset, though in practice new COVID-19-relevant articles were constantly being created as the pandemic evolved.

Privacy considerations

While this data is considered valuable for the insight that it can provide about information-seeking behaviors around the pandemic in its early months across diverse geographies, care must be taken to not inadvertently reveal information about the behavior of individual Wikipedia readers. We put in place a number of filters to release as much data as we can while minimizing the risk to readers.

The Wikimedia Foundation started to release most viewed articles by country from Jan 2021. At the beginning of the COVID-19 pandemic an exemption was made to store reader data about the pandemic with additional privacy protections:

- exclude the page views from users engaged in an edit session
- exclude reader data from specific countries (with a few exceptions)
- the aggregated statistics are based on 50% of reader sessions that involve a pageview to a COVID-19-related article (see covid_pages.tsv). As a control, a 1% random sample of reader sessions that have no pageviews to COVID-19-related articles was kept. In aggregate, we make sure this 1% non-COVID-19 sample and 50% COVID-19 sample represent less than 10% of pageviews for a country for that day. The randomization and filters occur on a daily cadence with all timestamps in UTC.
- exclude power users, i.e. userhashes with greater than 500 pageviews in a day. This doubles as another form of likely bot removal, protects very heavy users of the project, and also in theory would help reduce the chance of a single user heavily skewing the data.
- exclude readership from users of the iOS and Android Wikipedia apps.

In effect, the view counts in this dataset represent comparable trends rather than the total amount of traffic from a given country. For more background on per-country readership data, and the COVID-19 privacy protections in particular, see this phabricator.

To further minimize privacy risks, a k-anonymity threshold of 100 was applied to the aggregated counts. For example, a page needs to be viewed at least 100 times in a given country and week in order to be included in the dataset. In addition, the view counts are floored to a multiple of 100.

Datasets

The datasets published in this release are derived from a reader session dataset generated by the code in this notebook with the filtering described above. The raw reader session data itself will not be publicly available due to privacy considerations. The datasets described below are similar to the pageviews and clickstream data that the Wikimedia Foundation publishes already, with the addition of the country-specific counts.

COVID-19 pageviews

The file covid_pageviews.tsv contains:

- pageview counts for COVID-19 related pages, aggregated by week and country
- k-anonymity threshold of 100
- example: In the 13th week of 2020 (23 March - 29 March 2020), the page 'Pandémie_de_Covid-19_en_Italie' on French Wikipedia was visited 11700 times from readers in Belgium
- as a control bucket, we include pageview counts to all pages aggregated by week and country. Due to privacy considerations during the collection of the data, the control bucket was sampled at ~1% of all view traffic. The view counts for the control title are thus proportional to the total number of pageviews to all pages.

The file is ~8 MB and contains ~134000 data points across the 27 weeks, 108 countries, and 168 projects.

Covid reader session bigrams

The file covid_session_bigrams.tsv contains:

- number of occurrences of visits to pages A -> B, where either A or B is a COVID-19 related article. Note that the bigrams are tuples (from, to) of articles viewed in succession; the underlying mechanism can be clicking on a link in an article, but it may also have been a new search or reading both articles based on links from third source articles. In contrast, the clickstream data is based on referral information only
- aggregated by month and country
- k-anonymity threshold of 100
- example: In March of 2020, there were 1000 occurrences of readers accessing the page es.wikipedia/SARS-CoV-2 followed by es.wikipedia/Orthocoronavirinae from Chile

The file is ~10 MB and contains ~90000 bigrams across the 6 months, 96 countries, and 56 projects.

Contact

Please reach out to research-feedback@wikimedia.org for any questions.
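The k-anonymity threshold and the flooring to a multiple of 100 described above might look roughly as follows. This is a sketch of the stated rules only, not the code from the release's notebook; the key layout, function name, and raw counts are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # hypothetical layout: (page, country, week)


def anonymize_counts(
    raw: Dict[Key, int], threshold: int = 100, bucket: int = 100
) -> Dict[Key, int]:
    """Drop groups with fewer than `threshold` views, then floor the
    remaining counts to a multiple of `bucket`."""
    return {
        key: (count // bucket) * bucket
        for key, count in raw.items()
        if count >= threshold
    }


# Made-up raw counts for illustration.
raw_counts = {
    ("Page_A", "BE", "2020-W13"): 11_742,  # kept, floored to 11700
    ("Page_B", "BE", "2020-W13"): 99,      # below the threshold, dropped
    ("Page_C", "CL", "2020-W13"): 100,     # exactly at the threshold, kept
}
released = anonymize_counts(raw_counts)
```

Because of the flooring, every released count is a lower bound on the filtered count it was derived from.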
Wikimedia user agents
data.wu.ac.at
tsv
Updated Mar 6, 2015
Cite
Wikimedia (2015). Wikimedia user agents [Dataset]. https://data.wu.ac.at/schema/datahub_io/YTYxNDBmYjItMjE2Ni00ZDQ4LThmZmQtOGUyMTQ5MTA2NDUz
Available download formats: tsv
Dataset updated
Mar 6, 2015
Dataset provided by
Wikimedia
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
A dataset of parsed reader and editor browser agents from the Wikimedia web properties. The intent behind releasing the parsed agents is to make it easier for Wikimedia developers to understand how to best test their software for the group they're targeting.

The actual data collection and anonymisation process varied between readers and editors. For readers, a 1:1000 sampled log of pageviews in February 2014 was taken. Any user agent that had more than 500 requests in the sample (in other words, roughly 500,000 before sampling) in a 24-hour period, from no fewer than 500/500,000 distinct IP addresses, was extracted, along with a count of how many times the agent appeared. For editors, a 90-day sample (December 2014 - February 2015) of user agents was taken globally; any user agent used by >= 50 distinct users was extracted, along with a count of the associated number of edits.
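The reader-side rule (more than 500 requests in a 1:1000 sample) can be sketched as a counting filter. The sketch leaves out the 24-hour window and the distinct-IP condition, and the function names and sample agents are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

SAMPLING_RATE = 1000         # the log is a 1:1000 sample
MIN_SAMPLED_REQUESTS = 500   # an agent must exceed this count in the sample


def frequent_agents(
    sampled_agents: Iterable[str], min_requests: int = MIN_SAMPLED_REQUESTS
) -> Dict[str, int]:
    """Return the agents seen more than `min_requests` times in the sample."""
    counts = Counter(sampled_agents)
    return {agent: n for agent, n in counts.items() if n > min_requests}


def estimated_total(sampled_count: int, rate: int = SAMPLING_RATE) -> int:
    """Scale a sampled count up to an estimate of the unsampled total."""
    return sampled_count * rate


# Made-up sample: AgentA appears 501 times, AgentB exactly 500 times.
sample = ["AgentA"] * 501 + ["AgentB"] * 500
kept = frequent_agents(sample)
```

Here only `AgentA` passes, since the rule is "more than 500", and `estimated_total(500)` gives the 500,000 figure mentioned in the description.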
Selection of English Wikipedia pages (CNs) regarding topics with a direct...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt (2023). Selection of English Wikipedia pages (CNs) regarding topics with a direct relation to the emerging Hadoop (Big Data) market. [Dataset]. http://doi.org/10.1371/journal.pone.0141892.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0141892.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Apache Hadoop is the central software project, beside Apache SOLR, and Apache Lucene (SW, software). Companies which offer Hadoop distributions and Hadoop based solutions are the central companies in the scope of the study (HV, hardware vendors). Other companies started very early with Hadoop related projects as early adopters (EA). Global players (GP) are affected by this emerging market, its opportunities and the new competitors (NC). Some new but highly relevant companies like Talend or LucidWorks have been selected because of their obvious commitment to the open source ideas. Widely adopted technologies with a relation to the selected research topic are represented by the group TEC.
Sepsis information-seeking behaviors via Wikipedia between 2015 and 2018: A...
plos.figshare.com
figshare.com
xlsx
Updated Jun 5, 2023
Cite
Craig S. Jabaley; Robert F. Groff; Theresa J. Barnes; Mark E. Caridi-Scheible; James M. Blum; Vikas N. O’Reilly-Shah (2023). Sepsis information-seeking behaviors via Wikipedia between 2015 and 2018: A mixed methods retrospective observational study [Dataset]. http://doi.org/10.1371/journal.pone.0221596
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0221596
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Craig S. Jabaley; Robert F. Groff; Theresa J. Barnes; Mark E. Caridi-Scheible; James M. Blum; Vikas N. O’Reilly-Shah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Raising public awareness of sepsis, a potentially life-threatening dysregulated host response to infection, to hasten its recognition has become a major focus of physicians, investigators, and both non-governmental and governmental agencies. While the internet is a common means by which to seek out healthcare information, little is understood about patterns and drivers of these behaviors. We sought to examine traffic to Wikipedia, a popular and publicly available online encyclopedia, to better understand how, when, and why users access information about sepsis. Utilizing pageview traffic data for all available language localizations of the sepsis and septic shock pages between July 1, 2015 and June 30, 2018, significantly outlying daily pageview totals were identified using a seasonal hybrid extreme studentized deviate approach. Consecutive outlying days were aggregated, and a qualitative analysis was undertaken of print and online news media coverage to identify potential correlates. Traffic patterns were further characterized using paired referrer to resource (i.e. clickstream) data, which were available for a temporal subset of the pageviews. Of the 20,557,055 pageviews across 65 linguistic localizations, 47 of the 1,096 total daily pageview counts were identified as upward outliers. After aggregating sequential outlying days, 25 epochs were examined. Qualitative analysis identified at least one major news media correlate for each, which were typically related to high-profile deaths from sepsis and, less commonly, awareness promotion efforts. Clickstream analysis suggests that most sepsis and septic shock Wikipedia pageviews originate from external referrals, namely search engines. Owing to its granular and publicly available traffic data, Wikipedia holds promise as a means by which to better understand global drivers of online sepsis information seeking. 
Further characterization of user engagement with this information may help to elucidate means by which to optimize the visibility, content, and delivery of awareness promotion efforts.
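The step in which consecutive outlying days were aggregated into epochs can be illustrated with a small grouping routine. This is a sketch of that one step only (it does not implement the seasonal hybrid ESD detection), and the function name and dates are made up.

```python
from datetime import date, timedelta
from typing import Iterable, List


def group_into_epochs(days: Iterable[date]) -> List[List[date]]:
    """Group outlying days into runs of consecutive calendar days."""
    epochs: List[List[date]] = []
    for day in sorted(set(days)):
        if epochs and day - epochs[-1][-1] == timedelta(days=1):
            epochs[-1].append(day)   # extends the current run
        else:
            epochs.append([day])     # starts a new epoch
    return epochs


# Made-up outlier days: one three-day run and one isolated day.
outliers = [
    date(2016, 6, 4),
    date(2016, 6, 5),
    date(2016, 6, 3),
    date(2017, 1, 10),
]
epochs = group_into_epochs(outliers)
```

Applied to the study's 47 outlying days, a grouping of this kind is what would reduce them to the 25 epochs reported above.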
Data from: The impact of news exposure on collective attention in the United...
zenodo.org
data.niaid.nih.gov
application/gzip, csv +1
Updated Mar 2, 2020
Cite
Michele Tizzoni; André Panisson; Daniela Paolotti; Ciro Cattuto (2020). The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic [Dataset]. http://doi.org/10.5281/zenodo.3603916
Available download formats: zip, csv, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3603916
Dataset updated
Mar 2, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michele Tizzoni; André Panisson; Daniela Paolotti; Ciro Cattuto
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
This repository contains the data of the study "The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic".

Epidemiological data

The folder zika_USA_weekly_cases_2016.zip contains weekly ZIKV incidence counts reported by the US Centers for Disease Control and Prevention in 2016, by state. Data were extracted from reports made publicly available by the CDC at: https://zenodo.org/record/584136#.Xk07-RNKjOQ

Web news data

The file news_GDELT_data.csv.gz contains all news items extracted from the GDELT platform (https://www.gdeltproject.org/) matching TAX_DISEASE_ZIKA as a Theme, and United_States as a Location in the GDELT platform.

TV closed captions

The file zika_TV_mentions_dataframe.csv contains all the TV news items of 2016 matching the word "Zika" in the TV News Archive (https://archive.org/details/tv).

Wikipedia pageview counts

Dataset 1: wikipedia_dataset1_zika_daily_pageview_usa.csv

Content of each line of the dataset: day, pageview_count

The dataset contains the daily number of pageview counts of 128 different Wikipedia pages related to the Zika virus (aggregated and summed to total) originated in the United States, from January 1st to December 31st, 2016.

Dataset 2: wikipedia_dataset2_zika_daily_pageview_bystate.zip

Content of each line of the dataset: day, pageview_count, state

The dataset contains the daily number of pageview counts of 128 different Wikipedia pages related to the Zika virus (aggregated and summed to total) originated in the United States, disaggregated by state, from January 1st to December 31st, 2016.

Dataset 3: wikipedia_dataset3_zika_pagecount_by_city.csv

Content of each line of the dataset: US_city, pageview_count_Zika,pageview_count_total

The dataset contains the total number of pageview counts of 128 different Wikipedia pages related to the Zika virus (pageview_count_Zika) originated in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016. The dataset also contains the total number of pageview counts to all Wikipedia pages (all Wikipedia projects, pageview_count_total) originated in 788 cities (US_city) of the United States with a population larger than 40,000 in 2016.
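One natural use of Dataset 3 is to normalise Zika pageviews by total pageviews per city. The sketch below assumes the CSV has a header row using the three column names listed above, which is an assumption; the rows and the `zika_share` helper are made up for illustration.

```python
import csv
import io
from typing import Dict

# Made-up rows using the column names given in the description.
sample_csv = (
    "US_city,pageview_count_Zika,pageview_count_total\n"
    "City A,120,60000\n"
    "City B,30,90000\n"
)


def zika_share(stream) -> Dict[str, float]:
    """Return, per city, the fraction of all pageviews that went to Zika pages."""
    shares: Dict[str, float] = {}
    for row in csv.DictReader(stream):
        total = int(row["pageview_count_total"])
        if total > 0:  # guard against division by zero
            shares[row["US_city"]] = int(row["pageview_count_Zika"]) / total
    return shares


shares = zika_share(io.StringIO(sample_csv))
```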
Replication Data for: Click, click boom: Using Wikipedia data to predict...
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Oswald, Christian; Ohrenhofer, Daniel (2023). Replication Data for: Click, click boom: Using Wikipedia data to predict changes in battle-related deaths [Dataset]. http://doi.org/10.7910/DVN/W4BAN2
Unique identifier
https://doi.org/10.7910/DVN/W4BAN2
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Oswald, Christian; Ohrenhofer, Daniel
Description
Data and methods development are key to improve our ability to forecast conflict. Relatively recent data sources such as mobile phone and social media data or images have received widespread attention in conflict research. Oftentimes these do not cover substantial parts of the globe or they are difficult to obtain and manipulate, which makes regular updating challenging. The sometimes vast amounts of data can also be computationally and financially costly. The data source we propose instead is cheap, readily and openly available, and updated in real time, and it provides global coverage: Wikipedia. We argue that the number of country page views can be considered a measure of interest or salience, whereas the number of page changes can be considered a measure of controversy between competing political views. We expect these predictors to be particularly successful in capturing tensions before a conflict escalates. We test our argument by predicting changes in battle-related deaths in Africa on the country-month level. We find evidence that country page views do increase predictive performance while page changes do not. Contrary to our expectation, our model seems to capture long-term trends better than sharp short-term changes.
Pageviews of pages with at least one DOI citation and the referrals from DOI...
plos.figshare.com
xls
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lauren A. Maggio; John M. Willinsky; Ryan M. Steinberg; Daniel Mietchen; Joseph L. Wass; Ting Dong (2023). Pageviews of pages with at least one DOI citation and the referrals from DOI citations during August 2016. [Dataset]. http://doi.org/10.1371/journal.pone.0190046.t006
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0190046.t006
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Lauren A. Maggio; John M. Willinsky; Ryan M. Steinberg; Daniel Mietchen; Joseph L. Wass; Ting Dong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pageviews of pages with at least one DOI citation and the referrals from DOI citations during August 2016.
Replication Data for: "Using party press releases and Wikipedia page view...
dataone.org
Updated Nov 8, 2023
Cite
Debus, Marc; Christopher Florczak (2023). Replication Data for: "Using party press releases and Wikipedia page view data to analyse developments and determinants of parties’ issue prevalence: Evidence for the right-wing populist ‘Alternative for Germany’" [Dataset]. http://doi.org/10.7910/DVN/1XGQF2
Unique identifier
https://doi.org/10.7910/DVN/1XGQF2
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Debus, Marc; Christopher Florczak
Description
This data replicates the findings of the manuscript 'Using party press releases and Wikipedia page view data to analyse developments and determinants of parties’ issue prevalence: Evidence for the right-wing populist ‘Alternative for Germany’'
WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles
figshare.com
bz2
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wiki Rank (2023). WikiRank 05.2019 - quality, popularity and AI for Wikipedia articles [Dataset]. http://doi.org/10.6084/m9.figshare.8231273.v2
Available download formats: bz2
Unique identifier
https://doi.org/10.6084/m9.figshare.8231273.v2
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Wiki Rank
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset includes a list of over 39 million Wikipedia articles in 55 languages with quality scores by WikiRank (https://wikirank.net). Quality scores of articles are based on Wikipedia dumps from May 2019. Popularity and Authors' Interest are based on activity in April 2019.

License

All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

Format

• page_id -- The identifier of the Wikipedia article (int), e.g. 4519301
• page_name -- The title of the Wikipedia article (utf-8), e.g. General relativity
• wikirank_quality -- quality score for the Wikipedia article on a scale of 0-100 (as of May 1, 2019)
• poularity -- median of the daily number of page views of the Wikipedia article during April 2019
• authors_interest -- number of authors of the Wikipedia article during April 2019
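The popularity field is described as a median of daily page views. A minimal sketch of that calculation follows; the daily counts are made up, the `popularity_score` name is hypothetical, and this is not WikiRank's own code.

```python
from statistics import median
from typing import Sequence


def popularity_score(daily_views: Sequence[int]) -> float:
    """Median of the daily page-view counts for one article."""
    return median(daily_views)


# Made-up daily counts; the one-day spike of 300 barely moves the median.
daily_views = [120, 95, 300, 110, 105]
score = popularity_score(daily_views)  # 110
```

Using the median rather than the mean keeps a single-day traffic spike from dominating the score.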
A season for all things: Phenological imprints in Wikipedia usage and their...
plos.figshare.com
docx
Updated May 31, 2023
Cite
John C. Mittermeier; Uri Roll; Thomas J. Matthews; Richard Grenyer (2023). A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation [Dataset]. http://doi.org/10.1371/journal.pbio.3000146
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pbio.3000146
Dataset updated
May 31, 2023
Dataset provided by
PLOS Biology
Authors
John C. Mittermeier; Uri Roll; Thomas J. Matthews; Richard Grenyer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Phenology plays an important role in many human–nature interactions, but these seasonal patterns are often overlooked in conservation. Here, we provide the first broad exploration of seasonal patterns of interest in nature across many species and cultures. Using data from Wikipedia, a large online encyclopedia, we analyzed 2.33 billion pageviews to articles for 31,751 species across 245 languages. We show that seasonality plays an important role in how and when people interact with plants and animals online. In total, over 25% of species in our data set exhibited a seasonal pattern in at least one of their language-edition pages, and seasonality is significantly more prevalent in pages for plants and animals than it is in a random selection of Wikipedia articles. Pageview seasonality varies across taxonomic clades in ways that reflect observable patterns in phenology, with groups such as insects and flowering plants having higher seasonality than mammals. Differences between Wikipedia language editions are significant; pages in languages spoken at higher latitudes exhibit greater seasonality overall, and species seldom show the same pattern across multiple language editions. These results have relevance to conservation policy formulation and to improving our understanding of what drives human interest in biodiversity.
Outlying Wikipedia sepsis and septic shock epochs with potential media...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Craig S. Jabaley; Robert F. Groff; Theresa J. Barnes; Mark E. Caridi-Scheible; James M. Blum; Vikas N. O’Reilly-Shah (2023). Outlying Wikipedia sepsis and septic shock epochs with potential media correlates (2015 to 2018). [Dataset]. http://doi.org/10.1371/journal.pone.0221596.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0221596.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Craig S. Jabaley; Robert F. Groff; Theresa J. Barnes; Mark E. Caridi-Scheible; James M. Blum; Vikas N. O’Reilly-Shah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outlying Wikipedia sepsis and septic shock epochs with potential media correlates (2015 to 2018).