100+ datasets found
  1. wikipedia.org Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Jun 1, 2025
    Cite
    (2025). wikipedia.org Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/wikipedia.org
    Explore at:
    Dataset updated
    Jun 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for wikipedia.org as of June 2025

  2. wikipedia-analysis

    • huggingface.co
    Updated Jul 24, 2025
    Cite
    Omar Kamali (2025). wikipedia-analysis [Dataset]. https://huggingface.co/datasets/omarkamali/wikipedia-analysis
    Explore at:
    Dataset updated
    Jul 24, 2025
    Authors
    Omar Kamali
    Description

    The omarkamali/wikipedia-analysis dataset, hosted on Hugging Face and contributed by the HF Datasets community.

  3. Selection of English Wikipedia pages (CNs) regarding topics with a direct...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt (2023). Selection of English Wikipedia pages (CNs) regarding topics with a direct relation to the emerging Hadoop (Big Data) market. [Dataset]. http://doi.org/10.1371/journal.pone.0141892.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mirko Kämpf; Eric Tessenow; Dror Y. Kenett; Jan W. Kantelhardt
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Apache Hadoop is the central software project, alongside Apache Solr and Apache Lucene (SW, software). Companies that offer Hadoop distributions and Hadoop-based solutions are the central companies in the scope of the study (HV, hardware vendors). Other companies started very early with Hadoop-related projects as early adopters (EA). Global players (GP) are affected by this emerging market, its opportunities and the new competitors (NC). Some new but highly relevant companies, like Talend or LucidWorks, were selected because of their obvious commitment to open-source ideas. Widely adopted technologies related to the selected research topic are represented by the group TEC.

  4. A meta analysis of Wikipedia's coronavirus sources during the COVID-19...

    • live.european-language-grid.eu
    txt
    Updated Sep 8, 2022
    Cite
    (2022). A meta analysis of Wikipedia's coronavirus sources during the COVID-19 pandemic [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7806
    Explore at:
    txt (available download formats)
    Dataset updated
    Sep 8, 2022
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    At the height of the coronavirus pandemic, on the last day of March 2020, Wikipedia in all languages broke a record for the most traffic in a single day. Since the outbreak of the COVID-19 pandemic at the start of January, tens if not hundreds of millions of people have come to Wikipedia to read, and in some cases also contribute, knowledge, information and data about the virus in an ever-growing pool of articles. Our study focuses on the scientific backbone behind the content people across the world read: which sources informed Wikipedia's coronavirus content, and how the scientific research in this field was represented on Wikipedia. Using citations as a readout, we try to map how COVID-19-related research was used in Wikipedia and analyse what happened to it before and during the pandemic. Understanding how scientific and medical information was integrated into Wikipedia, and which sources informed the COVID-19 content, is key to understanding the digital knowledge echosphere during the pandemic.

    To delimit the corpus of Wikipedia articles containing Digital Object Identifiers (DOIs), we applied two different strategies. First, we scraped every Wikipedia page from the COVID-19 Wikipedia project (about 3,000 pages) and filtered them to keep only pages containing DOI citations. For our second strategy, we searched EuroPMC for Covid-19, SARS-CoV2 and SARS-nCoV19 (30,000 scientific papers, reviews and preprints), selected the scientific papers from 2019 onwards, and compared them to the citations extracted from the English Wikipedia dump of May 2020 (2,000,000 DOIs). This search led to 231 Wikipedia articles containing at least one citation from the EuroPMC search or belonging to the Wikipedia COVID-19 project pages containing DOIs. Next, from our corpus of 231 Wikipedia articles we extracted DOIs, PMIDs, ISBNs, websites and URLs using a set of regular expressions. Subsequently, we computed several statistics for each Wikipedia article and retrieved Altmetric, CrossRef and EuroPMC information for each DOI. Finally, our method produced tables of annotated citations and information extracted from each Wikipedia article, such as books, websites and newspapers.

    Files used as input and extracted information on Wikipedia's COVID-19 sources are presented in this archive. See the WikiCitationHistoRy GitHub repository for the R code and the other bash/Python utility scripts related to this project.
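    The identifier-extraction step lends itself to a short illustration. Below is a minimal Python sketch of the kind of regular expressions such a pipeline might use; the patterns and function are illustrative assumptions, not the study's actual expressions (those live in the WikiCitationHistoRy repository).

    import re

    # Illustrative patterns only; not the study's actual regular expressions.
    DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')
    PMID_RE = re.compile(r'pmid\s*=\s*(\d+)', re.IGNORECASE)
    ISBN_RE = re.compile(r'isbn\s*=\s*([\d\-Xx]{10,17})', re.IGNORECASE)

    def extract_identifiers(wikitext: str) -> dict:
        """Pull DOI, PMID and ISBN identifiers out of raw wikitext."""
        return {
            "doi": DOI_RE.findall(wikitext),
            "pmid": PMID_RE.findall(wikitext),
            "isbn": ISBN_RE.findall(wikitext),
        }

    sample = '{{Cite journal | doi = 10.1371/journal.pone.0141892 | pmid = 26528998}}'
    print(extract_identifiers(sample))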

  5. Wikipedia Knowledge Graph dataset

    • zenodo.org
    • produccioncientifica.ugr.es
    • +1more
    pdf, tsv
    Updated Jul 17, 2024
    Cite
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Rodrigo Costas (2024). Wikipedia Knowledge Graph dataset [Dataset]. http://doi.org/10.5281/zenodo.6346900
    Explore at:
    tsv, pdf (available download formats)
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Rodrigo Costas
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Wikipedia is the largest and most-read free online encyclopedia in existence. As such, Wikipedia offers a large amount of data on all of its own contents and the interactions around them, as well as different types of open data sources. This makes Wikipedia a unique data source that can be analyzed with quantitative data science techniques. However, the enormous amount of data makes it difficult to get an overview, and many of the analytical possibilities that Wikipedia offers remain unknown. In order to reduce the complexity of identifying and collecting data on Wikipedia and to expand its analytical potential, after collecting data from various sources and processing them, we have generated a dedicated Wikipedia Knowledge Graph aimed at facilitating the analysis and contextualization of the activity and relations of Wikipedia pages, in this case limited to the English edition. We share this Knowledge Graph dataset openly, aiming to be useful for a wide range of researchers, such as informetricians, sociologists or data scientists.

    There are a total of 9 files, all in tsv format, built under a relational structure. The main file, which acts as the core of the dataset, is the page file; around it there are 4 files with different entities related to the Wikipedia pages (the category, url, pub and page_property files) and 4 other files that act as "intermediate tables", making it possible to connect the pages both with those entities and with each other (the page_category, page_url, page_pub and page_link files).
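    Given that relational layout, the intermediate tables can be joined back to the core page file in the usual star-schema way. A minimal pandas sketch, using hypothetical key-column names (the real schema is documented in Dataset_summary):

    import pandas as pd

    # Hypothetical column names; consult Dataset_summary for the real schema.
    pages = pd.read_csv("page.tsv", sep="\t")
    categories = pd.read_csv("category.tsv", sep="\t")
    page_category = pd.read_csv("page_category.tsv", sep="\t")

    # Resolve the many-to-many page<->category relation through the
    # intermediate table.
    page_with_cats = (
        page_category
        .merge(pages, on="page_id")
        .merge(categories, on="category_id")
    )
    print(page_with_cats.head())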

    The document Dataset_summary includes a detailed description of the dataset.

    Thanks to Nees Jan van Eck and the Centre for Science and Technology Studies (CWTS) for the valuable comments and suggestions.

  6. Use of data analytics application for M&A in the U.S. 2018

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Use of data analytics application for M&A in the U.S. 2018 [Dataset]. https://www.statista.com/statistics/943056/use-of-data-analytics-application-for-ma-usa/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2018
    Area covered
    United States
    Description

    This statistic presents the use of data analytics applications for M&A in the United States in 2018. At that time, ** percent of executives surveyed considered data analytics to be a core component of their M&A analysis.

  7. wikipedia.org Website Traffic, Ranking, Analytics [July 2025]

    • semrush.com
    • stb2.digiseotools.com
    Updated Aug 12, 2025
    Cite
    Semrush (2025). wikipedia.org Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/wikipedia.org/overview/
    Explore at:
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://www.semrush.com/company/legal/terms-of-service/

    Time period covered
    Aug 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    wikipedia.org is ranked #7 in US with 4.76B Traffic. Categories: Newspapers, Online Services. Learn more about website traffic, market share, and more!

  8. Data analytics tools in use by organizations in the United States 2015-2017

    • statista.com
    Updated Dec 1, 2015
    Cite
    Statista (2015). Data analytics tools in use by organizations in the United States 2015-2017 [Dataset]. https://www.statista.com/statistics/500119/united-states-survey-use-data-analytics-tools/
    Explore at:
    Dataset updated
    Dec 1, 2015
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2015
    Area covered
    United States
    Description

    The statistic shows the analytics tools currently in use by business organizations in the United States, as well as the analytics tools respondents believe they will be using in two years, according to a 2015 survey conducted by the Harvard Business Review Analytics Service. As of 2015, ** percent of respondents believed they were going to use predictive analytics for data analysis in two years' time.

  9. machine_learning_wikipedia

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    Will Learn (2025). machine_learning_wikipedia [Dataset]. https://www.kaggle.com/datasets/willlearn1/machine-learning-wikipedia/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Will Learn
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This is a basic web scrape of the Wikipedia entry on 'machine learning'. I've used it to break the text into chunks in my program, which are used to provide 'context' for generative AI to demonstrate RAG.
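    The chunking step the author describes is easy to sketch. A minimal Python example, assuming a plain-text scrape and a fixed-size character window with overlap (the file name and chunking parameters here are hypothetical, not taken from the dataset):

    def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
        """Split scraped text into overlapping chunks for RAG retrieval."""
        chunks = []
        step = size - overlap
        for start in range(0, len(text), step):
            chunk = text[start:start + size]
            if chunk.strip():
                chunks.append(chunk)
        return chunks

    # Hypothetical file name for the scraped article text.
    with open("machine_learning_wikipedia.txt", encoding="utf-8") as f:
        article = f.read()
    chunks = chunk_text(article)
    print(f"{len(chunks)} chunks; first chunk starts: {chunks[0][:80]}...")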

  10. Top uses of data and analytics within companies worldwide 2018

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Top uses of data and analytics within companies worldwide 2018 [Dataset]. https://www.statista.com/statistics/893798/worldwide-data-analytics-top-uses-companies/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2018
    Area covered
    Worldwide
    Description

    This statistic shows the ways that companies are using data and analytics worldwide as of 2018. Around ** percent of respondents stated that one of the top uses of data and analytics in their company was as a driver of strategy and change.

  11. wikipedia.com Website Traffic, Ranking, Analytics [July 2025]

    • stb2.digiseotools.com
    • semrush.com
    Updated Aug 12, 2025
    Cite
    Semrush (2025). wikipedia.com Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://stb2.digiseotools.com/website/wikipedia.com/overview/
    Explore at:
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://sem1.theseowheel.com/company/legal/terms-of-service/

    Time period covered
    Aug 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    wikipedia.com is ranked #11826 in US with 1.53M Traffic. Categories: Online Services. Learn more about website traffic, market share, and more!

  12. Usage areas for big data analytics in companies worldwide 2017

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Usage areas for big data analytics in companies worldwide 2017 [Dataset]. https://www.statista.com/statistics/933409/worldwide-big-data-analytics-usage-areas/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    The statistic shows the areas in which companies use or plan to use big data analytics worldwide as of 2017. A quarter of respondents stated that their company currently uses big data analytics for marketing, with another ** percent stating that their business planned to use big data for marketing within the next 12 months.

  13. nn.wikipedia.org Website Traffic, Ranking, Analytics [July 2025]

    • semrush.com
    Updated Aug 12, 2025
    + more versions
    Cite
    Semrush (2025). nn.wikipedia.org Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://www.semrush.com/website/nn.wikipedia.org/overview/
    Explore at:
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://www.semrush.com/company/legal/terms-of-service/

    Time period covered
    Aug 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    nn.wikipedia.org is ranked #7 in US with 59.3K Traffic. Learn more about website traffic, market share, and more!

  14. Replication Data for: Measuring Wikipedia Article Quality in One Dimension...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Sep 25, 2024
    Cite
    TeBlunthuis, Nathan (2024). Replication Data for: Measuring Wikipedia Article Quality in One Dimension by Extending ORES with Ordinal Regression [Dataset]. http://doi.org/10.7910/DVN/U5V0G1
    Explore at:
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    TeBlunthuis, Nathan
    Description

    This dataset provides code, data, and instructions for replicating the analysis of "Measuring Wikipedia Article Quality in One Dimension by Extending ORES with Ordinal Regression", published in OpenSym 2021 (link to come). The paper introduces a method for transforming scores from the ORES quality models into a single-dimensional measure of quality, amenable to statistical analysis and well calibrated to a dataset. The purpose is to improve the validity of research into article quality through more precise measurement. The code and data for replicating the paper are found in this Dataverse repository. If you wish to use the method on a new dataset, you should obtain the actively maintained version of the code from this git repository. If you attempt to replicate part of this repository, please let me know via an email to nathante@uw.edu.

    Replicating the Analysis from the OpenSym Paper

    This project analyzes a sample of articles with quality labels from the English Wikipedia XML dumps from March 2020. Copies of the dumps are not provided in this dataset; they can be obtained via https://dumps.wikimedia.org/. Everything else you need to replicate the project (other than a sufficiently powerful computer) should be available here. The project is organized into stages, and the prerequisite data files are provided at each stage, so you do not need to rerun the entire pipeline from the beginning, which is not easily done without a high-performance computer. If you start replicating at an intermediate stage, this should overwrite the inputs to the downstream stages, which should make it easier to verify a partial replication. To help manage the size of the dataverse, all code files are included in code.tar.gz. Extracting this with tar xzvf code.tar.gz is the first step.

    Getting Set Up

    You need a version of R >= 4.0 and a version of Python >= 3.7.8. You also need a bash shell, tar, gzip, and make installed, as they should be on any Unix system. To install brms you need a working C++ compiler; if you run into trouble, see the instructions for installing Rstan. The datasets were built on CentOS 7, except for the ORES scoring, which was done on Ubuntu 18.04.5, and the building, which was done on Debian 9. The RemembR and pyRembr projects provide simple tools for saving intermediate variables when building papers with LaTeX. First, extract the articlequality.tar.gz, RemembR.tar.gz and pyRembr.tar.gz archives. Then, install the following:

    Python Packages

    Running the following steps in a new Python virtual environment is strongly recommended. Run pip3 install -r requirements.txt to install the Python dependencies. Then navigate into the pyRembr directory and run python3 setup.py install.

    R Packages

    Run Rscript install_requirements.R to install the necessary R libraries. If you run into trouble installing brms, see the instructions for installing Rstan above.

    Drawing a Sample of Labeled Articles

    I provide steps and intermediate data files for replicating the sampling of labeled articles. The steps in this section are quite computationally intensive; those only interested in replicating the models and analyses should skip this section.

    Extracting Metadata from Wikipedia Dumps

    Metadata from the Wikipedia dumps is required for calibrating models to the revision and article levels of analysis. You can use the wikiq Python script from the mediawiki dump tools git repository to extract metadata from the XML dumps as TSV files. The version of wikiq that was used is provided here. Running wikiq on a full dump of English Wikipedia in a reasonable amount of time requires considerable computing resources. For this project, wikiq was run on Hyak, a high-performance computer at the University of Washington. The code for doing so is highly specific to Hyak; for transparency, and in case it helps others using similar academic computers, this code is included in WikiqRunning.tar.gz. A copy of the wikiq output is included in this dataset in the multi-part archive enwiki202003-wikiq.tar.gz. To extract this archive, download all the parts and then run cat enwiki202003-wikiq.tar.gz* > enwiki202003-wikiq.tar.gz && tar xvzf enwiki202003-wikiq.tar.gz.

    Obtaining Quality Labels for Articles

    We obtain up-to-date labels for each article using the articlequality Python package included in articlequality.tar.gz. The XML dumps are also the input to this step, and while it does not require a great deal of memory, a powerful computer (we used 28 cores) is helpful so that it completes in a reasonable amount of time. extract_quality_labels.sh runs the command to extract the labels from the XML dumps. The resulting files have the format data/enwiki-20200301-pages-meta-history*.xml-p*.7z_article_labelings.json and are included in this dataset in the archive enwiki202003-article_labelings-json.tar.gz.

    Taking a Sample of Quality Labels

    I used Apache Spark to merge the metadata from wikiq with the quality labels and to draw a sample of articles where each quality class is equally represented. To...
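    To make the paper's core idea concrete: ORES emits a probability for each ordinal quality class (Stub through FA on English Wikipedia), and the method collapses these into a single dimension. The Python sketch below shows a naive equally-spaced weighting purely as an illustration; the paper's contribution is precisely to calibrate such weights with ordinal regression rather than assume them.

    # Illustrative only: equally spaced class weights, not the paper's
    # calibrated ordinal-regression procedure.
    ORES_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]

    def one_dim_quality(probs: dict) -> float:
        """Collapse ORES class probabilities into one quality score."""
        return sum(i * probs[c] for i, c in enumerate(ORES_CLASSES))

    example = {"Stub": 0.05, "Start": 0.15, "C": 0.40,
               "B": 0.25, "GA": 0.10, "FA": 0.05}
    print(round(one_dim_quality(example), 2))  # 2.35 on a 0-5 scale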

  15. Polish Wikipedia articles with "Cite web" templates linking to celebrity...

    • figshare.com
    txt
    Updated Dec 9, 2018
    Cite
    Krzysztof Jasiutowicz (2018). Polish Wikipedia articles with "Cite web" templates linking to celebrity gossip blogs and websites [Dataset]. http://doi.org/10.6084/m9.figshare.7441154.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 9, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Krzysztof Jasiutowicz
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Polish Wikipedia articles with "Cite web" templates linking to celebrity gossip blogs and websites.

  16. Wikipedia Web Traffic 2018-19

    • kaggle.com
    Updated Apr 12, 2021
    Cite
    san_bt (2021). Wikipedia Web Traffic 2018-19 [Dataset]. https://www.kaggle.com/datasets/sandeshbhat/wikipedia-web-traffic-201819/versions/1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 12, 2021
    Dataset provided by
    Kaggle
    Authors
    san_bt
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    • Time series: a set of observations recorded at regular intervals of time. Time series analysis can be beneficial in many fields, such as stock-market prediction and weather forecasting, and accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be modelled.

    • Web traffic: the amount of data sent and received by visitors to a website. Sites monitor incoming and outgoing traffic to see which parts or pages of their site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country.

    Content

    Contains Page Views for 60k Wikipedia articles in 8 different languages taken on a daily basis for 2 years.

    [Figure: Data Science Life Cycle (DSLC)]

    A Data Science Life Cycle can be used to structure a project. Forecasting can be done for any interval, provided a sufficient dataset is available. Refer to the GitHub link in the tasks to view the forecast done using ARIMA and Prophet. Feel free to contribute; several other models, including neural networks, could be used to improve the results considerably.
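    As a pointer to what such a forecast looks like in code, here is a minimal ARIMA sketch using statsmodels, assuming a daily page-view series for a single article; the file name, column names and the order (7, 1, 1) are illustrative assumptions, not values tuned to this dataset:

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Hypothetical file/column names; adapt to the actual CSV layout.
    views = pd.read_csv("wikipedia_web_traffic.csv", parse_dates=["date"])
    series = views.set_index("date")["page_views"].asfreq("D")

    model = ARIMA(series, order=(7, 1, 1))  # AR lags spanning a week, one difference
    fit = model.fit()
    print(fit.forecast(steps=30))           # 30-day-ahead forecast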

    Acknowledgements

    Credits:
    1. Wikipedia
    2. Google

  17. Data from: Robust clustering of languages across Wikipedia growth

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Sep 19, 2017
    Cite
    Kristina Ban; Matjaž Perc; Zoran Levnajić (2017). Robust clustering of languages across Wikipedia growth [Dataset]. http://doi.org/10.5061/dryad.sk0q2
    Explore at:
    zip (available download formats)
    Dataset updated
    Sep 19, 2017
    Dataset provided by
    University of Maribor
    Faculty of Information Studies, Ljubljanska cesta 31A, 8000 Novo Mesto, Slovenia
    Authors
    Kristina Ban; Matjaž Perc; Zoran Levnajić
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Wikipedia is the largest existing knowledge repository, and it is growing through genuine crowdsourcing support. While the English Wikipedia is the most extensive and most researched edition, with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14,962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of the Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy.
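    As an illustration of the kind of analysis this dataset supports (not the authors' exact procedure, and using stand-in data), one can normalize each language's growth curve and cluster the curves, for example with k-means:

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in data: 26 languages x 180 monthly cumulative article counts;
    # the real curves come from the archived dataset.
    rng = np.random.default_rng(0)
    growth = np.cumsum(rng.random((26, 180)), axis=1)

    # Normalize each curve so clustering compares shape rather than size.
    X = (growth - growth.mean(axis=1, keepdims=True)) / growth.std(axis=1, keepdims=True)
    labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)
    print(labels)  # cluster assignment per language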

  18. Leading methods of data analytics application in M&A in the U.S. 2018

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Leading methods of data analytics application in M&A in the U.S. 2018 [Dataset]. https://www.statista.com/statistics/943048/methods-of-data-analytics-application-in-manda-usa/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2018
    Area covered
    United States
    Description

    This statistic presents the leading methods of data analytics application in the mergers and acquisitions sector in the United States in 2018. At that time, ** percent of executives surveyed were using data analytics on customers and markets.

  19. wikipedia.nl Website Traffic, Ranking, Analytics [July 2025]

    • stb2.digiseotools.com
    Updated Aug 12, 2025
    Cite
    Semrush (2025). wikipedia.nl Website Traffic, Ranking, Analytics [July 2025] [Dataset]. https://stb2.digiseotools.com/website/wikipedia.nl/overview/
    Explore at:
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://sem1.theseowheel.com/company/legal/terms-of-service/

    Time period covered
    Aug 12, 2025
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    wikipedia.nl is ranked #6981 in NL with 105.52K Traffic. Learn more about website traffic, market share, and more!

  20. Citations with identifiers in Wikipedia

    • figshare.com
    application/gzip
    Updated May 30, 2023
    Cite
    Aaron Halfaker; Bahodir Mansurov; Miriam Redi; Dario Taraborelli (2023). Citations with identifiers in Wikipedia [Dataset]. http://doi.org/10.6084/m9.figshare.1299540.v1
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aaron Halfaker; Bahodir Mansurov; Miriam Redi; Dario Taraborelli
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes a list of citations with identifiers extracted from the most recent version of Wikipedia across all language editions. The data was parsed from the Wikipedia content dumps published on March 1, 2018.

    License
    All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/

    Projects
    Previous versions of this dataset ("Scholarly citations in Wikipedia") were limited to the English language edition. The current version includes one dataset for each of the 298 language editions that Wikipedia supports as of March 2018. Projects are identified by their ISO 639-1/639-2 language code, per https://meta.wikimedia.org/wiki/List_of_Wikipedias.

    Identifiers
    • PubMed IDs (pmid) and PubMedCentral IDs (pmcid)
    • Digital Object Identifiers (doi)
    • International Standard Book Numbers (isbn)
    • arXiv IDs (arxiv)

    Format
    Each row in the dataset represents a citation as a (Wikipedia article, cited source) pair. Metadata about when the citation was first added is included.
    • page_id -- The identifier of the Wikipedia article (int), e.g. 1325125
    • page_title -- The title of the Wikipedia article (utf-8), e.g. Club cell
    • rev_id -- The Wikipedia revision where the citation was first added (int), e.g. 282470030
    • timestamp -- The timestamp of the revision where the citation was first added (ISO 8601 datetime), e.g. 2009-04-08T01:52:20Z
    • type -- The type of identifier, e.g. pmid
    • id -- The id of the cited source (utf-8), e.g. 18179694

    Source code
    https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (MIT licensed)
    A copy of this dataset is also available at https://analytics.wikimedia.org/datasets/archive/public-datasets/all/mwrefs/

    Notes
    Citation identifiers are extracted as-is from Wikipedia article content. Our spot-checking suggests that 98% of identifiers resolve.
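    Given the documented row format, loading and filtering one language edition is straightforward with pandas. A minimal sketch, assuming headerless TSV files named by language code (check the archive for the actual file naming and whether a header row is present):

    import pandas as pd

    # Column names as documented above; drop names= if a header row exists.
    cites = pd.read_csv(
        "enwiki.tsv", sep="\t",
        names=["page_id", "page_title", "rev_id", "timestamp", "type", "id"],
    )
    dois = cites[cites["type"] == "doi"]
    print(dois.groupby("page_title").size().sort_values(ascending=False).head())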
