Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
At the height of the coronavirus pandemic, on the last day of March 2020, Wikipedia in all languages broke its record for the most traffic in a single day. Since the outbreak of the Covid-19 pandemic at the start of January, tens if not hundreds of millions of people have come to Wikipedia to read - and in some cases also contribute - knowledge, information and data about the virus in an ever-growing pool of articles. Our study focuses on the scientific backbone behind the content people across the world read: which sources informed Wikipedia's coronavirus content, and how the scientific research in this field was represented on Wikipedia. Using citations as a readout, we try to map how COVID-19-related research was used in Wikipedia and analyse what happened to it before and during the pandemic. Understanding how scientific and medical information was integrated into Wikipedia, and which sources informed the Covid-19 content, is key to understanding the digital knowledge ecosystem during the pandemic.

To delimit the corpus of Wikipedia articles containing Digital Object Identifiers (DOIs), we applied two different strategies. First, we scraped every Wikipedia page from the COVID-19 Wikipedia project (about 3,000 pages) and filtered them to keep only pages containing DOI citations. For our second strategy, we ran a EuroPMC search on Covid-19, SARS-CoV2 and SARS-nCoV19 (about 30,000 scientific papers, reviews and preprints), selected the papers from 2019 onwards, and compared them to the citations extracted from the English Wikipedia dump of May 2020 (about 2,000,000 DOIs). This search led to 231 Wikipedia articles containing at least one citation from the EuroPMC search or belonging to the Wikipedia COVID-19 project pages containing DOIs. Next, from this corpus of 231 Wikipedia articles we extracted DOIs, PMIDs, ISBNs, websites and URLs using a set of regular expressions. Subsequently, we computed several statistics for each Wikipedia article and retrieved Altmetric, CrossRef and EuroPMC information for each DOI. Finally, our method produced tables of annotated citations and information extracted from each Wikipedia article, such as books, websites and newspapers. Files used as input and extracted information on Wikipedia's COVID-19 sources are presented in this archive. See the WikiCitationHistoRy GitHub repository for the R code and the other bash/python script utilities related to this project.
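As a rough illustration of the identifier-extraction step (the project's actual implementation is the R code in the WikiCitationHistoRy repository), a minimal Python sketch with simplified regular expressions might look like this:

```python
import re

# Simplified patterns for illustration; the project's own regexes may differ.
DOI_RE = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')
PMID_RE = re.compile(r'pmid\s*=\s*(\d+)', re.IGNORECASE)
ISBN_RE = re.compile(r'isbn\s*=\s*([\d\-Xx]{10,17})', re.IGNORECASE)

def extract_identifiers(wikitext: str) -> dict:
    """Pull DOIs, PMIDs and ISBNs out of a page's wikitext."""
    return {
        "dois": sorted(set(DOI_RE.findall(wikitext))),
        "pmids": sorted(set(PMID_RE.findall(wikitext))),
        "isbns": sorted(set(ISBN_RE.findall(wikitext))),
    }

# Toy citation snippet to show the expected output shape.
sample = "{{cite journal |doi=10.1038/s41586-020-2012-7 |pmid=32015507}}"
print(extract_identifiers(sample))
```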
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Access a wealth of information, including article titles, raw text, images, and structured references. Popular use cases include knowledge extraction, trend analysis, and content development.
Use our Wikipedia Articles dataset to access a vast collection of articles across a wide range of topics, from history and science to culture and current events. This dataset offers structured data on articles, categories, and revision histories, enabling deep analysis into trends, knowledge gaps, and content development.
Tailored for researchers, data scientists, and content strategists, this dataset allows for in-depth exploration of article evolution, topic popularity, and interlinking patterns. Whether you are studying public knowledge trends, performing sentiment analysis, or developing content strategies, the Wikipedia Articles dataset provides a rich resource to understand how information is shared and consumed globally.
Dataset Features
- url: Direct URL to the original Wikipedia article.
- title: The title or name of the Wikipedia article.
- table_of_contents: A list or structure outlining the article's sections and hierarchy.
- raw_text: Unprocessed full text content of the article.
- cataloged_text: Cleaned and structured version of the article’s content, optimized for analysis.
- images: Links or data on images embedded in the article.
- see_also: Related articles linked under the “See Also” section.
- references: Sources cited in the article for credibility.
- external_links: Links to external websites or resources mentioned in the article.
- categories: Tags or groupings classifying the article by topic or domain.
- timestamp: Last edit date or revision time of the article snapshot.
Distribution
- Data Volume: 11 Columns and 2.19 M Rows
- Format: CSV
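To make the schema above concrete, here is a minimal loading sketch; the file name is hypothetical, and the column selection follows the Dataset Features list:

```python
import pandas as pd

# Hypothetical file name; the delivered CSV may be split or named differently.
df = pd.read_csv("wikipedia_articles.csv")

# Inspect the 11 documented columns and a few example fields.
print(df.columns.tolist())
print(df[["url", "title", "categories", "timestamp"]].head())
```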
Usage
This dataset supports a wide range of applications:
- Knowledge Extraction: Identify key entities, relationships, or events from Wikipedia content.
- Content Strategy & SEO: Discover trending topics and content gaps.
- Machine Learning: Train NLP models (e.g., summarisation, classification, QA systems).
- Historical Trend Analysis: Study how public interest in topics changes over time.
- Link Graph Modeling: Understand how information is interconnected.
Coverage
- Geographic Coverage: Global (multi-language Wikipedia versions also available)
- Time Range: Continuous updates; snapshots available from early 2000s to present.
License
CUSTOM
Who Can Use It
- Data Scientists: For training or testing NLP and information retrieval systems.
- Researchers: For computational linguistics, social science, or digital humanities.
- Businesses: To enhance AI-powered content tools or customer insight platforms.
- Educators/Students: For building projects, conducting research, or studying knowledge systems.
Suggested Dataset Names
1. Wikipedia Corpus+
2. Wikipedia Stream Dataset
3. Wikipedia Knowledge Bank
4. Open Wikipedia Dataset
~Up to $0.0025 per record. Min order $250
Approximately 283 new records are added each month. Approximately 1.12M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.
New snapshot each month, 12 snapshots/year Paid monthly
New snapshot each quarter, 4 snapshots/year Paid quarterly
New snapshot every 6 months, 2 snapshots/year Paid twice-a-year
New snapshot one-time delivery Paid once
Wikipedia data: timestamps of 14,962 Wikipedia articles across 26 different languages over a span of 15 years (data.txt).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the social sciences, there is a longstanding tension between data collection methods that facilitate quantification and those that are open to unanticipated information. Advances in technology now enable new, hybrid methods that combine some of the benefits of both approaches. Drawing inspiration from online information aggregation systems like Wikipedia and from traditional survey research, we propose a new class of research instruments called wiki surveys. Just as Wikipedia evolves over time based on contributions from participants, we envision an evolving survey driven by contributions from respondents. We develop three general principles that underlie wiki surveys: they should be greedy, collaborative, and adaptive. Building on these principles, we develop methods for data collection and data analysis for one type of wiki survey, a pairwise wiki survey. Using two proof-of-concept case studies involving our free and open-source website www.allourideas.org, we show that pairwise wiki surveys can yield insights that would be difficult to obtain with other methods.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract (our paper)
This paper investigates Wikipedia page views and interlanguage links for the analysis of Japanese comics. It is based on a preliminary investigation and obtained three results, but the analysis is not yet sufficient to apply the results to market research. I am looking for research collaborators in order to conduct a more detailed analysis.
Data
Publication
This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida. Preliminary Investigation for Japanese Comic Analysis using Wikipedia. Proceedings of the Fifth Asian Conference on Information Systems (ACIS 2016). pp.229-230, 2016.
This archive contains code and data for reproducing the analysis for "Replication Data for Revisiting 'The Rise and Decline' in a Population of Peer Production Projects". Depending on what you hope to do with the data, you probably do not want to download all of the files. Depending on your computational resources, you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with the datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

The data files are created in a four-stage process. The first stage uses the program "wikiq" to parse MediaWiki XML dumps and create tsv files that contain edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis; this file is expensive to generate and, at 1.5 GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and LaTeX typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, and running the analysis, to building the intermediate datasets.

Building the manuscript using knitr. This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways; on Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; it has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system: tar xf code.tar) and navigate to code/paper_source. Install the R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise, try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

Loading intermediate datasets. The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95 MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

Running the analysis. Fitting the models may not work on machines with less than 32 GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful for creating stratified samples of data for fitting models; see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system: tar xf code.tar && 7z x intermediate_data.7z). Install the R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots and create the RDS files.

Generating datasets. Building the intermediate files: the intermediate files are generated from all.edits.RDS. This process requires about 20 GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system: tar xf code.tar && 7z x userroles_data.7z). Install the R dependencies as above, then run 01_build_datasets.R. Building all.edits.RDS: the intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
A global reference dataset on cropland was collected through a crowdsourcing campaign implemented using Geo-Wiki. This reference dataset is based on a systematic sample at latitude and longitude intersections, enhanced in locations where the cropland probability varies between 25% and 75%, for a better representation of cropland globally. Over a three-week period, around 36,000 samples of cropland were collected. For the purpose of quality assessment, additional datasets are provided. One is a control dataset of 1,793 sample locations that have been validated by students trained in image interpretation; this dataset was used to assess the quality of the crowd validations as the campaign progressed. Another dataset contains 60 expert or gold-standard validations for additional evaluation of the quality of the participants. These three datasets have two parts, one showing cropland only and one compiled per location and user. This reference dataset will be used to validate and compare medium- and high-resolution cropland maps that have been generated using remote sensing. The dataset can also be used to train classification algorithms in developing new maps of land cover and cropland extent.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Wien Geschichte Wiki Wien’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/71ef28d3-58a4-4087-8b31-c192792beb0e on 10 January 2022.
--- Dataset description provided by original source is as follows ---
Semantic data from the City of Vienna's historical knowledge platform. Historical data on buildings, topographic objects and organisations are currently provided.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘List of Top Data Breaches (2004 - 2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hishaamarmghan/list-of-top-data-breaches-2004-2021 on 14 February 2022.
--- Dataset description provided by original source is as follows ---
This is a dataset containing all the major data breaches in the world from 2004 to 2021
As we know, there is a big issue related to the privacy of our data. Many major companies in the world still to this day face this issue every single day. Even with a great team of people working on their security, many still suffer. In order to tackle this situation, it is only right that we must study this issue in great depth and therefore I pulled this data from Wikipedia to conduct data analysis. I would encourage others to take a look at this as well and find as many insights as possible.
This data contains 5 columns:
1. Entity: The name of the company, organization or institute
2. Year: The year in which the data breach took place
3. Records: How many records were compromised (can include information like emails, passwords etc.)
4. Organization type: The sector the organization belongs to
5. Method: Was it hacked? Were the files lost? Was it an inside job?
Here is the source for the dataset: https://en.wikipedia.org/wiki/List_of_data_breaches
Here is the GitHub link for a guide on how it was scraped: https://github.com/hishaamarmghan/Data-Breaches-Scraping-Cleaning
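As an illustration only, here is a short sketch of exploring the five columns described above; the file name is hypothetical:

```python
import pandas as pd

# Hypothetical file name; columns follow the five fields described above.
breaches = pd.read_csv("data_breaches_2004_2021.csv")

# Largest breaches by number of compromised records.
breaches["Records"] = pd.to_numeric(breaches["Records"], errors="coerce")
top10 = breaches.sort_values("Records", ascending=False).head(10)
print(top10[["Entity", "Year", "Records", "Method"]])
```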
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Overview: This data was produced by Wikimedia Germany's Technical Wishes team, and focuses on usage statistics for reference footnotes made using the Cite extension, across Main-namespace pages (articles) on nearly all Wikimedia sites. It was produced by processing the Wikimedia Enterprise HTML dumps. Our analysis of references was inspired by "Characterizing Wikipedia Citation Usage" and other research. Our specific goal was to understand the potential for improving the ways in which references can be reused within a page. Reference tags are frequently used in conjunction with wikitext templates, which is challenging to parse; for this reason, we decided to parse the rendered HTML pages rather than the original wikitext. We didn't look at reuse across pages for this analysis.

License: All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/. The source code is distributed under BSD-3-Clause.

Source code and pluggable framework: The dumps were processed by the HTML dump scraper v0.3.1, written in the Elixir language. The job was run on the Wikimedia Analytics Cluster to take advantage of its high-speed access to HTML dumps. Production configuration is included in the source code repository, and the command line used to run it was "MIX_ENV=prod mix run pipeline.exs". Our team plans to continue development of the scraper to support future projects as well. Suggestions for new or improved analysis units are welcomed.

Data format: Files are provided at several levels of granularity, from per-page and per-wiki analysis through all-wikis comparisons. Files are either ND-JSON (newline-delimited JSON), plain JSON or CSV. Columns are documented in metrics.md.

Page summaries: Fine-grained results in which each line represents the summarization of a single wiki page. Example file name: enwiki-20240501-page-summary.ndjson.gz. Example metrics found in these files: how many tags are created from templates vs. directly in the article; how many references contain a template transclusion to produce their content; how many references are unnamed, automatically named, or manually named; how often references are reused via their name; copy-pasted references that share the same or almost the same content on the same page; and whether an article has more than one references list.

Wiki summaries: Page analyses are rolled up to the wiki level, in a separate file for each wiki. Example file name: enwiki-20240501-summary.json.

Top-level comparison: Summarized statistics for each wiki are collected into a single file. Non-scalar fields are discarded for now and various aggregations are used, as can be seen from the aggregated column name suffixes. File name: all-wikis-20240501-summary.csv. We're also collecting a total count of different Cite errors for each wiki, in all-wikis-20240501-cite-error-summary.csv.

Environmental costs: There were several rounds of experimentation and mistakes, so the costs below should be multiplied by 3-4. The computation took 4.5 days on 24 vCPUs sharing 2 GB of memory at a data center in Virginia, US. Estimating the environmental impact through https://www.green-algorithms.org/ we get an upper bound of 12.6 kg CO2e, or 40.8 kWh, or 72 km driven in a passenger car. Disk usage was significant as well, with 827 GB read and 4 GB written. At the high estimate of 7 kWh/GB, this could have used as much as 5.8 MWh of energy, but likely much less since streaming was contained within one data center.
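As a minimal illustration of consuming the per-page summaries, the following Python sketch streams one of the ND-JSON files described above; the file name follows the pattern given in the description, and the real column names are the ones documented in metrics.md:

```python
import gzip
import json

# File name follows the pattern given above; the fields inside each record
# are documented in metrics.md, so we only peek at the keys here.
path = "enwiki-20240501-page-summary.ndjson.gz"

pages = 0
with gzip.open(path, mode="rt", encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)   # one JSON object per wiki page
        pages += 1
        if pages <= 3:              # inspect the first few records
            print(sorted(record.keys()))

print(f"{pages} page summaries read from {path}")
```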
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Polish Wikipedia articles with "Cite web" templates linking to celebrity gossip blogs and websites.
https://www.statsndata.org/how-to-order
The Wiki Software market has emerged as a vital component in the digital landscape, enabling organizations to foster collaboration, knowledge sharing, and information management with ease. With its roots in user-generated content, Wiki Software provides a dynamic platform for teams and individuals to create, edit, a
Researchers from Uppsala University Hospital developed and validated two automated workflows within the GUI-based software Geneious Prime 2022.1.1. Validation data and tools are openly available via GitHub and the Sequence Read Archive.
https://coinunited.io/terms
Detailed price prediction analysis for Wiki Cat on Jul 30, 2025, including bearish case ($0), base case ($0), and bullish case ($0) scenarios with Buy trading signal based on technical analysis and market sentiment indicators.
This resource area contains descriptions of actual electronic systems failure scenarios, with an emphasis on the diversity of failure modes and effects that can befall dependable systems. Introductory pages come first, followed by the scenario descriptions. These pages are separated into sections; each section starts with a list of failure scenarios, and in between the list slides are slides that give more information on those scenarios which warrant more than a bullet or two of explanation. References and a list of acronyms and initialisms are also provided. If you would like to add a story to this list or add additional significant details to an existing story, please contact Kevin Driscoll. A not-quite-working wiki subset of this resource area is available at https://c3.nasa.gov/dashlink/projects/79/wiki/test_stories_split.
https://www.statsndata.org/how-to-order
The Enterprise Wiki Software market has emerged as a pivotal tool for organizations looking to enhance collaboration, knowledge sharing, and information management across their teams. As businesses increasingly recognize the need for efficient communication channels and centralized repositories of knowledge, the dem
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017.
Subreddit Hyperlink Network: the subreddit-to-subreddit hyperlink network is extracted from the posts that create hyperlinks from one subreddit to another. We say a hyperlink originates from a post in the source community and links to a post in the target community. Each hyperlink is annotated with three properties: the timestamp, the sentiment of the source community post towards the target community post, and the text property vector of the source post. The network is directed, signed, temporal, and attributed.
Note that each post has a title and a body. The hyperlink can be present in either the title of the post or in the body. Therefore, we provide one network file for each.
Subreddit Embeddings: We have also provided embedding vectors representing each subreddit; these are available as a separate subreddit embedding dataset. Please note that some subreddit embeddings could not be generated, so that file has 51,278 embeddings.
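As an illustrative sketch (not the official loader), the hyperlink network could be read into a directed multigraph as follows; the file and column names here are assumptions based on the description above:

```python
import pandas as pd
import networkx as nx

# Hypothetical file/column names; the actual TSV headers may differ.
edges = pd.read_csv("soc-redditHyperlinks-title.tsv", sep="\t")

g = nx.MultiDiGraph()  # directed; the same subreddit pair can be linked many times
for row in edges.itertuples(index=False):
    g.add_edge(
        row.SOURCE_SUBREDDIT,
        row.TARGET_SUBREDDIT,
        timestamp=row.TIMESTAMP,
        sentiment=row.LINK_SENTIMENT,  # signed sentiment label per the description
    )

print(g.number_of_nodes(), "subreddits,", g.number_of_edges(), "hyperlinks")
```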
This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputations to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.
This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin Alpha. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputations to prevent transactions with fraudulent and risky users. Members of Bitcoin Alpha rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.
This is a who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.
Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. A small part of Wikipedia contributors are administrators, who are users with access to additional technical features that aid in maintenance. For a user to become an administrator, a Request for adminship (RfA) is issued and the Wikipedia community decides, via a public discussion or a vote, whom to promote to adminship. Using the latest complete dump of Wikipedia page edit history (from January 3, 2008) we extracted all administrator elections and vote history data. This gave us nearly 2,800 elections with around 100,000 total votes and about 7,000 users participating in the elections (either casting a vote or being voted on). Out of these, about 1,200 elections resulted in a successful promotion, while about 1,500 elections did not result in promotion. About half of the votes in the dataset are by existing admins, while the other half comes from ordinary Wikipedia users.
Dataset has the following format:
For a Wikipedia editor to become an administrator, a request for adminship (RfA) must be submitted, either by the candidate or by another community member. Subsequently, any Wikipedia member may cast a supporting, neutral, or opposing vote.
We crawled and parsed all votes since the adoption of the RfA process in 2003 through May 2013. The dataset contains 11,381 users (voters and votees) forming 189,004 distinct voter/votee pairs, for a total of 198,275 votes (this is larger than the number of distinct voter/votee pairs because, if the same user ran for election several times, the same voter/votee pair may contribute several votes).
This induces a directed, signed network in which nodes represent Wikipedia members and edges represent votes. In this sense, the...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Jeux de données de datagouv reliées à Wikidata’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/5d889084634f415587c3c2c3 on 14 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset is an export of the list of Wikidata items that have a data.gouv.fr dataset identifier (property P6526).
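For readers who want to reproduce a similar list directly from Wikidata, here is an illustrative query against the public SPARQL endpoint for items carrying property P6526; this is a sketch, not the script used to produce the export:

```python
import requests

# Illustrative query: all items carrying a data.gouv.fr dataset ID (P6526).
query = """
SELECT ?item ?datagouvId WHERE {
  ?item wdt:P6526 ?datagouvId .
}
LIMIT 100
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "example-script/0.1"},
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print(row["item"]["value"], row["datagouvId"]["value"])
```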
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
In this project, I scrape the transcripts from the popular show Avatar: The Last Airbender from the fandom wiki. With this data, I do some basic EDA focusing on the character lines.
Scraping is done using BeautifulSoup and basic pandas functionality. The scraping process can be found in GettingTheData.ipynb.
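A hedged sketch of what such a scrape can look like; the URL and selectors below are illustrative assumptions, and the actual process is in GettingTheData.ipynb:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical transcript page on the fandom wiki; the real notebook may
# target different pages and selectors.
url = "https://avatar.fandom.com/wiki/Transcript:The_Boy_in_the_Iceberg"

soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Pull the rows of any transcript tables and keep their visible text.
lines = [row.get_text(" ", strip=True) for row in soup.select("table tr")]
print(lines[:5])
```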
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
S1 Data - A game-theoretic analysis of Wikipedia’s peer production: The interplay between community’s governance and contributors’ interactions