Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Climate data obtained from Wikipedia climate boxes. The scraping code is on GitHub.
The data consists of cities with a population over 10,000. Not all cities have climate data. All values are normalized to metric units. Available variables vary by city, but common ones include temperatures (mean, low, high), humidity, and precipitation. Values are reported per month, with an additional yearly aggregate whose meaning depends on context (average, standard deviation, sum, etc.). The data is not historical: values are aggregates, and the aggregation periods vary widely between cities.
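A hypothetical sketch of recomputing the yearly aggregate from the monthly rows, as described above. The file name and column names ("city", "temp_mean_c", "precip_mm") are placeholders, since the actual schema is not documented here.

```python
# Placeholder file/column names; adjust to the real schema of the dataset.
import pandas as pd

df = pd.read_csv("wikipedia_climate.csv")   # assumed file name

# Monthly rows -> one yearly aggregate per city; mean temperature averages,
# while precipitation sums, mirroring the context-dependent aggregation above.
yearly = (
    df.groupby("city")
      .agg(temp_mean_c=("temp_mean_c", "mean"),
           precip_mm=("precip_mm", "sum"))
      .reset_index()
)
print(yearly.head())
```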
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets include lists of over 43 million Wikipedia articles in 55 languages with quality scores by WikiRank (https://wikirank.net). Additionally, the datasets contain the quality measures (metrics) which directly affect these scores. The quality measures were extracted from Wikipedia dumps from April 2022.
License: All files included in these datasets are released under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
Format:
page_id -- the identifier of the Wikipedia article (int), e.g. 840191
page_name -- the title of the Wikipedia article (UTF-8), e.g. Sagittarius A*
wikirank_quality -- quality score for the Wikipedia article on a scale of 0-100 (as of April 1, 2022); a synthetic measure calculated from the metrics below (also included in the datasets)
norm_len -- normalized "page length"
norm_refs -- normalized "number of references"
norm_img -- normalized "number of images"
norm_sec -- normalized "number of sections"
norm_reflen -- normalized "references per length ratio"
norm_authors -- normalized "number of authors" (excluding bots and anonymous users)
flawtemps -- flaw templates
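A minimal sketch of loading one language's file and ranking articles by the synthetic quality score. The file name is an assumption; the column names follow the field list above.

```python
import pandas as pd

# Assumed file name for a single-language export of the dataset.
df = pd.read_csv("enwiki_wikirank_2022-04.csv")

# Highest-quality articles according to the 0-100 WikiRank score.
top = df.sort_values("wikirank_quality", ascending=False).head(10)
print(top[["page_name", "wikirank_quality", "norm_refs", "norm_len"]])
```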
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Datasets with quality scores for 47 million Wikipedia articles across 55 language versions by WikiRank, as of 1 August 2024.
More information about the quality score can be found in scientific papers:
Traffic analytics, rankings, and competitive metrics for wikipedia.org as of October 2025
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a comprehensive snapshot of global country statistics for the year 2023. It was scraped from various Wikipedia pages using BeautifulSoup, consolidating key indicators and metrics for 142 countries. The dataset covers diverse aspects such as land area, water area, Human Development Index (HDI), GDP forecasts, internet usage, and population changes.
The dataset is sourced from various Wikipedia pages using BeautifulSoup, providing a consolidated and accessible resource for individuals interested in global country statistics. It spans a wide range of topics, making it a valuable asset for exploratory data analysis and research in fields such as economics, demographics, and international relations.
Feel free to explore and analyze this dataset to gain insights into the socio-economic dynamics of countries worldwide.
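An illustrative sketch of the kind of BeautifulSoup scraping described above. The target page and table structure are assumptions for demonstration, not the authors' actual scraping code.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical example page; the dataset combined several such Wikipedia pages.
url = "https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index"
html = requests.get(url, headers={"User-Agent": "country-stats-demo/0.1"}).text
soup = BeautifulSoup(html, "html.parser")

# Grab the first data table and flatten its rows into lists of cell strings.
table = soup.find("table", class_="wikitable")
rows = []
for tr in table.find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
    if cells:
        rows.append(cells)
print(rows[:3])
```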
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Phase 1 snapshot of English Wikipedia articles collected in November 2025. Each row includes the human-maintained WikiProject quality and importance labels (e.g. Stub→FA, Low→Top), along with structural features gathered from the MediaWiki API. The corpus is designed for training quality estimators, monitoring coverage, and prioritising editorial workflows.
full.csv: complete dataset (~1.5M rows)
full_part01.csv – full_part15.csv: 100k-row chunks (the final file contains the remainder)
sample_10k.csv, sample_100k.csv: stratified samples for quick experimentation
prepare_kaggle_release.ipynb: reproducible sampling and chunking workflow
lightgbm_quality_importance.ipynb: baseline models predicting quality/importance
title: article title
page_id: Wikipedia page identifier
size: byte length of the current revision
touched: last touched timestamp (UTC, ISO 8601)
internal_links_count, external_links_count, langlinks_count, images_count, redirects_count: MediaWiki API structural metrics
protection_level: current protection status (e.g. unprotected, semi-protected)
official_quality: human label (Stub, Start, C, B, GA, A, FA, etc.)
official_quality_score: numeric mapping of official_quality (Stub=1, Start=2, C=3, B=4, GA=5, A=6, FA=7, 8–10 for rare higher tiers)
official_importance: human label (Low, Mid, High, Top, etc.)
official_importance_score: numeric mapping of the importance label (Low=1, Mid=3, High=5, Top=8, 10 for special tiers)
categories, templates: pipe-delimited lists of categories/templates (UTF-8 sanitised)
Use chunksize when streaming the full dataset.
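A minimal sketch of streaming full.csv with pandas chunksize, as the note above suggests. The 100k chunk size and the quality-label tally are illustrative choices; the column name official_quality comes from the field list.

```python
import pandas as pd
from collections import Counter

quality_counts = Counter()
# Stream the ~1.5M-row file in 100k-row chunks to keep memory bounded.
for chunk in pd.read_csv("full.csv", chunksize=100_000):
    quality_counts.update(chunk["official_quality"].dropna())

print(quality_counts.most_common())  # e.g. counts of Stub/Start/C/B/GA/FA labels
```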
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Fixes in version 1.1 (= Zenodo's "version 2")
* In 20161101-revisions-part1-12-1728.csv, the missing first data line was added.
* In the Current_content and Deleted_content files, some token values ('str' column) that contain regular quotes ('"') were fixed.
* In the Current_content and Deleted_content files, some incorrect revision ID values in the 'origin_rev_id', 'in' and 'out' columns were fixed.
This dataset contains every instance of all tokens (≈ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances. Each token is annotated with (i) the article revision it was originally created in, and (ii) lists with all the revisions in which the token was ever deleted and (potentially) re-added and re-deleted from its article, enabling a complete and straightforward tracking of its history.
This data would be exceedingly hard for an average user to create, as (i) it is very expensive to compute and (ii) accurately tracking the history of each token in revisioned documents is a non-trivial task. Adapting a state-of-the-art algorithm, we have produced a dataset that allows a range of analyses and metrics, already popular in research and going beyond, to be generated at complete-Wikipedia scale, ensuring quality and allowing researchers to forego the expensive text-comparison computation that has so far hindered scalable usage.
This dataset, its creation process and use cases are described in a dedicated dataset paper of the same name, published at the ICWSM 2017 conference. In this paper, we show how this data enables, on token level, computation of provenance, measuring survival of content over time, very detailed conflict metrics, and fine-grained interactions of editors like partial reverts, re-additions and other metrics.
Tokenization used: https://gist.github.com/faflo/3f5f30b1224c38b1836d63fa05d1ac94
Toy example for how the token metadata is generated: https://gist.github.com/faflo/8bd212e81e594676f8d002b175b79de8
Be sure to read the ReadMe.txt or, for even more detail, the supporting paper referenced under "related identifiers".
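A hedged sketch of a simple token-level provenance analysis of the kind mentioned above. The file name is assumed (the real per-part naming is described in the ReadMe); it relies only on the 'origin_rev_id' column noted in the fixes list.

```python
import pandas as pd

# Assumed file name for one part of the Current_content files.
content = pd.read_csv("current_content_part1.csv")

# Token-level provenance: how much currently surviving content each revision
# originally introduced, i.e. count tokens by the revision they were created in.
provenance = content.groupby("origin_rev_id").size().sort_values(ascending=False)
print(provenance.head(10))
```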
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Errata
Please note that this data set includes some major inaccuracies and should not be used. The data files will be unpublished from their hosting and this metadata will eventually be unpublished as well. A short list of issues discovered:
Many dumps were truncated (T345176).
Pages appeared multiple times, with different revision numbers.
Revisions were sometimes mixed, with wikitext and HTML coming from different versions of an article.
Reference similarity was overcounted when more than two refs shared content.
In particular, the truncation and duplication mean that the aggregate statistics are inaccurate and can't be compared to other data points.
Overview
This data was produced by Wikimedia Germany's Technical Wishes team, and focuses on real-world usage statistics for reference footnotes (Cite extension) and maps (Kartographer extension) across all main-namespace pages (articles) on about 700 Wikimedia wikis. It was produced by processing the Wikimedia Enterprise HTML dumps, which are a fully-parsed rendering of the pages, and by querying the MediaWiki query API to get more detailed information about maps. The data is also accompanied by several more general columns about each page for context.
Our analysis of references was inspired by "Characterizing Wikipedia Citation Usage" and other research, but the goal in our case was to understand the potential impact of improving the ways in which references can be reused within a page. Gathering the map data was to understand the actual impact of improvements made to how external data can be integrated in maps. Both tasks are complicated by the heavy use of wikitext templates, which obscures when and how the reference and map tags are being used. For this reason, we decided to parse the rendered HTML pages rather than the original wikitext.
License
All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/ The source code is distributed under BSD-3-Clause.
Source code and execution
The program used to create these files is our HTML dump scraper, version 0.1, written in Elixir. It can be run locally, but we used the Wikimedia Cloud VPS in order to have intra-datacenter access to the HTML dump file inputs. Our production configuration is included in the source code repository, and the command line used to run it was: "MIX_ENV=prod mix run pipeline.exs".
Execution was interrupted and restarted many times in order to make small fixes to the code. We expect that the only class of inconsistency this could have caused is that a small number of article records may be repeated in the per-page summary files, and these pages' statistics duplicated in the aggregates. Whatever the cause, we've found many of these duplicate errors, and counts are given in the "duplicates.txt" file.
The program is pluggable and configurable; it can be extended by writing new analysis modules. Our team plans to continue development and to run it again in the near future to track the evolution of the collected metrics over time.
Format
All fields are documented in metrics.md as part of the code repository. Outputs are mostly split into separate ND-JSON (newline-delimited JSON) and JSON files, and grand totals are gathered into a single CSV file.
Per-page summary files
The first phase of scraping produces a fine-grained report summarizing each page into a few statistics. Each file corresponds to a wiki (using its database name, for example "enwiki" for English Wikipedia) and each line of the file is a JSON object corresponding to a page.
Example file name: enwiki-20230601-page-summary.ndjson.gz
Example metrics:
How many reference tags are created from templates vs. directly in the article.
How many references contain a template transclusion to produce their content.
How many references are unnamed, automatically named, or manually named.
How often references are reused via their name.
Copy-pasted references that share the same or almost the same content on the same page.
Whether an article has more than one reference list.
Mapdata files
Example file name: enwiki-20230601-mapdata.ndjson.gz
These files give the count of different types of map "external data" on each page. A line will either be empty "{}" or it will include the revid and the number of external data references for maps on that page.
External data is tallied in 9 different buckets, starting with "page", meaning that the source is .map data from the Wikimedia Commons server, or geoline / geoshape / geomask / geopoint combined with the data source, either an "ids" (Wikidata Q-ID) or "query" (SPARQL query) source.
Mapdata summary files
Each wiki has a summary of map external data counts, which contains a sum for each type count.
Example file name: enwiki-20230601-mapdata-summary.json
Wiki summary files
Per-page statistics are rolled up to the wiki level, and the results are stored in a separate file for each wiki. Some statistics are summed, some are averaged; check the suffix on the column name for a hint.
Example file name: enwiki-20230601-summary.json
Top-level summary file
There is one file which aggregates the wiki summary statistics, discarding non-numeric fields and formatting as a CSV for ease of use: all-wikis-20230601-summary.csv
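A sketch of streaming one per-page summary file. The file name follows the example above, but the JSON field name used in the tally ("ref_count") is a hypothetical placeholder; the authoritative field list is in metrics.md in the code repository.

```python
import gzip
import json

path = "enwiki-20230601-page-summary.ndjson.gz"
pages_with_refs = 0
total_pages = 0

with gzip.open(path, "rt", encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)            # one JSON object per page
        total_pages += 1
        if record.get("ref_count", 0) > 0:   # hypothetical field name
            pages_with_refs += 1

print(f"{pages_with_refs}/{total_pages} pages contain at least one reference")
```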
Data Explorer enables easy tracking of metrics about computer and Internet use over time. Simply choose a metric of interest from the drop-down menu. The default Map mode depicts percentages by state, while Chart mode allows metrics to be broken down by demographics and viewed as either percentages of the population or estimated numbers of people or households.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Wikipedia web pages in different languages are rarely linked except for the cross-lingual link between web pages about the same subject. Collected in June 2010, this data collection consists of 10GB of tagged Chinese, Japanese and Korean articles, converted from Wikipedia to an XML structure by a multi-lingual adaptation of the YAWN system (see Related Information). Data were collected as part of the NII Test Collection for IR Systems (NTCIR) Project, which aims to enhance research in Information Access (IA) technologies, including information retrieval, to enhance cross-lingual link discovery (a way of automatically finding potential links between documents written in different languages). Through cross-lingual link discovery, users are able to discover documents in languages which they are either familiar with, or which have a richer set of documents than in their language of choice.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
15 January 2020 - A Map of Science, v. 1.0
Description: A network which shows the similarities among different branches of science. It is based on the Wikipedia pages in the outlines of natural, formal, social and applied sciences, plus Data Science, which is not yet included in those outlines (18 Jan. 2020). All pages called "Outline of X" were ignored. Pages are preprocessed with regular expressions to extract the main content, followed by stop-word removal and lemmatization with WordNetLemmatizer in NLTK. Edges represent cosine similarity and are filtered by z-score, keeping only edges with a z-score > 1.959964. Isolated nodes were removed.
Materials: R, Python, igraph, NLTK, d3, JavaScript, HTML, Wikipedia
Contact: Alberto Calderone - sinnefa@gmail.com
Preview: http://www.sinnefa.com/wikipediasciencemap/
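A minimal sketch of the edge-filtering idea described above: compute pairwise cosine similarity between page texts, z-score the similarities, and keep only edges above 1.959964. The three toy documents and the TF-IDF vectorization are stand-ins for the authors' actual NLTK-based preprocessing.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = {
    "Physics": "energy matter motion force field quantum relativity",
    "Chemistry": "matter reaction element compound molecule bond energy",
    "Sociology": "society group behaviour institution culture interaction",
}

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs.values())
sim = cosine_similarity(tfidf)

# z-score the off-diagonal similarities and keep only strong edges.
iu = np.triu_indices_from(sim, k=1)
vals = sim[iu]
z = (vals - vals.mean()) / vals.std()

names = list(docs)
edges = [
    (names[i], names[j], round(float(s), 3))
    for i, j, s, zz in zip(iu[0], iu[1], vals, z)
    if zz > 1.959964
]
print(edges)  # with only three toy documents this will usually be empty
```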
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by scraping Wikipedia to compile a list of the top 100 companies in the USA. The data includes key information such as company names, industry sectors, revenue figures, and headquarters locations. The dataset captures the most recent rankings of these companies based on metrics like annual revenue, market capitalization, or employee size, as listed on Wikipedia. The dataset serves as a valuable resource for analyzing trends in the U.S. corporate landscape, including industry dominance and geographic distribution of major corporations.
The US Geological Survey (USGS) resource assessment (Williams et al., 2009) outlined a mean 30 GWe of undiscovered hydrothermal resource in the western US. One goal of the Geothermal Technologies Office (GTO) is to accelerate the development of this undiscovered resource. The Geothermal Technologies Program (GTP) Blue Ribbon Panel (GTO, 2011) recommended that DOE focus efforts on helping industry identify hidden geothermal resources to increase geothermal capacity in the near term. Increased exploration activity will produce more prospects, more discoveries, and more readily developable resources. Detailed exploration case studies akin to those found in oil and gas (e.g. Beaumont et al., 1990) will give operators a single point of information to gather clean, unbiased information on which to build geothermal drilling prospects. To support this effort, the National Renewable Energy Laboratory (NREL) has been working with the Department of Energy (DOE) to develop a template for geothermal case studies on the Geothermal Gateway on OpenEI. In fiscal year 2013, the template was developed and tested with two case studies: Raft River Geothermal Area (http://en.openei.org/wiki/Raft_River_Geothermal_Area) and Coso Geothermal Area (http://en.openei.org/wiki/Coso_Geothermal_Area). In fiscal year 2014, ten additional case studies were completed, and additional features were added to the template to allow for more data and direct citations of data. The template allows for:
Data - a variety of data can be collected for each area, including power production information, well field information, geologic information, reservoir information, and geochemistry information.
Narratives - general (e.g. area overview, history and infrastructure), technical (e.g. exploration history, well field description, R&D activities) and geologic narratives (e.g. area geology, hydrothermal system, heat source, geochemistry).
Exploration Activity Catalog - a catalog of exploration activities conducted in the area (with dates and references).
NEPA Analysis - a query of NEPA analyses conducted in the area (that have been catalogued in the OpenEI NEPA database).
In fiscal year 2015, NREL is working with universities to populate additional case studies on OpenEI. The goal is to provide a large enough dataset to start conducting analyses of exploration programs to identify correlations between successful exploration plans for areas with similar geologic occurrence models.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
OpenStreetMap contains roughly 754.8 thousand km of roads in this region. Based on AI-mapped estimates, this is approximately 73% of the total road length in the dataset region. The average age of data for the region is 3 years (last edited 9 days ago) and 8% of roads were added or updated in the last 6 months. Read about what this summary means: indicators, metrics.
This theme includes all OpenStreetMap features in this area matching the following filter (learn what tags mean here):
tags['highway'] IS NOT NULL
Features may have these attributes:
This dataset is one of many OpenStreetMap exports on HDX (https://data.humdata.org/organization/hot). See the Humanitarian OpenStreetMap Team website for more information.
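A hedged sketch of applying the theme filter above (tags['highway'] IS NOT NULL) to an export. HDX OSM exports are commonly distributed as GeoJSON or shapefiles; the file name here is an assumption.

```python
import geopandas as gpd

# Assumed export file name; adjust to the actual HDX download.
roads = gpd.read_file("hotosm_region_roads.geojson")

# Keep only features that actually carry a highway tag.
roads = roads[roads["highway"].notna()]

# Rough total length in km; Web Mercator distorts lengths, so a projection
# suited to the region would give a more accurate figure.
total_km = roads.to_crs(epsg=3857).geometry.length.sum() / 1000
print(f"{total_km:,.0f} km of highway-tagged features")
```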
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To overcome the frequently debated crisis of confidence, replicating studies is becoming increasingly common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data generated from an old, private fork of the CODES simulation toolkit: https://github.com/codes-org/codes
Data dictionary: https://github.com/kevinabrown/codes/wiki/Dragonfly-Dally-DEBUG-Metrics
The National Renewable Energy Laboratory (NREL) was tasked with developing a metric in 2012 to measure the impacts of RD&D funding on the cost and time required for geothermal exploration activities. The development of this cost and time metric included collecting cost and time data for exploration techniques, creating a baseline suite of exploration techniques to which future exploration cost and time improvements can be compared, and developing an online tool for graphically showing potential project impacts (all available at http://en.openei.org/wiki/Gateway: Geothermal). This paper describes the methodology used to define the baseline exploration suite of techniques (baseline), as well as the approach that was used to create the cost and time data set that populates the baseline. The resulting product, an online tool for measuring impact, and the aggregated cost and time data are available on the Open Energy Information website (OpenEI, http://en.openei.org) for public access.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analyzed 400 m running performance metric data from the fastest athlete with bilateral leg amputations
License: https://choosealicense.com/licenses/other/
Drive Stats
Drive Stats is a public data set of daily metrics on the hard drives in Backblaze's cloud storage infrastructure that Backblaze has open-sourced since April 2013. Currently, Drive Stats comprises over 388 million records, rising by over 240,000 records per day. Drive Stats is an append-only dataset, effectively logging daily statistics that, once written, are never updated or deleted. This is our first Hugging Face dataset; feel free to suggest improvements by creating a… See the full description on the dataset page: https://huggingface.co/datasets/backblaze/Drive_Stats.
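A sketch of loading the dataset from the Hugging Face Hub page linked above. The repository id comes from that URL; the split name and the streaming option are assumptions chosen to avoid downloading all 388M+ records.

```python
from datasets import load_dataset

# "train" split is an assumption; check the dataset page for the actual splits.
ds = load_dataset("backblaze/Drive_Stats", split="train", streaming=True)

# Peek at a few daily drive records without materializing the full dataset.
for i, row in enumerate(ds):
    print(row)
    if i == 2:
        break
```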
This API returns broadband summary data for the entire United States. It is designed to retrieve broadband summary data and census metrics (population or households) combined as search criteria. The data includes wireline and wireless providers, different technologies and broadband speeds reported in the particular area being searched for, on a scale of 0 to 1.