34 datasets found
  1. Wiki climate

    • kaggle.com
    zip
    Updated Oct 11, 2018
    Cite
    stalker (2018). Wiki climate [Dataset]. https://www.kaggle.com/datasets/brankokokanovic/wiki-climate
    Explore at:
    Available download formats: zip (2597328 bytes)
    Dataset updated
    Oct 11, 2018
    Authors
    stalker
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Climate data obtained from Wikipedia climate boxes. The scraping code is on GitHub.

    Content

    The data consist of cities with populations over 10,000. Not all cities have climate data. All values are normalized to metric units. The available variables vary widely, but common ones include temperature (mean, low, high), humidity, and precipitation. Data are reported per month, with a yearly aggregate whose meaning depends on context (average, standard deviation, sum, etc.). The data are not historical time series but aggregates, and the aggregation periods vary considerably.
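    The exact files inside the archive are not described here, so the snippet below is only a minimal sketch of how such per-city, per-month climate data might be explored with pandas; the file name and the city/temp_mean column names are assumptions, not the dataset's documented schema.

      import pandas as pd

      # Hypothetical layout: one row per city and month, values already in metric units.
      df = pd.read_csv("wiki_climate.csv")  # file name is an assumption

      # Example: rank cities by the average of their monthly mean temperatures.
      ranking = (
          df.groupby("city")["temp_mean"]   # "city" and "temp_mean" are assumed column names
            .mean()
            .sort_values(ascending=False)
      )
      print(ranking.head())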

  2. WikiRank quality scores and measures for Wikipedia articles (April 2022)

    • figshare.com
    application/gzip
    Updated May 30, 2023
    + more versions
    Cite
    Wiki Rank (2023). WikiRank quality scores and measures for Wikipedia articles (April 2022) [Dataset]. http://doi.org/10.6084/m9.figshare.19762927.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Wiki Rank
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets include lists of over 43 million Wikipedia articles in 55 languages with quality scores by WikiRank (https://wikirank.net). Additionally, the datasets contain the quality measures (metrics) that directly affect these scores. The quality measures were extracted from Wikipedia dumps of April 2022.

    License

    All files included in this dataset are released under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/

    Format

    • page_id: the identifier of the Wikipedia article (int), e.g. 840191
    • page_name: the title of the Wikipedia article (UTF-8), e.g. Sagittarius A*
    • wikirank_quality: quality score for the Wikipedia article on a 0-100 scale (as of April 1, 2022). This is a synthetic measure calculated from the metrics below (also included in the datasets).
    • norm_len: normalized "page length"
    • norm_refs: normalized "number of references"
    • norm_img: normalized "number of images"
    • norm_sec: normalized "number of sections"
    • norm_reflen: normalized "references per length ratio"
    • norm_authors: normalized "number of authors" (excluding bots and anonymous users)
    • flawtemps: flaw templates
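    A minimal sketch of loading one of these files with pandas, assuming a gzip-compressed, tab-separated layout without a header row; the file name and separator are assumptions, while the column names come from the list above.

      import pandas as pd

      cols = ["page_id", "page_name", "wikirank_quality", "norm_len", "norm_refs",
              "norm_img", "norm_sec", "norm_reflen", "norm_authors", "flawtemps"]

      # Assumed file name and separator; adjust both to the actual download.
      df = pd.read_csv("wikirank_enwiki_2022-04.tsv.gz", sep="\t", names=cols, header=None)

      # Example: the ten highest-scoring articles.
      print(df.nlargest(10, "wikirank_quality")[["page_name", "wikirank_quality"]])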

  3. Quality of Wikipedia articles by WikiRank

    • kaggle.com
    zip
    Updated Mar 18, 2025
    Cite
    Włodzimierz Lewoniewski (2025). Quality of Wikipedia articles by WikiRank [Dataset]. https://www.kaggle.com/datasets/lewoniewski/quality-of-wikipedia-articles-by-wikirank
    Explore at:
    Available download formats: zip (771671698 bytes)
    Dataset updated
    Mar 18, 2025
    Authors
    Włodzimierz Lewoniewski
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Datasets with quality scores for 47 million Wikipedia articles across 55 language versions by WikiRank, as of 1 August 2024.

    Potential Applications:

    • Academic research: scholars can incorporate WikiRank scores into studies on information accuracy, digital literacy, collective intelligence, and crowd dynamics. This data can also inform sociological research into biases, representation, and content disparities across different languages and cultures.
    • Educational tools and platforms: educational institutions and learning platforms can integrate WikiRank scores to recommend reliable and high-quality articles, significantly aiding learners in sourcing accurate information.
    • AI and machine learning development: developers and data scientists can use WikiRank scores to train sophisticated NLP and content-generation models to recognize and produce high-quality, structured, and well-referenced content.
    • Content moderation and policy development: the Wikipedia community can use these metrics to enforce content quality policies more effectively.
    • Content strategy and editorial planning: media companies, publishers, and content strategists can employ these scores to identify high-performing content and detect topics needing deeper coverage or improvement.

    More information about the quality score can be found in the associated scientific papers.

  4. wikipedia.org Traffic Analytics Data

    • analytics.explodingtopics.com
    Updated Oct 1, 2025
    Cite
    (2025). wikipedia.org Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/wikipedia.org
    Explore at:
    Dataset updated
    Oct 1, 2025
    Variables measured
    Global Rank, Monthly Visits, Authority Score, US Country Rank, Online Services Category Rank
    Description

    Traffic analytics, rankings, and competitive metrics for wikipedia.org as of October 2025

  5. Wikipedia World Statistics 2023

    • kaggle.com
    zip
    Updated Dec 28, 2023
    + more versions
    Cite
    Jitesh Kumar Sahoo (2023). Wikipedia World Statistics 2023 [Dataset]. https://www.kaggle.com/datasets/jiteshkumarsahoo/wikipedia-country-statistics-2023
    Explore at:
    Available download formats: zip (9556 bytes)
    Dataset updated
    Dec 28, 2023
    Authors
    Jitesh Kumar Sahoo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    Dataset Description: Wikipedia World Statistics (2023)

    This dataset provides a comprehensive snapshot of global country statistics for the year 2023. It was scraped from various Wikipedia pages using BeautifulSoup, consolidating key indicators and metrics for 142 countries. The dataset covers diverse aspects such as land area, water area, Human Development Index (HDI), GDP forecasts, internet usage, and population changes.

    Key Columns and Metrics:

    1. Country: The name of the country.
    2. Total in km2: Total area of the country.
    3. Land in km2: Land area excluding water bodies.
    4. Water in km2: Area covered by water bodies.
    5. Water %: Percentage of the total area covered by water.
    6. HDI: Human Development Index, a measure of a country's overall achievement in its social and economic dimensions.
    7. %HDI Growth: Percentage growth in HDI.
    8. IMF Forecast GDP(Nominal): International Monetary Fund's forecast for Gross Domestic Product in nominal terms.
    9. World Bank Forecast GDP(Nominal): World Bank's forecast for Gross Domestic Product in nominal terms.
    10. UN Forecast GDP(Nominal): United Nations' forecast for Gross Domestic Product in nominal terms.
    11. IMF Forecast GDP(PPP): IMF's forecast for Gross Domestic Product in purchasing power parity terms.
    12. World Bank Forecast GDP(PPP): World Bank's forecast for Gross Domestic Product in purchasing power parity terms.
    13. CIA Forecast GDP(PPP): Central Intelligence Agency's forecast for Gross Domestic Product in purchasing power parity terms.
    14. Internet Users: Number of internet users in the country.
    15. UN Continental Region: Continental region classification by the United Nations.
    16. UN Statistical Subregion: Statistical subregion classification by the United Nations.
    17. Population 2022: Population of the country in the year 2022.
    18. Population 2023: Population of the country in the year 2023.
    19. Population %Change: Percentage change in population from 2022 to 2023.

    Dataset Sources:

    The dataset is sourced from various Wikipedia pages using BeautifulSoup, providing a consolidated and accessible resource for individuals interested in global country statistics. It spans a wide range of topics, making it a valuable asset for exploratory data analysis and research in fields such as economics, demographics, and international relations.

    Feel free to explore and analyze this dataset to gain insights into the socio-economic dynamics of countries worldwide.
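    A minimal sketch of working with these columns in pandas; the file name is an assumption, and the column labels below follow the list above, so they may need adjusting to the exact headers in the CSV.

      import pandas as pd

      df = pd.read_csv("wikipedia_world_statistics_2023.csv")  # assumed file name

      # Recompute the 2022-to-2023 population change and compare it with the provided column.
      df["pct_change_check"] = (df["Population 2023"] - df["Population 2022"]) / df["Population 2022"] * 100
      print(df[["Country", "Population %Change", "pct_change_check"]].head())

      # Example: the ten countries with the highest HDI.
      print(df.nlargest(10, "HDI")[["Country", "HDI"]])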

  6. Wikipedia Phase 1 Official Quality Dataset

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Data Explorer Lab (2025). Wikipedia Phase 1 Official Quality Dataset [Dataset]. https://www.kaggle.com/datasets/dataexplorerlab/wikipedia-phase1-official-quality
    Explore at:
    Available download formats: zip (649613234 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    Data Explorer Lab
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Phase 1 snapshot of English Wikipedia articles collected in November 2025. Each row includes the human-maintained WikiProject quality and importance labels (e.g. Stub→FA, Low→Top), along with structural features gathered from the MediaWiki API. The corpus is designed for training quality estimators, monitoring coverage, and prioritising editorial workflows.

    Contents

    • full.csv: complete dataset (~1.5M rows)
    • full_part01.csv to full_part15.csv: 100k-row chunks (the final file contains the remainder)
    • sample_10k.csv, sample_100k.csv: stratified samples for quick experimentation
    • prepare_kaggle_release.ipynb: reproducible sampling and chunking workflow
    • lightgbm_quality_importance.ipynb: baseline models predicting quality/importance

    Column summary

    • title: article title
    • page_id: Wikipedia page identifier
    • size: byte length of the current revision
    • touched: last touched timestamp (UTC, ISO 8601)
    • internal_links_count, external_links_count, langlinks_count, images_count, redirects_count: MediaWiki API structural metrics
    • protection_level: current protection status (e.g. unprotected, semi-protected)
    • official_quality: human label (Stub, Start, C, B, GA, A, FA, etc.)
    • official_quality_score: numeric mapping of official_quality (Stub=1, Start=2, C=3, B=4, GA=5, A=6, FA=7, 8–10 for rare higher tiers)
    • official_importance: human label (Low, Mid, High, Top, etc.)
    • official_importance_score: numeric mapping of the importance label (Low=1, Mid=3, High=5, Top=8, 10 for special tiers)
    • categories, templates: pipe-delimited lists of categories/templates (UTF-8 sanitised)

    Notes

    • Files are encoded in UTF-8 with BOM; straight double quotes were replaced with double-prime characters to remain Excel-friendly.
    • Use the chunked files or chunksize when streaming the full dataset.
    • Feedback and feature requests are welcome; Phase 2 roadmap adds pageview aggregates and revision metrics.
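    A minimal sketch of streaming the full file with pandas, as suggested in the notes above; utf-8-sig handles the BOM, and the official_quality column name comes from the column summary.

      import pandas as pd

      counts = {}
      for chunk in pd.read_csv("full.csv", chunksize=100_000, encoding="utf-8-sig"):
          # Tally articles per human-assigned quality label.
          for label, n in chunk["official_quality"].value_counts().items():
              counts[label] = counts.get(label, 0) + int(n)

      print(counts)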
  7. Data from: TokTrack: A Complete Token Provenance and Change Tracking Dataset...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Flöck, Fabian; Erdogan, Kenan; Acosta, Maribel (2020). TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_789289
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    GESIS - Leibniz Institute for the Social Sciences
    Karlsruhe Institute of Technology
    Authors
    Flöck, Fabian; Erdogan, Kenan; Acosta, Maribel
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fixes in version 1.1 (= Zenodo's "version 2")

    * In 20161101-revisions-part1-12-1728.csv, the missing first data line was added.

    * In the Current_content and Deleted_content files, some token values ('str' column) containing regular quotes ('"') were fixed.

    * In the Current_content and Deleted_content files, some wrong revision ID values in the 'origin_rev_id', 'in' and 'out' columns were fixed.

    This dataset contains every instance of all tokens (≈ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances. Each token is annotated with (i) the article revision it was originally created in, and (ii) lists with all the revisions in which the token was ever deleted and (potentially) re-added and re-deleted from its article, enabling a complete and straightforward tracking of its history.

    This data would be exceedingly hard for an average user to create, as it is (i) very expensive to compute and (ii) accurately tracking the history of each token in revisioned documents is a non-trivial task. Adapting a state-of-the-art algorithm, we have produced a dataset that allows a range of analyses and metrics, already popular in research and beyond, to be generated at complete-Wikipedia scale, ensuring quality and allowing researchers to forego the expensive text-comparison computation that has so far hindered scalable usage.

    This dataset, its creation process and use cases are described in a dedicated dataset paper of the same name, published at the ICWSM 2017 conference. In this paper, we show how this data enables, on token level, computation of provenance, measuring survival of content over time, very detailed conflict metrics, and fine-grained interactions of editors like partial reverts, re-additions and other metrics.

    Tokenization used: https://gist.github.com/faflo/3f5f30b1224c38b1836d63fa05d1ac94

    Toy example for how the token metadata is generated: https://gist.github.com/faflo/8bd212e81e594676f8d002b175b79de8

    Be sure to read the ReadMe.txt or, in even more detail, the supporting paper referenced under "related identifiers".
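    The authoritative file layout is in the ReadMe.txt; the snippet below is only a rough sketch of the idea, assuming a current-content CSV whose rows expose the 'str', 'origin_rev_id', 'in' and 'out' columns mentioned above, with 'in'/'out' holding list-like strings of revision IDs. The file name is an assumption.

      import ast
      import pandas as pd

      df = pd.read_csv("current_content_sample.csv")  # assumed file name

      row = df.iloc[0]
      token = row["str"]             # the token text
      origin = row["origin_rev_id"]  # revision in which the token was first created

      # Assumed representation: 'in' and 'out' serialized as Python-style lists of revision IDs.
      readds = ast.literal_eval(row["in"]) if isinstance(row["in"], str) else []
      removals = ast.literal_eval(row["out"]) if isinstance(row["out"], str) else []

      print(f"token {token!r}: created in rev {origin}, "
            f"deleted {len(removals)} times, re-added {len(readds)} times")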

  8. [deprecated] Reference and map usage across Wikimedia wiki pages

    • figshare.com
    Updated Dec 18, 2023
    Cite
    Adam Wight (2023). [deprecated] Reference and map usage across Wikimedia wiki pages [Dataset]. http://doi.org/10.6084/m9.figshare.24064941.v2
    Explore at:
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Adam Wight
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Errata

    Please note that this data set includes some major inaccuracies and should not be used. The data files will be unpublished from their hosting, and this metadata will eventually be unpublished as well. A short list of issues discovered:

    • Many dumps were truncated (T345176).
    • Pages appeared multiple times, with different revision numbers.
    • Revisions were sometimes mixed, with wikitext and HTML coming from different versions of an article.
    • Reference similarity was overcounted when more than two refs shared content.

    In particular, the truncation and duplication mean that the aggregate statistics are inaccurate and can't be compared to other data points.

    Overview

    This data was produced by Wikimedia Germany's Technical Wishes team, and focuses on real-world usage statistics for reference footnotes (Cite extension) and maps (Kartographer extension) across all main-namespace pages (articles) on about 700 Wikimedia wikis. It was produced by processing the Wikimedia Enterprise HTML dumps, which are a fully-parsed rendering of the pages, and by querying the MediaWiki query API to get more detailed information about maps. The data is also accompanied by several more general columns about each page for context.

    Our analysis of references was inspired by "Characterizing Wikipedia Citation Usage" and other research, but the goal in our case was to understand the potential impact of improving the ways in which references can be reused within a page. Gathering the map data was to understand the actual impact of improvements made to how external data can be integrated in maps. Both tasks are complicated by the heavy use of wikitext templates, which obscures when and how these tags are being used. For this reason, we decided to parse the rendered HTML pages rather than the original wikitext.

    License

    All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/ The source code is distributed under BSD-3-Clause.

    Source code and execution

    The program used to create these files is our HTML dump scraper, version 0.1, written in Elixir. It can be run locally, but we used the Wikimedia Cloud VPS in order to have intra-datacenter access to the HTML dump file inputs. Our production configuration is included in the source code repository, and the command line used to run it was: "MIX_ENV=prod mix run pipeline.exs".

    Execution was interrupted and restarted many times in order to make small fixes to the code. We expect that the only class of inconsistency this could have caused is that a small number of article records may be repeated in the per-page summary files, and these pages' statistics duplicated in the aggregates. Whatever the cause, we have found many of these duplicates, and counts are given in the "duplicates.txt" file.

    The program is pluggable and configurable; it can be extended by writing new analysis modules. Our team plans to continue development and to run it again in the near future to track the evolution of the collected metrics over time.

    Format

    All fields are documented in metrics.md as part of the code repository. Outputs are mostly split into separate ND-JSON (newline-delimited) and JSON files, and grand totals are gathered into a single CSV file.

    Per-page summary files

    The first phase of scraping produces a fine-grained report summarizing each page into a few statistics. Each file corresponds to a wiki (using its database name, for example "enwiki" for English Wikipedia) and each line of the file is a JSON object corresponding to a page.

    Example file name: enwiki-20230601-page-summary.ndjson.gz

    Example metrics:

    • How many tags are created from templates vs. directly in the article.
    • How many references contain a template transclusion to produce their content.
    • How many references are unnamed, automatically named, or manually named.
    • How often references are reused via their name.
    • Copy-pasted references that share the same or almost the same content on the same page.
    • Whether an article has more than one list.

    Mapdata files

    Example file name: enwiki-20230601-mapdata.ndjson.gz

    These files give the count of different types of map "external data" on each page. A line will either be empty ("{}") or it will include the revid and the number of external data references for maps on that page. External data is tallied in 9 different buckets, starting with "page", meaning that the source is .map data from the Wikimedia Commons server, or geoline / geoshape / geomask / geopoint plus the data source, either an "ids" (Wikidata Q-ID) or "query" (SPARQL query) source.

    Mapdata summary files

    Each wiki has a summary of map external data counts, which contains a sum for each type count.

    Example file name: enwiki-20230601-mapdata-summary.json

    Wiki summary files

    Per-page statistics are rolled up to the wiki level, and results are stored in a separate file for each wiki. Some statistics are summed, some are averaged; check the suffix on the column name for a hint.

    Example file name: enwiki-20230601-summary.json

    Top-level summary file

    There is one file which aggregates the wiki summary statistics, discarding non-numeric fields and formatting the result as a CSV for ease of use: all-wikis-20230601-summary.csv
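    Although the dataset is deprecated, the per-page files follow a simple newline-delimited JSON layout, so a minimal reading sketch looks like this (the example file name is taken from the description; field names are documented in metrics.md and are not assumed here):

      import gzip
      import json

      path = "enwiki-20230601-page-summary.ndjson.gz"

      pages = 0
      with gzip.open(path, mode="rt", encoding="utf-8") as fh:
          for line in fh:
              record = json.loads(line)  # one JSON object per page
              pages += 1

      print(f"{pages} page records read")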

  9. Digital Nation Data Explorer

    • catalog.data.gov
    Updated Jul 15, 2022
    Cite
    National Telecommunications and Information Administration (2022). Digital Nation Data Explorer [Dataset]. https://catalog.data.gov/dataset/digital-nation-data-explorer
    Explore at:
    Dataset updated
    Jul 15, 2022
    Description

    Data Explorer enables easy tracking of metrics about computer and Internet use over time. Simply choose a metric of interest from the drop-down menu. The default Map mode depicts percentages by state, while Chart mode allows metrics to be broken down by demographics and viewed as either percentages of the population or estimated numbers of people or households.

  10. Wikipedia CJK Corpora

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated Feb 15, 2022
    Cite
    Shlomo Geva (2022). Wikipedia CJK Corpora [Dataset]. https://researchdatafinder.qut.edu.au/display/q7
    Explore at:
    Dataset updated
    Feb 15, 2022
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Shlomo Geva
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Wikipedia web pages in different languages are rarely linked except for the cross-lingual link between web pages about the same subject. Collected in June 2010, this data collection consists of 10GB of tagged Chinese, Japanese and Korean articles, converted from Wikipedia to an XML structure by a multi-lingual adaptation of the YAWN system (see Related Information). Data were collected as part of the NII Test Collection for IR Systems (NTCIR) Project, which aims to enhance research in Information Access (IA) technologies, including information retrieval, to enhance cross-lingual link discovery (a way of automatically finding potential links between documents written in different languages). Through cross-lingual link discovery, users are able to discover documents in languages which they are either familiar with, or which have a richer set of documents than in their language of choice.

  11. A Wikipedia Based Map of Science

    • figshare.com
    txt
    Updated Jan 20, 2020
    Cite
    Alberto Calderone (2020). A Wikipedia Based Map of Science [Dataset]. http://doi.org/10.6084/m9.figshare.11638932.v5
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 20, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alberto Calderone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    15 January 2020 - A Map of Science, v. 1.0

    Description: A network showing the similarities among different branches of science. It is based on the Wikipedia pages in the outlines of the natural, formal, social, and applied sciences, plus Data Science, which is not yet included (18 Jan. 2020). All pages called "Outline of X" were ignored. Pages are preprocessed with regular expressions to extract the main content, followed by stop-word removal and lemmatization with WordNetLemmatizer in NLTK. Edges represent cosine similarity and are filtered by z-score, keeping only edges with a z-score > 1.959964. Isolated nodes were removed.

    Materials: R, Python, igraph, NLTK, D3, JavaScript, HTML, Wikipedia

    Contact: Alberto Calderone - sinnefa@gmail.com

    Preview: http://www.sinnefa.com/wikipediasciencemap/
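    This is not the author's code, but a rough sketch of the edge-filtering step described above: vectorize a few toy documents, compute pairwise cosine similarities, and keep only edges whose similarity z-score exceeds 1.959964 (the lemmatization and Wikipedia scraping steps are omitted).

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Toy stand-ins for the preprocessed Wikipedia outline pages.
      docs = {
          "physics": "energy matter quantum field theory",
          "chemistry": "matter molecule reaction energy",
          "statistics": "probability inference sample variance",
          "probability": "probability random variable inference",
      }
      names = list(docs)

      sim = cosine_similarity(TfidfVectorizer().fit_transform(docs.values()))

      # Keep upper-triangle pairs whose z-score exceeds the threshold used in the dataset.
      iu = np.triu_indices(len(names), k=1)
      values = sim[iu]
      z = (values - values.mean()) / values.std()
      edges = [(names[i], names[j], round(float(values[k]), 3))
               for k, (i, j) in enumerate(zip(*iu)) if z[k] > 1.959964]

      # On a toy corpus this list may well be empty; the cutoff only bites at Wikipedia scale.
      print(edges)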

  12. Top 100 Companies in USA According to Wikipedia

    • kaggle.com
    zip
    Updated Sep 3, 2024
    Cite
    Timothy kipchirchir Kimutai (2024). Top 100 Companies in USA According to Wikipedia [Dataset]. https://www.kaggle.com/datasets/timothykipchirchir/top-100-companies-in-usa-according-to-wikipedia
    Explore at:
    Available download formats: zip (22 bytes)
    Dataset updated
    Sep 3, 2024
    Authors
    Timothy kipchirchir Kimutai
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset was created by scraping Wikipedia to compile a list of the top 100 companies in the USA. The data includes key information such as company names, industry sectors, revenue figures, and headquarters locations. The dataset captures the most recent rankings of these companies based on metrics like annual revenue, market capitalization, or employee size, as listed on Wikipedia. The dataset serves as a valuable resource for analyzing trends in the U.S. corporate landscape, including industry dominance and geographic distribution of major corporations.

  13. Resource Reporting Methodology Analysis and Development of Geothermal...

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    + more versions
    Cite
    National Renewable Energy Laboratory (2025). Resource Reporting Methodology Analysis and Development of Geothermal Reporting Metric for the Geothermal Technologies Office Hydrothermal Program [Dataset]. https://catalog.data.gov/dataset/resource-reporting-methodology-analysis-and-development-of-geothermal-reporting-metric-for-14aab
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    The US Geological Survey (USGS) resource assessment (Williams et al., 2009) outlined a mean 30 GWe of undiscovered hydrothermal resource in the western US. One goal of the Geothermal Technologies Office (GTO) is to accelerate the development of this undiscovered resource. The Geothermal Technologies Program (GTP) Blue Ribbon Panel (GTO, 2011) recommended that DOE focus efforts on helping industry identify hidden geothermal resources to increase geothermal capacity in the near term. Increased exploration activity will produce more prospects, more discoveries, and more readily developable resources. Detailed exploration case studies akin to those found in oil and gas (e.g. Beaumont et al., 1990) will give operators a single point of information to gather clean, unbiased information on which to build geothermal drilling prospects. To support this effort, the National Renewable Energy Laboratory (NREL) has been working with the Department of Energy (DOE) to develop a template for geothermal case studies on the Geothermal Gateway on OpenEI. In fiscal year 2013, the template was developed and tested with two case studies: Raft River Geothermal Area (http://en.openei.org/wiki/Raft_River_Geothermal_Area) and Coso Geothermal Area (http://en.openei.org/wiki/Coso_Geothermal_Area). In fiscal year 2014, ten additional case studies were completed, and additional features were added to the template to allow for more data and direct citations of data. The template allows for:

    • Data: a variety of data can be collected for each area, including power production, well field, geologic, reservoir, and geochemistry information.
    • Narratives: general (e.g. area overview, history and infrastructure), technical (e.g. exploration history, well field description, R&D activities), and geologic narratives (e.g. area geology, hydrothermal system, heat source, geochemistry).
    • Exploration Activity Catalog: a catalog of exploration activities conducted in the area (with dates and references).
    • NEPA Analysis: a query of NEPA analyses conducted in the area (that have been catalogued in the OpenEI NEPA database).

    In fiscal year 2015, NREL is working with universities to populate additional case studies on OpenEI. The goal is to provide a large enough dataset to start conducting analyses of exploration programs to identify correlations between successful exploration plans for areas with similar geologic occurrence models.

  14. Viet Nam Roads (OpenStreetMap Export)

    • data.humdata.org
    geojson, geopackage +2
    Updated Oct 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Humanitarian OpenStreetMap Team (HOT) (2025). Viet Nam Roads (OpenStreetMap Export) [Dataset]. https://data.humdata.org/dataset/12bc17f3-abad-4f24-9bc8-08099d4ee121?force_layout=desktop
    Explore at:
    Available download formats: geopackage (476711174), geojson (1605937), kml (1565947), shp (481899506), kml (284559176), geojson (290314458), shp (2620503), geopackage (2642725)
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    OpenStreetMap (https://www.openstreetmap.org/)
    Humanitarian OpenStreetMap Team
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    OpenStreetMap contains roughly 754.8 thousand km of roads in this region. Based on AI-mapped estimates, this is approximately 73% of the total road length in the dataset region. The average age of the data for the region is 3 years (last edited 9 days ago), and 8% of roads were added or updated in the last 6 months. Read about what this summary means: indicators, metrics.

    This theme includes all OpenStreetMap features in this area matching the following filter (learn what the tags mean here):

    tags['highway'] IS NOT NULL

    Features may have these attributes:

    This dataset is one of many OpenStreetMap exports on HDX (https://data.humdata.org/organization/hot). See the Humanitarian OpenStreetMap Team website for more information.
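    A minimal sketch of summarizing the GeoJSON export with geopandas; the local file name is an assumption, and the 'highway' attribute column reflects the tag filter above.

      import geopandas as gpd

      roads = gpd.read_file("viet_nam_roads.geojson")  # assumed local file name

      # Re-project to a metre-based CRS before measuring lengths (EPSG:3857 is a rough choice).
      roads_m = roads.to_crs(epsg=3857)
      roads_m["length_km"] = roads_m.geometry.length / 1000

      # Total mapped length per highway class, assuming a 'highway' attribute column.
      print(roads_m.groupby("highway")["length_km"].sum().sort_values(ascending=False).head())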

  15. R code for data simulation from How best to quantify replication success? A...

    • rs.figshare.com
    txt
    Updated Jun 3, 2023
    + more versions
    Cite
    Jasmine Muradchanian; Rink Hoekstra; Henk Kiers; Don van Ravenzwaaij (2023). R code for data simulation from How best to quantify replication success? A simulation study on the comparison of replication success metrics [Dataset]. http://doi.org/10.6084/m9.figshare.14564615.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Royal Society (http://royalsociety.org/)
    Authors
    Jasmine Muradchanian; Rink Hoekstra; Henk Kiers; Don van Ravenzwaaij
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To overcome the frequently debated crisis of confidence, replicating studies is becoming increasingly more common. Multiple frequentist and Bayesian measures have been proposed to evaluate whether a replication is successful, but little is known about which method best captures replication success. This study is one of the first attempts to compare a number of quantitative measures of replication success with respect to their ability to draw the correct inference when the underlying truth is known, while taking publication bias into account. Our results show that Bayesian metrics seem to slightly outperform frequentist metrics across the board. Generally, meta-analytic approaches seem to slightly outperform metrics that evaluate single studies, except in the scenario of extreme publication bias, where this pattern reverses.

  16. Simulated performance data: MILC, LAMMPS, and uniform random traffic...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Apr 26, 2024
    + more versions
    Cite
    Brown, Kevin A. (2024). Simulated performance data: MILC, LAMMPS, and uniform random traffic patterns on 72-node dragonfly network [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_11075429
    Explore at:
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Argonne National Laboratory
    Authors
    Brown, Kevin A.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data generated from an old, private fork of the CODES simulation toolkit: https://github.com/codes-org/codes

    Data dictionary: https://github.com/kevinabrown/codes/wiki/Dragonfly-Dally-DEBUG-Metrics

  17. Geothermal Exploration Cost and Time Metric

    • catalog.data.gov
    • gdr.openei.org
    • +3more
    Updated Jan 20, 2025
    + more versions
    Cite
    National Renewable Energy Laboratory (2025). Geothermal Exploration Cost and Time Metric [Dataset]. https://catalog.data.gov/dataset/geothermal-exploration-cost-and-time-metric-0ca20
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    The National Renewable Energy Laboratory (NREL) was tasked with developing a metric in 2012 to measure the impacts of RD&D funding on the cost and time required for geothermal exploration activities. The development of this cost and time metric included collecting cost and time data for exploration techniques, creating a baseline suite of exploration techniques to which future exploration cost and time improvements can be compared, and developing an online tool for graphically showing potential project impacts (all available at http://en.openei.org/wiki/Gateway: Geothermal). This paper describes the methodology used to define the baseline exploration suite of techniques (baseline), as well as the approach that was used to create the cost and time data set that populates the baseline. The resulting product, an online tool for measuring impact, and the aggregated cost and time data are available on the Open Energy Information website (OpenEI, http://en.openei.org) for public access.

  18. Running Performance Metric Summary Data from Sprinting with prosthetic...

    • rs.figshare.com
    xlsx
    Updated Jun 5, 2023
    Cite
    Owen N. Beck; Paolo TABOGA; Alena M. Grabowski (2023). Running Performance Metric Summary Data from Sprinting with prosthetic versus biological legs: insight from experimental data [Dataset]. http://doi.org/10.6084/m9.figshare.17427586.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Royal Society (http://royalsociety.org/)
    Authors
    Owen N. Beck; Paolo TABOGA; Alena M. Grabowski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzed 400 m running performance metric data from the fastest athlete with bilateral leg amputations

  19. Drive_Stats

    • huggingface.co
    Cite
    Backblaze, Drive_Stats [Dataset]. https://huggingface.co/datasets/backblaze/Drive_Stats
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Backblaze (http://www.backblaze.com/)
    Authors
    Backblaze
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    Drive Stats

    Drive Stats is a public data set of daily metrics on the hard drives in Backblaze’s cloud storage infrastructure that Backblaze has open-sourced since April 2013. Currently, Drive Stats comprises over 388 million records, rising by over 240,000 records per day. Drive Stats is an append-only dataset effectively logging daily statistics that once written are never updated or deleted. This is our first Hugging Face dataset; feel free to suggest improvements by creating a… See the full description on the dataset page: https://huggingface.co/datasets/backblaze/Drive_Stats.
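    A minimal sketch of reading a few records with the Hugging Face datasets library; streaming avoids downloading the full multi-hundred-million-row dataset, and the split name is an assumption.

      from datasets import load_dataset

      ds = load_dataset("backblaze/Drive_Stats", split="train", streaming=True)  # split name assumed

      for i, record in enumerate(ds):
          print(record)  # one day's metrics for one drive
          if i == 4:
              break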

  20. Broadband Summary API - Nation

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Mar 11, 2021
    + more versions
    Cite
    National Telecommunication and Information Administration, Department of Commerce (2021). Broadband Summary API - Nation [Dataset]. https://catalog.data.gov/dataset/broadband-summary-api-nation
    Explore at:
    Dataset updated
    Mar 11, 2021
    Description

    This API returns broadband summary data for the entire United States. It is designed to retrieve broadband summary data and census metrics (population or households) combined as search criteria. The data includes wireline and wireless providers, different technologies, and broadband speeds reported for the particular area being searched, on a scale of 0 to 1.
