wotfa_ea_alt_04_pd_plots_poly: This dataset depicts the public domain (PD) lands identified for conversion to Oregon and California (O&C) lands to replace those O&C lands conveyed to Tribes by the Western Oregon Tribal Fairness Act (WOTFA) under Alternative 4.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Myanmar Wikipedia Dataset (Last Crawl Date: 25/03/2025)
A collection of scraped Myanmar Wikipedia pages organized by category paths.
Overview
This dataset contains Myanmar Wikipedia articles scraped based on categorical organization. Unlike the official Wikimedia dataset (subset: 20231101.my), this repository provides an alternative approach to Myanmar Wikipedia content by following the categorical structure starting from the main entry page.
See the full description on the dataset page: https://huggingface.co/datasets/chuuhtetnaing/myanmar-wikipedia-dataset.
This data package includes the underlying data and files to replicate the calculations, charts, and tables presented in Alternatives to Currency Manipulation: What Switzerland, Singapore, and Hong Kong Can Do, PIIE Policy Brief 14-17.
If you use the data, please cite as: Gagnon, Joseph E. (2014). Alternatives to Currency Manipulation: What Switzerland, Singapore, and Hong Kong Can Do. PIIE Policy Brief 14-17. Peterson Institute for International Economics.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Introduction
Wikipedia is written in the wikitext markup language. When serving content, the MediaWiki software that powers Wikipedia parses wikitext to HTML, thereby inserting additional content by expanding macros (templates and modules). Hence, researchers who intend to analyze Wikipedia as seen by its readers should work with HTML, rather than wikitext. Since Wikipedia’s revision history is made publicly available by the Wikimedia Foundation exclusively in wikitext format, researchers have had to produce HTML themselves, typically by using Wikipedia’s REST API for ad-hoc wikitext-to-HTML parsing. This approach, however, (1) does not scale to very large amounts of data and (2) does not correctly expand macros in historical article revisions.
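For illustration, the ad-hoc approach amounts to posting wikitext to the public REST API and reading back HTML. The following is a minimal sketch of that pattern (the snippet and User-Agent string are illustrative, not part of WikiHist.html):

import requests

# Ad-hoc wikitext-to-HTML parsing against the live Wikipedia REST API.
# Templates are expanded against the wiki's *current* state, which is why
# this approach mis-renders historical revisions.
resp = requests.post(
    "https://en.wikipedia.org/api/rest_v1/transform/wikitext/to/html",
    json={"wikitext": "'''Hello''' [[World]]"},
    headers={"User-Agent": "wikitext-parse-example/0.1"},
)
print(resp.text)  # parsed HTML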
We have solved these problems by developing a parallelized architecture for parsing massive amounts of wikitext using local instances of MediaWiki, enhanced with the capacity of correct historical macro expansion. By deploying our system, we produce and hereby release WikiHist.html, English Wikipedia’s full revision history in HTML format. It comprises the HTML content of 580M revisions of 5.8M articles generated from the full English Wikipedia history spanning 18 years from 1 January 2001 to 1 March 2019. Boilerplate content such as page headers, footers, and navigation sidebars is not included in the HTML.
For more details, please refer to the description below and to the dataset paper:
Blagoj Mitrevski, Tiziano Piccardi, and Robert West: WikiHist.html: English Wikipedia’s Full Revision History in HTML Format. In Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020.
https://arxiv.org/abs/2001.10256
When using the dataset, please cite the above paper.
Dataset summary
The dataset consists of three parts:
1. the HTML revision history itself,
2. page creation times (page_creation_times.json.gz),
3. redirect history (redirect_history.json.gz).
Part 1 is our main contribution, while parts 2 and 3 contain complementary information that can aid researchers in their analyses.
Getting the data
Parts 2 and 3 are hosted in this Zenodo repository. Part 1 is 7 TB in size -- too large for Zenodo -- and is therefore hosted externally on the Internet Archive. For downloading part 1, you have multiple options:
Dataset details
Part 1: HTML revision history
The data is split into 558 directories, named enwiki-20190301-pages-meta-history$1.xml-p$2p$3, where $1 ranges from 1 to 27, and p$2p$3 indicates that the directory contains revisions for pages with ids between $2 and $3. (This naming scheme directly mirrors that of the wikitext revision history from which WikiHist.html was derived.) Each directory contains a collection of gzip-compressed JSON files, each containing 1,000 HTML article revisions. Each row in the gzipped JSON files represents one article revision. Rows are sorted by page id, and revisions of the same page are sorted by revision id. We include all revision information from the original wikitext dump, the only difference being that we replace the revision’s wikitext content with its parsed HTML version (and that we store the data in JSON rather than XML):
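A minimal reading sketch for iterating over one such file, assuming one JSON object per line (matching the row-based description above) and an illustrative file name:

import gzip
import json

# Iterate over one gzip-compressed JSON file of HTML revisions.
# One JSON object per line and the file name below are illustrative
# assumptions; consult the dataset paper for the exact schema.
path = "enwiki-20190301-pages-meta-history1.xml-p1p100/part-000.json.gz"
with gzip.open(path, "rt", encoding="utf-8") as f:
    for line in f:
        rev = json.loads(line)
        print(sorted(rev.keys()))  # inspect the available fields
        break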
Part 2: Page creation times (page_creation_times.json.gz)
This JSON file specifies the creation time of each English Wikipedia page. It can, e.g., be used to determine if a wiki link was blue or red at a specific time in the past. Format:
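A hedged sketch of the blue/red-link check described above, assuming the file decodes to a mapping from page title to creation timestamp (this layout is an assumption, not the documented format):

import gzip
import json

with gzip.open("page_creation_times.json.gz", "rt", encoding="utf-8") as f:
    creation_times = json.load(f)  # assumed: {page_title: creation_timestamp}

def link_was_blue(title: str, t: str) -> bool:
    # ISO 8601 timestamps in a uniform format compare correctly as strings
    created = creation_times.get(title)
    return created is not None and created <= t

print(link_was_blue("Example", "2015-06-01T00:00:00Z"))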
Part 3: Redirect history (redirect_history.json.gz)
This JSON file specifies all revisions corresponding to redirects, as well as the target page to which the respective page redirected at the time of the revision. This information is useful for reconstructing Wikipedia's link network at any time in the past. Format:
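A hedged sketch for looking up a page's redirect target at a given time, assuming records with page, timestamp, and target fields (these field names are assumptions, not the documented format):

import gzip
import json

with gzip.open("redirect_history.json.gz", "rt", encoding="utf-8") as f:
    redirects = json.load(f)  # assumed: [{"page": ..., "timestamp": ..., "target": ...}, ...]

def redirect_target_at(title: str, t: str):
    # latest redirect revision of `title` at or before time `t`, if any
    revs = [r for r in redirects if r["page"] == title and r["timestamp"] <= t]
    return max(revs, key=lambda r: r["timestamp"])["target"] if revs else None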
The repository also contains two additional files, metadata.zip and mysql_database.zip. These two files are not part of WikiHist.html per se, and most users will not need to download them manually. The file metadata.zip is required by the download script (and will be fetched by the script automatically), and mysql_database.zip is required by the code used to produce WikiHist.html. The code that uses these files is hosted at GitHub, but the files are too big for GitHub and are therefore hosted here.
WikiHist.html was produced by parsing the 1 March 2019 English Wikipedia dump (https://dumps.wikimedia.org/enwiki/20190301) from wikitext to HTML. That old dump is no longer available on Wikimedia's servers, so we make a copy available at https://archive.org/details/enwiki-20190301-original-full-history-dump_dlab.
SJIRMP_ERMA_Veg_Intersect_ALTB_poly: Vegetation in the Expanded Resource Management Areas in the San Juan Islands National Monument RMP area.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This fileset contains supplementary material for the paper "Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability". We share here two files related to the qualitative analysis of the reasons why editors add citations to Wikipedia.

Citation Needed Policy Summary (Citation_Needed_Policy_Summary.pdf): We did a qualitative analysis of the various policies that editors of English, Italian, and French Wikipedia follow when adding (or not adding) inline citations, categorized them into macro-classes, and summarized them in this document.

Citation Needed Reason Clusters (Citation_Needed_Reason_Clusters.pdf): When adding the {{citation needed}} template, editors also have the option to specify a reason via a free-form text field. We extracted the text of this field from more than 200,000 citation needed tags added by English Wikipedia editors, converted it into numerical features using fastText [1], and then clustered them. Each cluster contains groups of consistent reasons why editors requested a citation.

[1] https://fasttext.cc/
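A minimal sketch of this pipeline, embedding the free-form reasons with fastText and clustering the vectors (file names, the number of clusters, and other parameters are illustrative, not those used in the paper):

import fasttext
from sklearn.cluster import KMeans

# Train unsupervised fastText vectors on the reasons (one per line),
# embed each reason, then cluster. All values here are illustrative.
model = fasttext.train_unsupervised("reasons.txt", model="skipgram", dim=100)
with open("reasons.txt", encoding="utf-8") as f:
    reasons = [line.strip() for line in f if line.strip()]
vectors = [model.get_sentence_vector(r) for r in reasons]
labels = KMeans(n_clusters=20, random_state=0).fit_predict(vectors)
for reason, label in zip(reasons[:5], labels[:5]):
    print(label, reason)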
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path

import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
Columns (towns.csv):
Basic info:
- city - city name (several cities have alternative names marked in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates

Region:
- region_name - subnational region (oblast, republic, krai or AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district - federal district, e.g. Центральный
City codes:
- okato
- oktmo
- fias_id
- kladr_id
Data sources
Comments
City groups
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
Белоозерский is not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
We suppressed the letter "ё" in the city column of towns.csv - we have Орел, but not Орёл. This affected:
Белоозёрский
Королёв
Ликино-Дулёво
Озёры
Щёлково
Орёл
Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
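A hedged sketch for attaching these alternative names to the towns table, assuming alt_city_names.json maps a canonical name to its variants (the exact JSON layout is an assumption):

import json
import pandas as pd

with open("assets/alt_city_names.json", encoding="utf-8") as f:
    alt_names = json.load(f)  # assumed: {canonical_name: [variants, ...]}

df = pd.read_csv("towns.csv")
df["alt_names"] = df["city"].map(lambda c: ", ".join(alt_names.get(c, [])))
print(df[df["alt_names"] != ""].head())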
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
Converts Саратовская область.doc to docx. Creates:
_towns.csv
assets/regions.csv
2. API calls
Note: do not run these scripts unless you have to - they take a while and put load on a third-party API.
The resulting files are already in the repo, so you probably do not need to run them.
Run:
cd geocoding
Creates:
3. Merge data
Run:
Creates:
Polygons depict Areas of Livestock Grazing allocations for each alternative in the Greater Sage-Grouse Final Environmental Impact Statement/Proposed RMP Amendment in Montana/Dakotas. Details for each Alternative can be found in Chapter 2 of the Final Greater Sage-Grouse EIS/RMP Amendment.
Polygons depict Areas of Geothermal Mineral allocations for each alternative in the Greater Sage-Grouse Final Environmental Impact Statement/Proposed RMP Amendment in Montana/Dakotas. Details for each Alternative can be found in Chapter 2 of the Final Greater Sage-Grouse EIS/RMP Amendment.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The complement system is a biochemical cascade that helps, or complements, the ability of antibodies to clear pathogens from an organism. It is part of the innate immune system, which is not adaptable and does not change over the course of an individual's lifetime. However, it can be recruited and brought into action by the adaptive immune system. The classical pathway of activation of the complement system is a group of blood proteins that mediate the specific antibody response. [source: Wikipedia] The classical pathway begins with circulating C1Q binding to an antigen on the surface of a pathogen, which goes on to activate and recruit two copies each of C1R and C1S, forming a C1 complex. The activated C1 complex cleaves C2 and C4. Activated cleavage products C2A and C4B combine to form C3 convertase, which cleaves C3. The cleavage product C3B joins the complex to form C5 convertase, which cleaves C5. The cleavage product C5B joins C6, C7, C8 and multiple copies of C9 to form the membrane attack complex, which forms a channel for water to flood into the target cell, leading to osmotic lysis. The decay accelerating factor (DAF) inhibits C3 convertase. The lectin pathway involves mannose-binding lectin (MBL) binding the surface of the pathogen instead of C1Q. MBL-associated serine proteases MASP1 and MASP2 can cleave C2 and C4 in place of the C1 complex, leading to the formation of C3 convertase and the subsequent cascade. The alternative pathway relies on the spontaneous hydrolysis of C3 and the cleavage of factor B (CFB) by factor D (CFD), which form an alternative C3 convertase stabilized by factor P (CFP). Additional copies of the cleavage product C3B are recruited to the complex, resulting in an alternative C5 convertase, which cleaves C5 and contributes C5B to the formation of the membrane attack complex. Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Currently, the paradigm of scientific publication involves primarily the model of peer-reviewed publications. Dominated by a few companies and institutions, this model is based on scientific journals that publish, in print, on the Internet, or both, works that go through a laborious selection for quality. Initially, an editorial board evaluates the overall quality of submissions, their suitability to the journal's editorial line, their apparent scientific soundness, their general design, and their interest to the audience of the journal. After this highly subjective filter, the unpublished manuscripts are sent to technical reviewers, usually people with deep knowledge of the area of the submission. At least two of these reviewers are commissioned by the editors. If both accept the task and approve the manuscript, it is accepted for publication.
Open Government Licence http://reference.data.gov.uk/id/open-government-licence
Provides monthly and annual historical records on all HMRC Tax & NIC receipts, and Tax Credit payments. Previously listed under 'Revenue-based Taxes and Benefits: Personal Tax Credits'.
Source agency: HM Revenue and Customs
Designation: National Statistics
Language: English
Alternative title: Tax Credits: Net payments
SJIRMP_BLM_Dispersed_Camping_NoAction_poly: No Action alternative Dispersed Camping areas to be used in analysis for the San Juan Islands Resource Management Plan.
This dataset approximates areas available for utility-grade solar energy development under the Solar Energy Zone Alternative of the Solar Energy Development Programmatic Environmental Impact Statement (PEIS) as modified and adjusted by the San Luis Resource Area resource specialists on 08/10/2011. Refer to the Solar Energy Development PEIS for more details. Developed for the Department of Interior, Bureau of Land Management by Argonne National Laboratory.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Bulletins provide a full historic series of data detailing amounts of goods cleared and amount of duty collected.
Source agency: HM Revenue and Customs
Designation: National Statistics
Language: English
Alternative title: Indirect Tax
This is a graphic representation of the data stewards based on PLSS Townships in PLSS areas. In non-PLSS areas the metadata at a glance is based on a data steward defined polygons such as a city or county or other units. The identification of the data steward is a general indication of the agency that will be responsible for updates and providing the authoritative data sources. In other implementations this may have been termed the alternate source, meaning alternate to the BLM. But in the shared environment of the NSDI the data steward for an area is the primary coordinator or agency responsible for making updates or causing updates to be made. The data stewardship polygons are defined and provided by the data steward.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
G proteins, short for guanine nucleotide-binding proteins, are a family of proteins involved in second messenger cascades. G proteins are so called because they function as "molecular switches": they alternate between an 'inactive' state, bound to guanosine diphosphate (GDP), and an 'active' state, bound to guanosine triphosphate (GTP), in which they regulate downstream cell processes. Source: Wikipedia
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provides general information on all HMRC taxes, including tax receipts, the number of taxpayers, personal tax credits, child benefit and estimates of the cost of tax expenditures and structural relief.
Source agency: HM Revenue and Customs
Designation: National Statistics
Language: English
Alternative title: Revenue Based Taxes
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Alternative Songs (formerly known as Modern Rock Tracks or Hot Modern Rock Tracks, and also known simply as Alternative) is a Billboard…
https://www.coolest-gadgets.com/privacy-policy
CBD Statistics: CBD has become a global product, with its popularity growing quickly. What started as a small alternative to traditional medicine is now a mainstream trend. Today, CBD isn’t just found in oils, capsules, and tinctures. You can find CBD in many different products all over the world, such as CBD makeup, bath bombs, toothpaste, sheets, and even dog treats.
There are many different opinions about whether CBD is a miracle medicine or just another health fad. To help answer this, CBD statistics can give a better idea of what is true. We have conducted a survey and compiled a study to help you understand the possible health benefits of CBD.