Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path

import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
Columns (towns.csv):
Basic info:
city - city name (several cities have alternative names marked in alt_city_names.json)
population - city population, thousand people, Rosstat estimate as of 1.1.2020
lat,lon - city geographic coordinates
Region:
region_name - subnational region (oblast, republic, krai or AO)
region_iso_code - ISO 3166 code, eg RU-VLD
federal_district, eg Центральный
City codes:
okato
oktmo
fias_id
kladr_id
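As a rough illustration of how these columns might be handled, the sketch below parses a small inline sample with the standard csv module. The sample row is made up (placeholder values, not a real record); only the column names come from the list above. It converts population from thousand people to people and keeps the code columns as strings, on the assumption that such codes may start with zeros.

```python
import csv
import io

# Hypothetical sample: column names follow towns.csv, values are placeholders.
SAMPLE = """city,population,lat,lon,region_name,okato,oktmo
Example City,12.5,55.0,37.0,Example Region,01234567890,01234567
"""


def load_towns(text):
    """Parse CSV text into dicts with typed numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # population is given in thousand people; convert to people
        row["population_persons"] = round(float(row["population"]) * 1000)
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        # okato and oktmo are left as strings so leading zeros survive
        rows.append(row)
    return rows


towns = load_towns(SAMPLE)
print(towns[0]["city"], towns[0]["population_persons"], towns[0]["okato"])
```

With pandas, passing dtype=str for the code columns to read_csv should have a similar effect.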
Data sources
Comments
City groups
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
Белоозерский was not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
We suppressed the letter "ё" in the city column of towns.csv: we have Орел, but not Орёл. This affected:
Белоозёрский
Королёв
Ликино-Дулёво
Озёры
Щёлково
Орёл
Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
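When matching external data against the city column, one option is to apply the same letter substitution to the incoming names first. The helper below is a minimal sketch of that idea; strip_yo is a hypothetical name and is not part of the repository.

```python
def strip_yo(name: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е", mirroring the spelling used in towns.csv."""
    return name.replace("ё", "е").replace("Ё", "Е")


for original in ["Орёл", "Королёв", "Щёлково"]:
    print(original, "->", strip_yo(original))
```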
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
convert Саратовская область.doc to docx
Creates:
_towns.csv
assets/regions.csv
2. API calls
Note: do not attempt this if you do not have to - it runs for a while and loads third-party API access.
You have the resulting files in the repo, so you probably do not need to run these scripts.
Run:
cd geocoding
Creates:
3. Merge data
Run:
Creates:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The database consists of full-text patient reviews, reflecting their dissatisfaction with healthcare quality. Materials in Russian have been posted in the «Review list» of the site infodoctor.ru. Publication period: July 2012 to August 2023. The database consists of 18,492 reviews covering 16 Russian cities with population of over one million. Data format: .xlsx.
Data access: 10.5281/zenodo.15257447
Data collection methodology
Based on the premise that negative reviews may be more reliable than positive ones, the authors collected negative reviews from 16 Russian cities with a population of over one million, for which it was possible to collect representative samples (at least 1000 reviews for each city). We extracted reviews from the one-star section of this site's guestbook, as they are reliably identified as negative. Duplicates were removed from the database. Personal data in comment texts have been replaced with "##########". The author's gender was determined manually from the author's name or from gendered word endings in the review text; where this was not possible, we recorded "0" (gender cannot be determined).
For Moscow reviews, classification was carried out using manual markup methods - based on the majority of votes for the review class from 3 annotators (if at least one annotator indicated that it was impossible to determine, the review was classified as #N/A - impossible to clearly determine). For reviews from other cities, classification was made into 3 classes using machine learning methods based on logistic regression. The classification accuracy was 88%.
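The manual labelling rule for the Moscow reviews can be sketched as follows. This is an illustration only, not the authors' code: the function name is hypothetical, and the handling of three different votes (no majority) is an assumption, since the description does not cover that case.

```python
from collections import Counter

NA = "#N/A"


def label_review(votes):
    """Combine three annotator votes into one class label."""
    # Rule from the description: any "cannot determine" vote gives #N/A.
    if NA in votes:
        return NA
    label, count = Counter(votes).most_common(1)[0]
    # Assumption: with no majority (three different votes) we also return #N/A.
    return label if count >= 2 else NA


print(label_review(["M", "M", "O"]))
print(label_review(["M", "O", NA]))
print(label_review(["M", "O", "C"]))
```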
The medical specialties were distributed into large groups for the convenience of further analysis. The correspondence of medical specialties to large groups is presented in detail in Appendix 1.
· CITY – the name of a city with a population of over a million (on a separate sheet – Moscow), the other 15 are Volgograd, Voronezh, Yekaterinburg, Kazan, Krasnodar, Krasnoyarsk, Nizhny Novgorod, Novosibirsk, Omsk, Perm, Rostov-on-Don, Samara, St. Petersburg, Ufa, Chelyabinsk
· TEXT – review text
· GENDER – gender of the review author (2 – female, 1 – male, 0 – cannot be determined)
· CLASS_1 – group of reasons for dissatisfaction with medical care (M – issues of medical content, O – issues of organizational support and economic aspect, C – mixed (combined) class, #N/A – cannot be clearly determined)[1]
· CLASS_2 – group of reasons for dissatisfaction with medical care (0 – issues of medical content, 1 – issues of organizational support and economic aspect, 2 – mixed (combined) class, #N/A – cannot be clearly determined)
· DAY – day of the month the review was posted
· MONTH – month the review was posted
· YEAR – year the review was posted
· DOCTOR_OR_CLINIC – what or who is the review dedicated to – the doctor or the clinic
· SPEC – physician specialty (for observations where the review is dedicated to the physician)
· GROUP_SPEC – a large group of a physician’s specialty
· ID – observation identifier
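As a small sketch of how these variables might be combined, the code below builds a calendar date from DAY, MONTH and YEAR and maps the letter codes of CLASS_1 to the numeric codes of CLASS_2. The record is hypothetical, and representing "#N/A" as None is an assumption made for the example.

```python
from datetime import date

# Letter-to-number correspondence as listed for CLASS_1 and CLASS_2
CLASS_1_TO_2 = {"M": 0, "O": 1, "C": 2}


def review_date(row):
    """Assemble the posting date from the three separate columns."""
    return date(int(row["YEAR"]), int(row["MONTH"]), int(row["DAY"]))


def class_2(row):
    """Numeric class code; None stands in for "#N/A" (an assumption)."""
    return CLASS_1_TO_2.get(row["CLASS_1"])


row = {"DAY": 5, "MONTH": 7, "YEAR": 2012, "CLASS_1": "O"}  # hypothetical record
print(review_date(row).isoformat(), class_2(row))
```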
The data are suitable for analyzing patient dissatisfaction trends with medical services in Russia over the period from July 2012 to August 2023. This dataset could be particularly useful for healthcare providers, policymakers, and researchers interested in understanding patient experiences and identifying areas for quality improvement in Russian healthcare. Some potential applications include:
The database provides rich qualitative data through full-text review texts, allowing for in-depth analysis of patient experiences. The structured variables like city, date, doctor/clinic information, etc. enable quantitative analysis as well. This combination of qualitative and quantitative data makes it possible to gain a comprehensive understanding of patient dissatisfaction patterns in Russia's healthcare system over more than a decade.
For researchers specifically interested in healthcare quality issues, this dataset could serve as an important resource for studying patient experiences and outcomes in Russia's medical system. The longitudinal nature of the data (2012-2023) also allows for analysis of changes over time in patient satisfaction.
Overall, this database provides valuable insights into patient perceptions of healthcare quality that could inform policy decisions, quality improvement
[1] The variable for the group of reasons for dissatisfaction with medical care is provided in two versions, with letter codes (CLASS_1) and with numeric codes (CLASS_2), for convenience when working with the data.
Together with the Russian Academy of Sciences, IIASA's Forestry (FOR) project has released a CD-ROM titled Land Resources of Russia, Version 1.1, containing socioeconomic and biophysical data sets on important targets of international conventions — climate change, wetlands, desertification, and biodiversity. The CD-ROM, a country-scale integrated information system, supports sustainable use of land resources in line with Chapter 10 of Agenda 21 (UNCED) and makes a contribution to the Rio+10 Summit.
The Project's analyses of land resources are crucial for doing full greenhouse gas (or carbon) accounting. Integrated land analyses are also important for the introduction of sustainable forest management. FOR's land analyses concentrate on Russia, which is used as a case study for full carbon and greenhouse gas accounting.
Russia's area of forests, called here the forest zone, covers about 1180 million ha or 69% of the land of the country. The forested area (forests forming closed stands) occupies some 765 million ha, constituting 65% of the forest zone. Forests are elements of a land-cover mosaic that direct the features of landscapes, ecosystems, vegetation and land uses. The FOR project attempts to overcome the traditional approach of just considering the direct utilities of forests. Instead, FOR operates with a holistic view of forests in a fully-fledged land concept. Integrated analysis of the land requires extended databases that include various data for the total land, operated in the form of GIS-based tools.
The land databases on Russia are the most comprehensive ever assembled, inside or outside of Russia. The databases have been enriched by remotely sensed data, biogeochemical functionality (carbon analysis), and institutional frameworks. The data included on the CD-ROM have been specially selected and filtered to meet the following criteria: (1) completeness: to meet a variety of the analysis tasks; (2) complexity: to describe a diversity of the task aspects; (3) consistency: to provide compatible results, to be at a compatible scale, and to provide a compatible time horizon; and (4) uniformity: to allow them to be standardized and formatted according to modern data handling routines.
The following databases and coverages are included on the CD-ROM and are available for download:
Socioeconomic Database -- Describes the social environment of each administrative region in Russia with close to 7000 parameters. The data cover the years 1987-1993. Coverages in this section include:
(1) Socioeconomic Statistical Database. This database provides the following statistical data sets: Population; Labor and Salary; Industry; Agriculture; Capital Construction; Communication and Transport; State Trade and Catering; Utilities and Services; Health Care and Sport; Education and Culture; Finance; Public Consumption; Industrial Production; Interregional Trade; Labor Resources; Supply of Materials; Environmental Protection; Foreign Trade; and Price Indices.
(2) Population Database. Adapted from Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); and World Resources Institute (WRI). 2000. Gridded Population of the World (GPW), Version 2, this coverage contains population densities for 1995 on a 2.5 degree grid. Data were adjusted to match United Nations national population estimates for 1995.
(3) Administrative Oblasts, Cities & Towns Database. The oblasts coverage contains 92 polygons, 88 of which contain oblast names; the other four represent water bodies. The cities coverage contains 37 cities identified by name.
(4) Transportation Database. The statistical data sets and maps cover the transport routes of the railway, road, and river networks spanning the entire country. Railways and roads are classified by type and status, and major rivers are named. Map coverages (line data) were created from the Digital Chart of the World, using the 1993 version at the 1:1,000,000 scale.
Natural Conditions Database. This section of the CD-ROM contains the basic land characteristics. This database provides specialists and scientists in research institutes and international agencies with the capability to perform scientific analysis with a Geographic Information System. These data describe land characteristics that might be applied in various ways, such as individual items (e.g., temperature, elevation, vegetation community, etc.), in combination (e.g., forest-temperature associations, soil spectra for land use types, etc.), and as aggregations based on a conceptual framework of a different level of complexity (e.g., ecosystem establishment, human-induced land cover transformation, biochemical cycle analysis, etc.). Coverage includes:
(1) Climate Database. Temperature (annual and seasonal) and Precipitation... Visit https://dataone.org/datasets/Land_Resources_of_Russia%2C_Version_1.1.xml for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database provides a construction of Large Urban Regions (LUR) in Russia. A Large Urban Region (LUR) can be defined as an aggregation of continuous statistical units around a core that are economically dependent on this core and linked to it by strong economic and social interdependences. The main purpose of this delineation is to make cities comparable on the national and world scales and to enable comparative socio-economic urban studies. By aggregating different municipal districts around a core city, we construct a single large urban region, which allows us to include the whole area of economic influence of a core in one statistical unit (see Rogov & Rozenblat, 2019 for more details). In doing so we use four principal urban concepts (Pumain et al., 1992): political definition, morphological definition, functional definition and conurbation, which we call Large Urban Region. We implemented LURs using criteria such as population distribution, road networks, access to an airport, distance from a core, and presence of multinational firms. In this database we provide population data for LURs and their administrative units.
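Since the database reports population both for LURs and for their administrative units, the relation between the two levels can be sketched as a simple sum over units. The names and figures below are hypothetical placeholders, not values from the database, and the actual delineation criteria are not modelled here.

```python
from collections import defaultdict

# Hypothetical records: (LUR name, administrative unit, population)
UNITS = [
    ("LUR A", "core city", 1000),
    ("LUR A", "district 1", 200),
    ("LUR B", "core city", 500),
]


def lur_population(units):
    """Sum the population of administrative units within each LUR."""
    totals = defaultdict(int)
    for lur, _unit, population in units:
        totals[lur] += population
    return dict(totals)


print(lur_population(UNITS))
```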
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database provides a construction of Large Urban Regions (LUR) in Russia. A Large Urban Region (LUR) can be defined as an aggregation of continuous statistical units around a core that are economically dependent on this core and linked to it by strong economic and social interdependences. The main purpose of this delineation is to make cities comparable on the national and world scales and to enable comparative socio-economic urban studies. By aggregating different municipal districts around a core city, we construct a single large urban region, which allows us to include the whole area of economic influence of a core in one statistical unit (see Rogov & Rozenblat, 2020 for more details), thus changing a city's position in the global urban hierarchy. In doing so we use four principal urban concepts (Pumain et al., 1992): political definition, morphological definition, functional definition and conurbation, which we call Large Urban Region. We constructed Russian LURs using criteria such as population distribution, road networks, access to an airport, distance from a core, and presence of multinational firms. In this database, we provide population data for LURs and their administrative units.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes entrepreneurship development indicators:
- Number of enterprises and organizations (at the end of the year) (2020), units and thousand units;
- Turnover of organizations (2020), billion rubles;
- Balanced financial result (profit minus loss) of organizations' activities (2020), million rubles and billion rubles;
- Share of organizations using special software (2020), %.
The dataset also contains indicators such as Share of organizations implementing technological innovations (2020), %; Gross regional product (GRP) (2019), million rubles. The values of indicators are given for 96 main territories of the Russian Federation allocated by Rosstat, including regions, federal districts and largest cities (federal centers). The data are given by regions for 2020, as well as for Russia as a whole for 2010, 2015 and 2020.
Source of the data: Regions of Russia. Socio-economic indicators - 2021 [Electronic resource]. - Rosstat. - Access mode: https://rosstat.gov.ru/folder/210/document/13204
The dataset is available in Russian and English (in separate files).
https://creativecommons.org/publicdomain/zero/1.0/
To build an assistant for a content-advertising system, an automated generator of ad content was needed, so a large number of ads were collected from a popular search engine (Russian ad campaigns only). If anyone is interested, I can upload the full 40+ GB of data.
The database was collected from open public sources and contains ads from regions of Russia, Ukraine, Belarus, Kazakhstan and the major cities of these countries.
Unique items: 800 000 (part1) Total size about 15MM
The database was collected in October 2016 - January 2017. No one was harmed when collecting the database (the program does not click on the ads).
Try to find patterns in the ads and develop an automatic text generator for ad systems.