Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
from pathlib import Path

import requests
import pandas as pd

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally
p = Path("towns.csv")
if not p.exists():
    content = requests.get(url).text
    p.write_text(content, encoding="utf-8")

# read as dataframe
df = pd.read_csv("towns.csv")
print(df.sample(5))
Files:
Columns (towns.csv):
Basic info:
city - city name (several cities have alternative names marked in alt_city_names.json)
population - city population, thousand people, Rosstat estimate as of 1.1.2020
lat,lon - city geographic coordinates
Region:
region_name - subnational region (oblast, republic, krai or AO)
region_iso_code - ISO 3166 code, eg RU-VLD
federal_district, eg Центральный
City codes:
okato
oktmo
fias_id
kladr_id
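A minimal sketch of working with the columns listed above. The rows below are made up and only mimic the documented layout; they are not taken from towns.csv. It assumes that code columns such as okato and oktmo may start with a zero, which is why they are read as strings, and it relies on population being stored in thousands of people.

```python
import io

import pandas as pd

# Made-up rows that only mimic the documented column layout of towns.csv.
sample_csv = io.StringIO(
    "city,population,region_name,okato,oktmo\n"
    "Town A,120.5,Region X,01401000000,01701000001\n"
    "Town B,45.2,Region X,01402000000,01702000001\n"
    "Town C,310.0,Region Y,03401000000,03701000001\n"
)

# Read code columns as strings so that leading zeros are not lost.
code_columns = ["okato", "oktmo"]
df = pd.read_csv(sample_csv, dtype={col: str for col in code_columns})

# population is in thousands of people; convert to persons.
df["population_persons"] = (df["population"] * 1000).round().astype(int)

# Pick the largest city in each region.
largest = df.loc[df.groupby("region_name")["population"].idxmax()]
print(largest[["region_name", "city", "population_persons"]])
```

The same dtype mapping can be passed to pd.read_csv("towns.csv") for the real file, extended with fias_id and kladr_id if those should also stay as text.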
Data sources
Comments
City groups
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality of Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
Белоозерский was not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
We suppressed the letter "ё" in the city column of towns.csv: we have Орел, but not Орёл. This affected:
Белоозёрский
Королёв
Ликино-Дулёво
Озёры
Щёлково
Орёл
Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
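Because the city column in towns.csv has no "ё", a lookup by the usual spelling (for example Орёл) will not match. A minimal sketch of normalizing a name before the lookup follows; the helper name strip_yo is hypothetical and is not part of the repository.

```python
def strip_yo(name: str) -> str:
    """Replace the letter "ё" with "е" (both cases), as done in towns.csv."""
    return name.replace("ё", "е").replace("Ё", "Е")


names = ["Орёл", "Королёв", "Щёлково", "Москва"]
print([strip_yo(n) for n in names])
```

Names without "ё" pass through unchanged, so the helper can be applied to every query.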
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
Саратовская область.doc to docx
Creates:
_towns.csv
assets/regions.csv
2. API calls
Note: do not attempt this unless you have to. It runs for a while and puts load on third-party API access.
The resulting files are already in the repo, so you probably do not need to run these scripts.
Run:
cd geocoding
Creates:
3. Merge data
Run:
Creates:
https://creativecommons.org/publicdomain/zero/1.0/
Description of Data
Period: 01/03–30/03
Object: City or Region (842 cities, some data N/A; 84 regions)
Target: inf_rate - infection rate, calculated as Log(infected_at_30.03 + 1) - Log(infected_at_15.03 + 1)
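The target definition can be sketched as below. The description does not state the base of the logarithm, so the natural logarithm is an assumption here; the argument names follow the infected_1503 and infected_3003 columns of the dataset.

```python
import math


def inf_rate(infected_1503: int, infected_3003: int) -> float:
    """Log-transformed increment in infections between 15.03 and 30.03.

    Assumes the natural logarithm; the +1 keeps the value defined
    when a city has zero recorded cases.
    """
    return math.log(infected_3003 + 1) - math.log(infected_1503 + 1)


# 9 cases on 15.03 and 99 cases on 30.03: log(100) - log(10) = log(10)
print(inf_rate(9, 99))
```

A city with no cases on either date gets inf_rate equal to 0.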
Explanatory Variables
- population: number of citizens
- density: population density (http://worldgeo.ru/russia/lists/?id=26)
- lat: latitude
- lng: longitude
- cleanness, public_services, neighbourhood, children_places, sport_and_outdoor, shops_and_malls, public_transport, security, life_costs: responses to a quality-of-life survey (https://www.domofond.ru/statya/polnyy_reyting_250_gorodov_rossii_po_kachestvu_zhizni/6764)
- ivl_per_100k, ivl_number, ekmo_per_100k, ekmo_number: number of ventilators, absolute and per 100k population; number of ECMO machines, absolute and per 100k population
- infected_3003, died_3003, recovered_3003, sick_3003, infected_1503, died_1503, recovered_1503, sick_1503: measures used to calculate the infection rate (inf_rate). We assume an incubation period of two weeks and calculate the log-transformed increment as a proxy for the reproduction rate (https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D1%81%D0%BF%D1%80%D0%BE%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B5%D0%BD%D0%B8%D0%B5_COVID-19_%D0%B2_%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D0%B8)
- avg_temp_min, avg_temp_max, avg_temp_std, avg_temp_median, humidity_min, humidity_max, humidity_std, humidity_median, pressure_min, pressure_max, pressure_std, pressure_median, wind_speed_ms_min, wind_speed_ms_max, wind_speed_ms_std, wind_speed_ms_median: weather archive data for March 2020 from http://weatherarchive.ru/catalog1
- urban_50–54_years, urban_55–59_years, urban_60–64_years, urban_65–69_years, urban_70–74_years, urban_75–79_years, urban_80–84_years, urban_85–89_years, urban_90–94_years, rural_50–54_years, rural_55–59_years, rural_60–64_years, rural_65–69_years, rural_70–74_years, rural_75–79_years, rural_80–84_years, rural_85–89_years, rural_90–94_years: number of inhabitants by age group and urban/rural area (http://showdata.gks.ru/, measure name: 23110000100030200007_Численность_постоянного_населения_России_по_возрасту_на_1_январ.)
- work_ratio_15–72_years, work_ratio_55–64_years, work_ratio_15–24_years, work_ratio_15–64_years, work_ratio_25–54_years: percentage of people working, by age group (http://showdata.gks.ru/, measure name: 11242000300080200004_Уровень_занятости_по_полу_и_возрастным_группам)
- num_patients_tubercul_1992 .. 2017: number of tuberculosis patients by year, a possible indicator of better or worse vaccination, since vaccination data is not available by region (http://showdata.gks.ru/, measure name: 22420000100070200001_Численность_выявленных_пациентов_с_впервые_в_жизни_установленным_диагнозом_акт)
- volume_serv_household_2017, volume_serv_chargeable_2017, volume_serv_transport_2017, volume_serv_post_2017, volume_serv_accommodation_2017, volume_serv_telecom_2017, volume_serv_others_2017, volume_serv_veterinary_2017, volume_serv_housing_2017, volume_serv_education_2017, volume_serv_medicine_2017, volume_serv_disabled_2017, volume_serv_culture_2017, volume_serv_sport_2017, volume_serv_hotels_2017, volume_serv_tourism_2017, volume_serv_sanatorium_2017: volume of services by type, in RUB (http://showdata.gks.ru/, measure name: 21373000200010200001_Объем_платных_услуг_населению_с_2017_г_). Could also be useful as a lower estimate of money lost due to lockdown.
- num_phones_rural_2018, num_phones_urban_2018: number of phones by urban/rural area (http://showdata.gks.ru/, measure name: 11111132100050200001_Число_телефонных_аппаратов_(включая_таксофоны). Could also be useful for evaluating the possibility of contact tracing, at least at the level of phone availability in risky areas.
- bus_march_travel_18, bus_april_travel_18: bus passenger turnover, in thousands of passenger-kilometers, in March/April 2018 (http://showdata.gks.ru/, measure name: Пассажирооборот автобусов по маршрутам регулярных перевозок (тысяча пассажиро-километров))
- epirank_avia, epirank_bus, epirank_train, epirank_avia_cat, epirank_bus_cat, epirank_train_cat: EpiRank indexes, calculated following the paper https://www.researchgate.net/publication/332131602_EpiRank_Modeling_Bidirectional_Disease_Spread_in_Asymmetric_Commuting_Networks (suffix _cat: head/tail breaks, as described in the paper; daytime = 0.55, d = 1.0), using the implementation at https://github.com/wcchin/EpiRank and origin-destination data for aviation, trains and buses from Tutu.ru (Thank you!): https://github.com/ods-ai-ml4sg/covid19-tutu
- whole_population, urban, rural: at-risk population from the 65+ group, in total and by urban/rural area (http://showdata.gks.ru/)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Files and columns
1. towns.csv:
city_name
population - Rosstat estimate, thousand people, as of 1.1.2020
region_name - subnational region (oblast, republic or krai)
To be added (via dadata.ru and OSM):
lat,lon - geographic coordinates (from OSM)
federal_district, eg "Центральный"
region_iso_code - ISO 3166 code, eg RU-VLD
fias_id - FIAS code
kladr_id - KLADR code (deprecated)
place_id - OpenStreetMap (OSM) identifier
2. regions.csv - list of Russian Federation regions
Sources
Comments
Белоозерский was not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Дмитриев and Дмитриев-Льговский are the same city.
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication (they are parts of Тюменская область).
Сестрорецк is a municipality of Санкт-Петербург and is not reported in this dataset.
In 2025, Moscow was the largest city in Europe with an estimated urban agglomeration of 12.74 million people. The French capital, Paris, was the second largest city in 2025 at 11.35 million, followed by the capitals of the United Kingdom and Spain, with London at 9.84 million and Madrid at 6.81 million people. Istanbul, which would otherwise be the largest city in Europe in 2025, is excluded as it is only partially in Europe, with a sizeable part of its population living in Asia.
Europe’s population is almost 750 million
Since 1950, the population of Europe has increased by approximately 200 million people, increasing from 550 million to 750 million in these seventy years. Before the turn of the millennium, Europe was the second-most populated continent, before it was overtaken by Africa, which saw its population increase from 228 million in 1950 to 817 million by 2000. Asia has consistently had the largest population of the world’s continents and was estimated to have a population of 4.6 billion.
Europe’s largest countries
Including its territory in Asia, Russia is by far the largest country in the world, with a territory of around 17 million square kilometers, almost double that of the next largest country, Canada. Within Europe, Russia also has the continent's largest population at 145 million, followed by Germany at 83 million and the United Kingdom at almost 68 million. By contrast, Europe is also home to various micro-states such as San Marino, which has a population of just 30 thousand.
In 2024, Monaco was the European country estimated to have the highest fertility rate. The country had a fertility rate of 2.1 children per woman. Other small countries such as Gibraltar or Montenegro also came towards the top of the list for 2024, while the large country with the highest fertility rate was France, with 1.64 children per woman. On the other hand, Ukraine had the lowest fertility rate, averaging around one child per woman.