Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.
How to use
    from pathlib import Path

    import requests
    import pandas as pd

    url = ("https://raw.githubusercontent.com/"
           "epogrebnyak/ru-cities/main/assets/towns.csv")

    # save file locally
    p = Path("towns.csv")
    if not p.exists():
        content = requests.get(url).text
        p.write_text(content, encoding="utf-8")

    # read as dataframe
    df = pd.read_csv("towns.csv")
    print(df.sample(5))
Files:
Columns (towns.csv):
Basic info:
city - city name (several cities have alternative names marked in alt_city_names.json)
population - city population, thousand people, Rosstat estimate as of 1.1.2020
lat,lon - city geographic coordinates
Region:
region_name - subnational region (oblast, republic, krai or AO)
region_iso_code - ISO 3166 code, e.g. RU-VLD
federal_district, e.g. Центральный
City codes:
okato
oktmo
fias_id
kladr_id
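The column notes above suggest two practical points when loading the file: population is stored in thousands, and the city codes are identifiers rather than numbers. The sketch below shows one way to handle both. It uses a small inline sample instead of the real file; the rows, the helper name load_towns and the extra column population_persons are assumptions made for illustration and are not part of the dataset. Reading codes as strings is a precaution in case any code starts with a zero.

```python
import io

import pandas as pd

# Placeholder rows that mimic the documented columns. The values are made up
# for illustration and are NOT taken from towns.csv.
SAMPLE_CSV = """city,population,lat,lon,region_name,region_iso_code,federal_district,okato,oktmo,fias_id,kladr_id
Город А,120.5,55.0,37.0,Регион 1,RU-XXX,Центральный,01234567000,01234567,00000000-0000-0000-0000-000000000001,0100000100000
Город Б,8.2,60.0,30.0,Регион 2,RU-YYY,Северо-Западный,40234567000,40234567,00000000-0000-0000-0000-000000000002,4000000100000
"""

# Code columns are identifiers, so keep them as text (hypothetical helper below).
CODE_COLUMNS = ["okato", "oktmo", "fias_id", "kladr_id"]


def load_towns(source):
    """Read a towns.csv-like source, keeping code columns as strings."""
    df = pd.read_csv(source, dtype={c: str for c in CODE_COLUMNS})
    # population is documented as thousand people; add a column in persons
    df["population_persons"] = (df["population"] * 1000).round().astype(int)
    return df


df = load_towns(io.StringIO(SAMPLE_CSV))
print(df[["city", "population_persons", "okato"]])
```

With the real file, the same helper could be called as load_towns("towns.csv"), assuming the column names match the list above.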
Data sources
Comments
City groups
Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, as they are parts of Тюменская область.
Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.
By individual city
Белоозерский was not found in the Rosstat publication, but should be considered a city as of 1.1.2020.
Alternative city names
We suppressed the letter "ё" in the city column of towns.csv - we have Орел, but not Орёл. This affected:
Белоозёрский
Королёв
Ликино-Дулёво
Озёры
Щёлково
Орёл
Дмитриев and Дмитриев-Льговский are the same city.
assets/alt_city_names.json contains these names.
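Because the city column stores names without "ё", a lookup by the usual spelling (for example Орёл) would miss. A minimal sketch of normalizing a query name before matching is shown below. The function name strip_yo is hypothetical and not part of the repository, and the sketch does not read alt_city_names.json, whose layout is not described here.

```python
def strip_yo(name: str) -> str:
    """Replace the letter "ё" with "е", in both lower and upper case."""
    return name.replace("ё", "е").replace("Ё", "Е")


# Names taken from the list of affected cities above
for name in ["Орёл", "Королёв", "Щёлково"]:
    print(name, "->", strip_yo(name))
```

A normalized name could then be compared against the city column, for example df[df.city == strip_yo("Орёл")], assuming df was loaded as in the usage section.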
Tests
poetry install
poetry run python -m pytest
How to replicate dataset
1. Base dataset
Run:
Саратовская область.doc to docx
Creates:
_towns.csv
assets/regions.csv
2. API calls
Note: do not attempt this unless you have to - it takes a while to run and puts load on third-party APIs.
The resulting files are already in the repo, so you probably do not need to run these scripts.
Run:
cd geocoding
Creates:
3. Merge data
Run:
Creates:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database provides a construction of Large Urban Regions (LUR) in Russia. A Large Urban Region (LUR) can be defined as an aggregation of contiguous statistical units around a core that are economically dependent on this core and linked to it by strong economic and social interdependences. The main purpose of this delineation is to make cities comparable on the national and world scales and to support comparative socio-economic urban studies. By aggregating different municipal districts around a core city, we construct a single large urban region, which allows us to include the whole area of economic influence of a core in one statistical unit (see Rogov & Rozenblat, 2019 for more details). In doing so we use four principal urban concepts (Pumain et al., 1992): the political definition, the morphological definition, the functional definition, and the conurbation, which we call a Large Urban Region. We implemented LURs using criteria such as population distribution, road networks, access to an airport, distance from a core, and presence of multinational firms. In this database we provide population data for LURs and their administrative units.