7 datasets found
  1. 1117 Russian cities with city name, region, geographic coordinates and 2020...

    • zenodo.org
    csv
    Updated Aug 6, 2021
    Cite
    Evgeniy Pogrebnyak; Kirill Artemov (2021). 1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate [Dataset]. http://doi.org/10.5281/zenodo.5151423
    Available download formats
    csv
    Dataset updated
    Aug 6, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Evgeniy Pogrebnyak; Kirill Artemov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    1117 Russian cities with city name, region, geographic coordinates and 2020 population estimate.

    How to use

    from pathlib import Path

    import pandas as pd
    import requests

    url = (
        "https://raw.githubusercontent.com/"
        "epogrebnyak/ru-cities/main/assets/towns.csv"
    )

    # save the file locally, downloading it only once
    p = Path("towns.csv")
    if not p.exists():
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of saving an error page
        p.write_text(response.text, encoding="utf-8")

    # read as a dataframe and show five random rows
    df = pd.read_csv(p)
    print(df.sample(5))

    Files:

    Columns (towns.csv):

    Basic info:

    • city - city name (several cities have alternative names marked in alt_city_names.json)
    • population - city population, thousand people, Rosstat estimate as of 1.1.2020
    • lat,lon - city geographic coordinates

    Region:

    • region_name - subnational region (oblast, republic, krai or AO)
    • region_iso_code - ISO 3166 code, e.g. RU-VLD
    • federal_district - federal district, e.g. Центральный (Central)

    City codes:

    • okato
    • oktmo
    • fias_id
    • kladr_id
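As a rough illustration of how these columns might be consumed, the sketch below parses a tiny in-memory sample with the standard csv module. Only the column names come from the list above; the two rows are invented placeholders, not real records from towns.csv. Keeping the code columns as strings is an assumption on our part, motivated by the fact that codes of this kind can begin with a zero.

```python
import csv
import io

# Hypothetical sample: column names follow the list above,
# but every value here is an invented placeholder.
SAMPLE = """city,population,lat,lon,region_name,region_iso_code,federal_district,okato,oktmo,fias_id,kladr_id
Город А,12.3,55.0,37.0,Регион А,RU-AAA,Центральный,01234567890,01234567,placeholder-a,0100000100000
Город Б,250.0,60.0,30.0,Регион Б,RU-BBB,Северо-Западный,40000000000,40000000,placeholder-b,7800000000000
"""


def load_towns(text):
    """Parse towns CSV text into a list of dicts with typed fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        # population is given in thousands of people
        row["population_people"] = round(float(raw["population"]) * 1000)
        row["lat"] = float(raw["lat"])
        row["lon"] = float(raw["lon"])
        # okato, oktmo, fias_id and kladr_id stay as strings,
        # so leading zeros are preserved
        rows.append(row)
    return rows


towns = load_towns(SAMPLE)
largest = max(towns, key=lambda r: r["population_people"])
```

With pandas, a similar effect could be had by passing a dtype mapping for the code columns to read_csv.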

    Data sources

    Comments

    City groups

    • The Ханты-Мансийский and Ямало-Ненецкий autonomous regions are excluded to avoid duplication, because they are parts of Тюменская область.

    • Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality of Saint Petersburg, Щербинка is part of Moscow). They are not reported in this dataset.

    By individual city

    Alternative city names

    • We replaced the letter "ё" with "е" in the city column of towns.csv, so the file has Орел, not Орёл. This affected:

      • Белоозёрский
      • Королёв
      • Ликино-Дулёво
      • Озёры
      • Щёлково
      • Орёл
    • Дмитриев and Дмитриев-Льговский are the same city.

    assets/alt_city_names.json contains these names.
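When matching external city names against towns.csv, the same "ё" to "е" replacement can be applied before the lookup. The helper below is a minimal sketch; the function name normalize_city is our own and is not part of the repository.

```python
def normalize_city(name: str) -> str:
    """Replace "ё"/"Ё" with "е"/"Е" to match the spelling used in towns.csv."""
    return name.replace("ё", "е").replace("Ё", "Е")


names = ["Орёл", "Королёв", "Щёлково", "Москва"]
normalized = [normalize_city(n) for n in names]
```

Names without "ё" pass through unchanged, so the helper is safe to apply to every lookup key.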

    Tests

    poetry install
    poetry run python -m pytest
    

    How to replicate dataset

    1. Base dataset

    Run:

    • download the source data by running rar/get.sh
    • convert Саратовская область.doc to docx
    • run make.py

    Creates:

    • _towns.csv
    • assets/regions.csv

    2. API calls

    Note: do not run this step unless you have to. It takes a while and puts load on third-party APIs.

    The resulting files are already in the repo, so you probably do not need to run these scripts.

    Run:

    • cd geocoding
    • run coord_dadata.py (needs token)
    • run coord_osm.py

    Creates:

    • coord_dadata.csv
    • coord_osm.csv

    3. Merge data

    Run:

    • run merge.py

    Creates:

    • assets/towns.csv
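The logic of merge.py is not shown on this page. One way a step like this could work is sketched below: take the base list of towns and attach coordinates, preferring one geocoder and falling back to the other. The function name, the dict layout and the preference order are all assumptions made for illustration, not a description of the actual script.

```python
def merge_coords(towns, primary, fallback):
    """Attach (lat, lon) to each town, preferring `primary` over `fallback`.

    towns    : list of city names
    primary  : dict city -> (lat, lon), e.g. parsed from one geocoder's CSV
    fallback : dict city -> (lat, lon), e.g. parsed from the other geocoder's CSV
    Towns found in neither source get (None, None).
    """
    merged = []
    for city in towns:
        lat, lon = primary.get(city) or fallback.get(city) or (None, None)
        merged.append({"city": city, "lat": lat, "lon": lon})
    return merged


# Invented placeholder values, not real coordinates.
towns = ["A", "B", "C"]
dadata = {"A": (1.0, 2.0)}
osm = {"A": (1.1, 2.1), "B": (3.0, 4.0)}
result = merge_coords(towns, dadata, osm)
```

Rows that end up with (None, None) are the ones worth checking by hand before publishing the merged file.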

  2. Database of patient reviews expressing dissatisfaction with the quality of...

    • zenodo.org
    bin
    Updated Apr 22, 2025
    Cite
    Irina Kalabikhina; Anton Kolotusha (2025). Database of patient reviews expressing dissatisfaction with the quality of medical services in Russia in 2012-2023 [Dataset]. http://doi.org/10.5281/zenodo.15257447
    Available download formats
    bin
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Irina Kalabikhina; Anton Kolotusha
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    Data format and access

    The database consists of full-text patient reviews, reflecting their dissatisfaction with healthcare quality. Materials in Russian have been posted in the «Review list» of the site infodoctor.ru. Publication period: July 2012 to August 2023. The database consists of 18,492 reviews covering 16 Russian cities with population of over one million. Data format: .xlsx.

    Data access: 10.5281/zenodo.15257447

    Data collection methodology

    On the premise that negative reviews may be more reliable than positive ones, the authors collected negative reviews from 16 Russian cities with populations of over one million, for which representative samples could be gathered (at least 1000 reviews per city). Reviews were extracted from the one-star section of the site's guestbook, as these are reliably identified as negative. Duplicates were removed from the database. Personal data in comment texts have been replaced with "##########". The author's gender was determined manually from the author's name or from gendered word endings in the review text. Where neither was available, "0" was recorded (gender cannot be determined).

    For Moscow reviews, classification was carried out using manual markup methods - based on the majority of votes for the review class from 3 annotators (if at least one annotator indicated that it was impossible to determine, the review was classified as #N/A - impossible to clearly determine). For reviews from other cities, classification was made into 3 classes using machine learning methods based on logistic regression. The classification accuracy was 88%.
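The voting rule for the Moscow reviews can be sketched as follows. The rule that any undetermined vote makes the whole review #N/A comes from the description above. How a three-way split with no majority was handled is not stated; mapping it to #N/A is our assumption, and the function name is hypothetical.

```python
from collections import Counter

UNDETERMINED = "#N/A"


def majority_label(votes):
    """Combine three annotator votes into one class label.

    Any undetermined vote makes the whole review undetermined.
    A three-way split (no majority) is also mapped to "#N/A";
    that second rule is an assumption, not taken from the source.
    """
    if UNDETERMINED in votes:
        return UNDETERMINED
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else UNDETERMINED
```

For example, the votes M, M, O yield M, while M, #N/A, M yield #N/A even though two annotators agree.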

    The medical specialties were distributed into large groups for the convenience of further analysis. The correspondence of medical specialties to large groups is presented in detail in Appendix 1.

    Sample structure and description of variables

    · CITY – name of the city (population over one million). Moscow is on a separate sheet; the other 15 cities are Volgograd, Voronezh, Yekaterinburg, Kazan, Krasnodar, Krasnoyarsk, Nizhny Novgorod, Novosibirsk, Omsk, Perm, Rostov-on-Don, Samara, St. Petersburg, Ufa, Chelyabinsk

    · TEXT – review text

    · GENDER – gender of the review author (2 – female, 1 – male, 0 – cannot be determined)

    · CLASS_1 – group of reasons for dissatisfaction with medical care (M – issues of medical content, O – issues of organizational support and economic aspect, C – mixed (combined) class, #N/A – cannot be clearly determined)[1]

    · CLASS_2 – group of reasons for dissatisfaction with medical care (0 – issues of medical content, 1 – issues of organizational support and economic aspect, 2 – mixed (combined) class, #N/A – cannot be clearly determined)

    · DAY – day of the month the review was posted

    · MONTH – month the review was posted

    · YEAR – year the review was posted

    · DOCTOR_OR_CLINIC – what or who is the review dedicated to – the doctor or the clinic

    · SPEC – physician specialty (for observations where the review is dedicated to the physician)

    · GROUP_SPEC – a large group of a physician’s specialty

    · ID – observation identifier
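A small sketch of how these variables relate to each other: the letter and numeric class codes map one to one, and DAY, MONTH and YEAR can be combined into a single date. The helper names and the sample record are hypothetical; only the variable names and code values come from the list above.

```python
from datetime import date

# Letter codes (CLASS_1) and numeric codes (CLASS_2) as listed above.
CLASS_1_TO_2 = {"M": 0, "O": 1, "C": 2}


def class_2_from_class_1(code):
    """Return the numeric code for a letter code, or None for "#N/A"."""
    return CLASS_1_TO_2.get(code)


def review_date(row):
    """Build a date from the DAY, MONTH and YEAR fields of one record."""
    return date(int(row["YEAR"]), int(row["MONTH"]), int(row["DAY"]))


# Hypothetical record; the values are placeholders, not taken from the data.
row = {"CLASS_1": "O", "DAY": 5, "MONTH": 7, "YEAR": 2012}
posted = review_date(row)
```

A single date field makes it easier to group reviews by month or year when studying trends over time.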

    Database application

    The data are suitable for analyzing patient dissatisfaction trends with medical services in Russia over the period from July 2012 to August 2023. This dataset could be particularly useful for healthcare providers, policymakers, and researchers interested in understanding patient experiences and identifying areas for quality improvement in Russian healthcare. Some potential applications include:

    • Analyzing geographic patterns of patient complaints across different cities in Russia
    • Examining trends in patient dissatisfaction over time
    • Identifying common reasons for dissatisfaction with medical care
    • Comparing dissatisfaction levels between different medical specialties
    • Assessing gender differences in patient complaints

    The database provides rich qualitative data through full-text review texts, allowing for in-depth analysis of patient experiences. The structured variables like city, date, doctor/clinic information, etc. enable quantitative analysis as well. This combination of qualitative and quantitative data makes it possible to gain a comprehensive understanding of patient dissatisfaction patterns in Russia's healthcare system over more than a decade.

    For researchers specifically interested in healthcare quality issues, this dataset could serve as an important resource for studying patient experiences and outcomes in Russia's medical system. The longitudinal nature of the data (2012-2023) also allows for analysis of changes over time in patient satisfaction.

    Overall, this database provides valuable insights into patient perceptions of healthcare quality that could inform policy decisions, quality improvement


    [1] The variable indicating the group of reasons for dissatisfaction with medical care is provided in two encodings, with letter codes (CLASS_1) and numeric codes (CLASS_2), for convenience when working with the data.

  3. Land Resources of Russia, Version 1.1

    • search.dataone.org
    Updated Nov 17, 2014
    Cite
    Stolbovoi, Vladimir; McCallum, Ian (2014). Land Resources of Russia, Version 1.1 [Dataset]. https://search.dataone.org/view/Land_Resources_of_Russia%2C_Version_1.1.xml
    Dataset updated
    Nov 17, 2014
    Dataset provided by
    Regional and Global Biogeochemical Dynamics Data (RGD)
    Authors
    Stolbovoi, Vladimir; McCallum, Ian
    Time period covered
    Jan 1, 1987 - Dec 31, 1993
    Area covered
    Description

    Together with the Russian Academy of Sciences, IIASA's Forestry (FOR) project has released a CD-ROM titled Land Resources of Russia, Version 1.1, containing socioeconomic and biophysical data sets on important targets of international conventions — climate change, wetlands, desertification, and biodiversity. The CD-ROM, a country-scale integrated information system, supports sustainable use of land resources in line with Chapter 10 of Agenda 21 (UNCED) and makes a contribution to the Rio+10 Summit.

    The Project's analysis of land resources is crucial for full greenhouse gas (or carbon) accounting. Integrated land analyses are also important for the introduction of sustainable forest management. FOR's land analyses concentrate on Russia, which is used as a case study for full carbon and greenhouse gas accounting.

    Russia's area of forests, called here the forest zone, covers about 1180 million ha, or 69% of the land of the country. The forested area (forests forming closed stands) occupies some 765 million ha, constituting 65% of the forest zone. Forests are elements of a land-cover mosaic that shape the features of landscapes, ecosystems, vegetation and land uses. The FOR project attempts to overcome the traditional approach of considering only the direct utilities of forests. Instead, FOR operates with a holistic view of forests in a fully-fledged land concept. Integrated analysis of the land requires extended databases that include various data for the total land and are operated through GIS-based tools.
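The figures in this paragraph can be cross-checked with a few lines of arithmetic. The inputs are the numbers quoted above; the implied total land area is derived from them and is not stated in the description.

```python
forest_zone_mha = 1180     # forest zone, million ha (from the text)
forested_area_mha = 765    # closed-stand forests, million ha (from the text)
forest_zone_share = 0.69   # forest zone as a share of the country's land (from the text)

# Share of the forest zone that is actually forested; the text rounds this to 65%.
forested_share = forested_area_mha / forest_zone_mha

# Total land area implied by "1180 million ha is 69% of the land".
implied_land_mha = forest_zone_mha / forest_zone_share

print(f"forested share of forest zone: {forested_share:.1%}")
print(f"implied total land area: {implied_land_mha:.0f} million ha")
```

The forested share comes out just under 65%, consistent with the rounded figure in the text, and the implied land area is roughly 1710 million ha.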

    The land databases on Russia are the most comprehensive ever assembled, inside or outside of Russia. The databases have been enriched by remotely sensed data, biogeochemical functionality (carbon analysis), and institutional frameworks. The data included on the CD-ROM have been specially selected and filtered to meet the following criteria: (1) completeness: to meet a variety of analysis tasks; (2) complexity: to describe a diversity of task aspects; (3) consistency: to provide compatible results, to be at a compatible scale, and to provide a compatible time horizon; and (4) uniformity: to allow them to be standardized and formatted according to modern data handling routines.

    The following databases and coverages are included on the CD-ROM and are available for download:

    Socioeconomic Database -- Describes the social environment of each administrative region in Russia with close to 7000 parameters. The data cover the years 1987-1993. Coverages in this section include:

    (1) Socioeconomic Statistical Database. This database provides the following statistical data sets: Population; Labor and Salary; Industry; Agriculture; Capital Construction; Communication and Transport; State Trade and Catering; Utilities and Services; Health Care and Sport; Education and Culture; Finance; Public Consumption; Industrial Production; Interregional Trade; Labor Resources; Supply of Materials; Environmental Protection; Foreign Trade; and Price Indices.

    (2) Population Database. Adapted from Center for International Earth Science Information Network (CIESIN), Columbia University; International Food Policy Research Institute (IFPRI); and World Resources Institute (WRI). 2000. Gridded Population of the World (GPW), Version 2, this coverage contains population densities for 1995 on a 2.5 degree grid. Data were adjusted to match United Nations national population estimates for 1995.

    (3) Administrative Oblasts, Cities & Towns Database. Oblasts coverage contains 92 polygons, 88 of which contain Oblast names, the other four represent waterbodies. The cities coverage contains 37 cities identified by name.

    (4) Transportation Database. The statistical data sets and maps cover the transport routes of the railway, road, and river networks spanning the entire country. Railways and roads are classified by type and status, and major rivers are named. Map coverages (line data) were created from the Digital Chart of the World, using the 1993 version at the 1:1,000,000 scale.

    Natural Conditions Database. This section of the CD-ROM contains the basic land characteristics. This database provides specialists and scientists in research institutes and international agencies with the capability to perform scientific analysis with a Geographic Information System. These data describe land characteristics that might be applied in various ways, such as individual items (e.g., temperature, elevation, vegetation community, etc.), in combination (e.g., forest-temperature associations, soil spectra for land use types, etc.), and as aggregations based on a conceptual framework of a different level of complexity (e.g., ecosystem establishment, human-induced land cover transformation, biochemical cycle analysis, etc.). Coverage includes:

    (1) Climate Database. Temperature (annual and seasonal) and Precipitation... Visit https://dataone.org/datasets/Land_Resources_of_Russia%2C_Version_1.1.xml for complete metadata about this dataset.

  4. DATABASE: RUSSIAN LARGE URBAN REGIONS (LUR)

    • zenodo.org
    Updated Apr 24, 2025
    Cite
    Mikhail Rogov (2025). DATABASE: RUSSIAN LARGE URBAN REGIONS (LUR) [Dataset]. http://doi.org/10.5281/zenodo.3354436
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mikhail Rogov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    This database provides a construction of Large Urban Regions (LUR) in Russia. A Large Urban Region (LUR) can be defined as an aggregation of continuous statistical units around a core that are economically dependent on this core and linked to it by strong economic and social interdependences. The main purpose of this delineation is to make cities comparable on the national and world scales and to enable comparative socio-economic urban studies. By aggregating different municipal districts around a core city, we construct a single large urban region, which makes it possible to include the whole area of economic influence of a core in one statistical unit (see Rogov & Rozenblat, 2019 for more details). In doing so we use four principal urban concepts (Pumain et al., 1992): political definition, morphological definition, functional definition, and conurbation, which we call Large Urban Region. We implemented LURs using criteria such as population distribution, road networks, access to an airport, distance from a core, and presence of multinational firms. In this database we provide population data for LURs and their administrative units.
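Since the database reports population both for LURs and for their administrative units, the aggregation idea can be illustrated with a short sketch. The unit names, the assignment of units to cores and the population figures are placeholders, and the delineation criteria themselves (road networks, airports, distance, firms) are not modelled here.

```python
from collections import defaultdict

# Hypothetical units: (unit name, LUR core it is assigned to, population).
# Names and numbers are placeholders for illustration only.
units = [
    ("core city X", "X", 1_000_000),
    ("district X1", "X", 150_000),
    ("district X2", "X", 80_000),
    ("core city Y", "Y", 500_000),
]


def lur_population(units):
    """Sum the population of all administrative units assigned to each LUR."""
    totals = defaultdict(int)
    for _name, lur, population in units:
        totals[lur] += population
    return dict(totals)


totals = lur_population(units)
```

The LUR total for a core is then larger than the population of the core city alone, which is what changes a city's rank when cities are compared by LUR rather than by municipal boundary.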

  5. DATABASE: RUSSIAN LARGE URBAN REGIONS 2020

    • data.niaid.nih.gov
    Updated Nov 25, 2021
    Cite
    Mikhail Rogov (2021). DATABASE: RUSSIAN LARGE URBAN REGIONS 2020 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3354435
    Dataset updated
    Nov 25, 2021
    Dataset authored and provided by
    Mikhail Rogov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    This database provides a construction of Large Urban Regions (LUR) in Russia. A Large Urban Region (LUR) can be defined as an aggregation of continuous statistical units around a core that are economically dependent on this core and linked to it by strong economic and social interdependences. The main purpose of this delineation is to make cities comparable on the national and world scales and to enable comparative socio-economic urban studies. By aggregating different municipal districts around a core city, we construct a single large urban region, which makes it possible to include the whole area of economic influence of a core in one statistical unit (see Rogov & Rozenblat, 2020 for more details), thus changing a city's position in the global urban hierarchy. In doing so we use four principal urban concepts (Pumain et al., 1992): political definition, morphological definition, functional definition, and conurbation, which we call Large Urban Region. We constructed Russian LURs using criteria such as population distribution, road networks, access to an airport, distance from a core, and presence of multinational firms. In this database, we provide population data for LURs and their administrative units.

  6. Entrepreneurship Development in Russia to Support Economic Growth and...

    • data.mendeley.com
    Updated Oct 10, 2022
    Cite
    Tatiana Litvinova (2022). Entrepreneurship Development in Russia to Support Economic Growth and Technology Leadership [Dataset]. http://doi.org/10.17632/kzpt8j27vr.1
    Dataset updated
    Oct 10, 2022
    Authors
    Tatiana Litvinova
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Russia
    Description

    The dataset includes entrepreneurship development indicators:

    • Number of enterprises and organizations (at the end of the year) (2020), units and thousand units
    • Turnover of organizations (2020), billion rubles
    • Balanced financial result (profit minus loss) of organizations' activities (2020), million rubles and billion rubles
    • Share of organizations using special software (2020), %

    The dataset also contains the share of organizations implementing technological innovations (2020), %, and gross regional product (GRP) (2019), million rubles. The values of indicators are given for 96 main territories of the Russian Federation allocated by Rosstat, including regions, federal districts and largest cities (federal centers). The data are given by regions for 2020, as well as for Russia as a whole for 2010, 2015 and 2020.

    Source of the data: Regions of Russia. Socio-economic indicators - 2021 [Electronic resource]. Rosstat. Available at: https://rosstat.gov.ru/folder/210/document/13204

    The dataset is available in Russian and English (in separate files).

  7. Ads from context advertising

    • kaggle.com
    zip
    Updated Feb 17, 2017
    Cite
    Kotobotov (2017). Ads from context advertising [Dataset]. https://www.kaggle.com/kotobotov/context-advertising
    Available download formats
    zip (9888139 bytes)
    Dataset updated
    Feb 17, 2017
    Authors
    Kotobotov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    To build an assistant for a context-advertising system, an automated generator of ad content was needed, so a large number of ads were collected from a popular search engine (Russian ad campaigns only). If anyone is interested, I can upload the full 40+ GB of data.

    Content

    The database was collected from open public sources and contains ads from regions of Russia, Ukraine, Belarus, Kazakhstan and the major cities of these countries.

    Unique items: 800,000 (part 1). Total size: about 15 million.

    Acknowledgements

    The database was collected in October 2016 - January 2017. No one was harmed when collecting the database (the program does not click on the ads).

    Inspiration

    Try to find patterns in the ads and develop an automatic text generator for ad systems.
