The African Cities Population Database (ACPD) was produced by Birkbeck College, University of London, in 1990 at the request of the United Nations Environment Programme (UNEP) in Nairobi, Kenya. The database contains head counts for 479 cities in Africa that either have a population of over 20,000 or are national capitals. It lists the geographical location of each city and its population size. The material is primarily derived from a 1988 report of the Economic Commission for Africa (ECA) and several issues of the United Nations Demographic Yearbook (1973-81). Severe problems were found with the data for several countries, such as Togo, Ghana and South Africa. For South Africa, the data were derived from the United Nations Demographic Yearbook 1987.
ACPD is an Arc/Info point coverage. It has no projection, as the cities are located on the basis of their latitude and longitude. Coordinates were assigned on the basis of gazetteers or African maps. Each record in the database contains the city name, country name, latitude and longitude of the city, and its population at a defined time. The Arc/Info attribute table contains the following fields:
| Field | Description |
|---|---|
| AREA | Arc/Info item |
| PERIMETER | Arc/Info item |
| ACPD# | Arc/Info item |
| ACPD-ID | Arc/Info item |
| ID-NUM | Unique number for each city |
| CITY | City name |
| COUNTRY | Country name |
| CITY-POP | Population of city proper |
| YEAR | Latest available year of collection |
ACPD comes as an Arc/Info EXPORT file originally called "ACPD.E00" and contains 67 Kb of data. The file has a record length of 80 and a block size of 8000 (blocking factor = 100). The file can be read from tape using Arc/Info's TAPEREAD command or any other generic copy utility. If distributed on a diskette it can be read using the ordinary DOS 'COPY' command. The file has to be converted to Arc/Info internal format using its IMPORT command.
References to the ACPD data set can be found in:
The source of the ACPD data set as held by GRID is Birkbeck College, University of London, Department of Geography, London, UK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The West Africa Coastal Vulnerability Mapping: Population Projections, 2030 and 2050 data set is based on an unreleased working version of the Gridded Population of the World (GPW), Version 4, year 2010 population count raster, but at a coarser 5 arc-minute resolution. Bryan Jones of Baruch College produced country-level projections based on the Shared Socioeconomic Pathway 4 (SSP4). SSP4 reflects a divided world in which cities with relatively high standards of living are attractive to internal and international migrants. In low-income countries, rapidly growing rural populations live on shrinking areas of arable land due to both high population pressure and the expansion of large-scale mechanized farming by international agricultural firms. This pressure induces large migration flows to the cities, contributing to fast urbanization, although urban areas do not provide many opportunities for the poor and there is a massive expansion of slums and squatter settlements. This scenario may not be the most likely for the West Africa region, but it has internal coherence and is at least plausible. The purpose of the data set is to identify areas in West Africa that may be particularly exposed to climate stressors owing to future high population growth.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
City population size is a crucial measure when trying to understand urban life. Many socio-economic indicators scale superlinearly with city size, whilst some infrastructure indicators scale sublinearly with city size. However, the impact of size also extends beyond the city’s limits. Here, we analyse the scaling behaviour of cities beyond their boundaries by considering the emergence and growth of nearby cities. Based on an urban network from African continental cities, we construct an algorithm to create the region of influence of cities. The number of cities and the population within a region of influence are then analysed in the context of urban scaling. Our results are compared against a random permutation of the network, showing that the observed scaling power of cities to enhance the emergence and growth of cities is not the result of randomness. By altering the radius of influence of cities, we observe three regimes. Large cities tend to be surrounded by many small towns for small distances. For medium distances (above 114 km), large cities are surrounded by many other cities containing large populations. Large cities boost urban emergence and growth (even more than 190 km away), but their scaling power decays with distance.
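Urban scaling of this kind is commonly written as Y = a · N^β, where N is city population, Y is an indicator, β > 1 is superlinear and β < 1 is sublinear. The following is a minimal sketch of estimating β by least squares on log-transformed values. The helper name `fit_scaling_exponent` and all numbers are illustrative assumptions, not the authors' code or data.

```python
import math


def fit_scaling_exponent(populations, indicator):
    """Fit indicator = a * population**beta by least squares on log-log values.

    Hypothetical helper for illustration; returns the pair (a, beta).
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(v) for v in indicator]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the scaling exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    # Intercept in log space, converted back to the prefactor a.
    a = math.exp(mean_y - beta * mean_x)
    return a, beta


# Synthetic example: the indicator is generated exactly as 2 * N**1.15,
# so the fit should recover a superlinear exponent of 1.15.
populations = [20_000, 100_000, 500_000, 2_000_000, 10_000_000]
indicator = [2 * p ** 1.15 for p in populations]
a, beta = fit_scaling_exponent(populations, indicator)
print(f"a = {a:.3f}, beta = {beta:.3f}")
```

In a region-of-influence analysis, `indicator` could be, for instance, the number of cities or the total population found within a chosen radius of each city; repeating the fit for several radii would show how the exponent changes with distance.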
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Range |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases
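import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load the data and keep only the numeric feature columns
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1).select_dtypes(include='number')
# Standardize so every feature contributes equally to the distance metric
X_scaled = StandardScaler().fit_transform(X)
# Cluster (n_init is set explicitly so results do not depend on the scikit-learn version)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)
# Analyze: per-cluster means of the numeric columns only
print(df.groupby('cluster').mean(numeric_only=True))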
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with:
- ✨ Realistic correlation structures based on urban research
- 🌍 Regional characteristics matching real-world patterns
- 🎯 Optimal cluster separability (validated via silhouette scores)
- 📚 Comprehensive documentation and starter code
✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering! 🎉
This dataset was developed as part of a challenge to segment building footprints from aerial imagery. The goal of the challenge was to accelerate the development of more accurate, relevant, and usable open-source AI models to support mapping for disaster risk management in African cities. The data consists of drone imagery from 10 different cities and regions across Africa.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To address the absence of comparable city-level water data, this dataset compiles data from 15 global South cities located in sub-Saharan Africa, South Asia, and Latin America and among the regions that are the focus of the World Resources Report (WRR) "Towards a More Equal City". The 15 cities are Kampala, Uganda; Lagos, Nigeria; Maputo, Mozambique; Mzuzu, Malawi; Nairobi, Kenya; Bengaluru, India; Colombo, Sri Lanka; Dhaka, Bangladesh; Karachi, Pakistan; Mumbai, India; Caracas, Venezuela; Cochabamba, Bolivia; Rio de Janeiro, Brazil; São Paulo, Brazil; and Santiago de Cali, Colombia. To compile a data set on each city, we collaborated with local researchers who had a minimum of seven years of experience in the water sector. Data were obtained from a combination of interviews, fieldwork in an informal settlement, publicly available data sets, administrative records, websites, and project documents. Researchers in each city conducted an average of seven key informant interviews. Data were collected about household water and sanitation access at the city level and fieldwork was conducted in one informal settlement in each city. The dataset includes cost, % coverage, availability, and cost burdens on household water and sanitation practices; water intermittency; household treatment practices; access to facilities; citywide sanitation infrastructure; cost of on-site sanitation construction, and fecal sludge removal; fees for piped sewage; the lining of pit latrines; and proximity of septic tanks and pit latrines to water sources. At the city level, data were collected on the water utility, the city’s sources of water, and the water utility’s legal and administrative status. 
We augmented the city-level data with fieldwork and data from one informal settlement in each city, for two reasons: (1) city-level data are usually presented in averages and thus tend to mask extremes at both ends of the socioeconomic distribution; and (2) in many cities, informal settlements are excluded from formal city-level statistics because their land occupation is considered illegal. To select the informal settlement in each city, the researchers identified a centrally located, well-established settlement that did not represent either the city’s “best” or “worst” conditions but instead represented challenges to water access common in similar settlements in the city.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Measuring Living Standards in Cities (MLSC) survey is a new instrument designed to enhance understanding of cities in Africa and support evidence-based policy design. The instrument was developed under the World Bank’s Spatial Development of African Cities Program, and was piloted in Dar es Salaam (Tanzania) and Durban (South Africa) over the course of 2014/15. These geo-referenced surveys provide information on urban living standards at an unprecedented level of granularity: they can be compared across different geographic levels within the cities, and between areas of ‘regular’ and ‘irregular’ settlement patterns. They also respond to the need for an increased understanding of specifically ‘urban’ dimensions of quality of living: housing attributes, access to basic services, and commuting patterns, among others.
This dataset consists of very high resolution urban land cover maps for two African cities, Mekelle, Ethiopia and Polokwane, South Africa, for 2020. Maps were generated from Planet SuperDove satellite imagery at 3.125-m spatial resolution, and WorldView-3 satellite imagery (Maxar Technologies) at two spatial resolutions: 2 m for multispectral imagery and 0.5 m for pansharpened imagery. An object-based image classification approach was used to produce a multi-class land cover product for each image source. The aim of this work was to support fine-scale urban land cover analyses and comparative assessments between different high resolution satellite imagery sources. The data are provided in shapefile format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Data for Adam Storeygard; Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa, The Review of Economic Studies, Volume 83, Issue 3, 1 July 2016, Pages 1263–1295, https://doi.org/10.1093/restud/rdw020. Abstract: How does isolation affect the economic activity of cities? Transport costs are widely considered an important barrier to local economic activity but their impact in developing countries is not well-studied. This paper investigates the role of inter-city transport costs in determining the income of sub-Saharan African cities. In particular, focusing on fifteen countries whose largest city is a port, I ask how important access to that city is for the income of hinterland cities. The lack of panel data on both local economic activity and transport costs has prevented rigorous empirical investigation of this question. I fill this gap with two new datasets. Satellite data on lights at night proxy for city economic activity, and new road network data allow me to calculate the shortest route between cities. Cost per unit distance is identified by plausibly exogenous world oil prices. The results show that an oil price increase of the magnitude experienced between 2002 and 2008 induces the income of cities near a major port to increase by 6.6 percent relative to otherwise identical cities one standard deviation farther away. Combined with external estimates, this implies an elasticity of city economic activity with respect to transport costs of -0.25 at that distance. Moreover, the effect differs by the surface of roads between cities. Cities connected to the port by paved roads are chiefly affected by transport costs to the port, while cities connected to the port by unpaved roads are more affected by connections to secondary centers. 
This dataset is part of the Global Research Program on Spatial Development of Cities funded by the Multi-Donor Trust Fund on Sustainable Urbanization of the World Bank and supported by the U.K. Department for International Development.
The percentage of persons, out of the total number of persons living in an area, self-identifying as racially Black or African American (and ethnically non-Hispanic). “Black or African American” refers to a person having origins in any of the Black racial groups of Africa. This indicator includes people who identified their race as “Black”.
Source: U.S. Census Bureau, American Community Survey
Years Available: 2010, 2011-2015, 2012-2016, 2013-2017, 2014-2018, 2015-2019, 2020, 2017-2021, 2018-2022, 2019-2023
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This multi-spectral satellite image data set is associated with our recent work on analyzing and predicting urban land use forms in East Africa using OpenStreetMap data, satellite imagery, and Convolutional Neural Networks.
The images were extracted using an automated Python script from Google Maps Static API, based on sample locations in four East African capital cities namely Kampala, Nairobi, Dar es Salaam, and Kigali.
Other data sets associated with this work, that is, ESRI shapefiles for administrative level 1 and OpenStreetMap data for the named cities may be downloaded directly from the respective URLs provided in the manuscript.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides a list of juveniles in need of protection, juveniles in conflict with the law, and children beyond parental care, covering the period from 2009 to 30th May 2014.
The places we live affect our health status and the choices and opportunities we have (or do not have) to lead fulfilling lives. Over the past ten years, the African Population & Health Research Centre (APHRC) has led pioneering work in highlighting some of the major health and livelihood challenges associated with rapid urbanization in sub-Saharan Africa (SSA). In 2002, the Centre established the first longitudinal platform in urban Africa in the city of Nairobi in Kenya. The platform, known as the Nairobi Urban Health and Demographic Surveillance System, collects data on two informal settlements - Korogocho and Viwandani - in Nairobi City every four months on issues ranging from household dynamics to fertility and mortality, migration and livelihood, as well as on causes of death, using a verbal autopsy technique. The dataset provided here contains key demographic and health indicators extracted from the longitudinal database. Researchers interested in accessing the micro-data can look at our data access policy and contact us.
The Demographic Surveillance Area (combining Viwandani and Korogocho slum settlements) covers a land area of about 0.97 km², with the two informal settlements located about 7 km from each other. Korogocho is located 12 km from the Nairobi city center, in Kasarani division (now Kasarani district), while Viwandani is about 7 km from the Nairobi city center, in Makadara division (now Madaraka district). The DSA covers about seven villages each in Korogocho and Viwandani.
Individual
Between 1st January and 31st December 2015, the Nairobi HDSS observed 86,304 individuals living in 30,219 households distributed across the two informal settlements (Korogocho and Viwandani). All persons who sleep in the household prior to the day of the survey are included in the survey, while non-resident household members are excluded.
The present universe originated from an initial census, carried out on 1st August 2002, of the population living in the two informal settlements (Korogocho and Viwandani). Regular visits have since been made (3 times a year) to update information on births, deaths and migration that have occurred in the households observed at the initial census. New members join the population through a birth to a registered member or an in-migration, while existing members leave through a death or an out-migration. The DSS adopts the concept of an open cohort that allows new members to join and regular members to leave and return to the system.
Event history data
Three rounds in a year
This dataset relates to the whole demographic surveillance area population. The number of respondents has varied over the last 13 years (2002-2015), with variations observed at both household and individual level. As at 31st December 2015, 66,848 individuals living in 25,812 households distributed across the two informal settlements (Korogocho and Viwandani) were under observation in the Nairobi HDSS. The variable IndividualId uniquely identifies every respondent observed, while the variable LocationId uniquely identifies the room in which the individual was living at any point in time. To identify individuals who were living together at any one point in time (a household), the data can be split on location and observation dates.
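Splitting on location and observation dates can be sketched as follows. The example assumes each record is a residency episode carrying `IndividualId`, `LocationId`, and a start and end date; the field names `StartDate` and `EndDate` and the function `households_on` are hypothetical and may not match the actual file layout.

```python
from collections import defaultdict
from datetime import date


def households_on(episodes, day):
    """Group individuals by LocationId for the episodes that cover `day`.

    Returns a dict mapping each LocationId to a sorted list of IndividualIds.
    """
    groups = defaultdict(set)
    for ep in episodes:
        # An episode covers the day if the day falls within its date range.
        if ep["StartDate"] <= day <= ep["EndDate"]:
            groups[ep["LocationId"]].add(ep["IndividualId"])
    return {loc: sorted(members) for loc, members in groups.items()}


# Made-up episodes: individual 2 moves from room A to room B mid-year.
episodes = [
    {"IndividualId": 1, "LocationId": "A",
     "StartDate": date(2015, 1, 1), "EndDate": date(2015, 12, 31)},
    {"IndividualId": 2, "LocationId": "A",
     "StartDate": date(2015, 3, 1), "EndDate": date(2015, 6, 30)},
    {"IndividualId": 3, "LocationId": "B",
     "StartDate": date(2015, 1, 1), "EndDate": date(2015, 12, 31)},
    {"IndividualId": 2, "LocationId": "B",
     "StartDate": date(2015, 7, 1), "EndDate": date(2015, 12, 31)},
]

print(households_on(episodes, date(2015, 4, 1)))
print(households_on(episodes, date(2015, 8, 1)))
```

Because membership depends on the chosen day, the same individual can appear in different households at different dates, which is consistent with the open-cohort design described earlier.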
None
Proxy Respondent [proxy]
Questionnaires are printed and administered in Swahili, the country's national language.
The questionnaires for the Nairobi HDSS were structured questionnaires based on the INDEPTH Model Questionnaire and were translated into Swahili with some modifications and additions. After an initial review, the questionnaires were translated back into English by an independent translator with no prior knowledge of the survey. The back translation from the Swahili version was independently reviewed and compared to the English original. Differences in translation were reviewed and resolved in collaboration with the original translators. The English and Swahili questionnaires were both piloted as part of the survey pretest.
At baseline, a household questionnaire was administered in each household, which collected various information on household members including sex, age, relationship, and orphanhood status. In later rounds, questionnaires to track the migration of the population observed at baseline, and additional questionnaires to capture demographic and health events happening to the population, have been introduced.
Data editing took place at a number of stages throughout the processing, including: a) Office editing and coding b) During data entry c) Structure checking and completeness d) Secondary editing e) Structural checking of STATA data files
Where changes were made by the program, cold deck imputation was preferred: incorrect values were imputed using existing data from another dataset. If cold deck imputation was found to be insufficient, hot deck imputation was used; in this case, a missing value was imputed from a randomly selected similar record in the same dataset.
Some corrections are made automatically by the program (80%) and the rest by visual control of the questionnaires (20%).
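The two imputation strategies can be illustrated with a small sketch. It is not the program used by the survey team: the function `impute`, the record layout, and the choice of `village` as the similarity criterion for donors are assumptions made for illustration only.

```python
import random


def impute(records, field, reference, donor_key, rng):
    """Fill missing `field` values: cold deck first, hot deck as a fallback.

    reference: dict mapping a record id to a value taken from another
               dataset (the cold deck source).
    donor_key: field that defines which records count as "similar" when
               picking a hot deck donor from the same dataset.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left unchanged
        if rec[field] is None:
            if rec["id"] in reference:
                # Cold deck: take the value from the external dataset.
                rec[field] = reference[rec["id"]]
                rec["imputed"] = "cold"
            else:
                # Hot deck: pick a random similar record from this dataset.
                donors = [r[field] for r in records
                          if r[field] is not None
                          and r[donor_key] == rec[donor_key]]
                if donors:
                    rec[field] = rng.choice(donors)
                    rec["imputed"] = "hot"
        filled.append(rec)
    return filled


# Made-up records: ids 2 and 3 have a missing value for "sex".
records = [
    {"id": 1, "sex": "F", "village": "K1"},
    {"id": 2, "sex": None, "village": "K1"},
    {"id": 3, "sex": None, "village": "V2"},
    {"id": 4, "sex": "M", "village": "V2"},
]
reference = {2: "M"}  # value for id 2 known from another dataset

result = impute(records, "sex", reference, "village", random.Random(0))
for rec in result:
    print(rec)
```

Recording which method filled each value (the `imputed` key here) makes it possible to audit or exclude imputed values later.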
Over the years, the response rate at household level has varied between 95% and 97%, with the response rate at individual level varying between 92% and 95%. Challenges to achieving a 100% response rate have included:
- high population mobility within the study area
- high population attrition
- respondent fatigue
- security in some areas
Not applicable for surveillance data
CentreId MetricTable QMetric Illegal Legal Total Metric RunDate
KE031 MicroDataCleaned Starts 219285 2017-05-16 18:25
KE031 MicroDataCleaned Transitions 825036 825036 0 2017-05-16 18:25
KE031 MicroDataCleaned Ends 219285 2017-05-16 18:25
KE031 MicroDataCleaned SexValues 825036 2017-05-16 18:25
KE031 MicroDataCleaned DoBValues 42 824994 825036 0 2017-05-16 18:25
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data contains 12 files:
(1) InSAR_data.csv contains the vertical land motion (VLM) and east-west motion in cm/year for Lagos, Nigeria.
(2) Four GeoTIFF files containing the risk levels for Lagos for different time periods (4 years, 10 years, 35 years, and 75 years).
(3) Four .csv files containing the risk levels for Lagos for different time periods (4 years, 10 years, 35 years, and 75 years).
(4) Longitude.csv contains the longitude for plotting the risk level maps.
(5) Latitude.csv contains the latitude for plotting the risk level maps.
(6) BuildingCollapseTable.xlsx contains the catalog of 106 building collapses compiled for this study.
See further details below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Installed and Effective Capacities (MW) per Power Facility, 2014. Data compiled from the Kenya Power annual report 2014 (data submitted on 30.06.2014), the Kenyan Energy Regulatory Commission, and Wikipedia for some geolocalizations. Citation: Negawatt challenge. A curated list of datasets for the World Bank Negawatt Challenge competition in Accra and Nairobi cities. https://datahub.io/dataset/kenya-geolocalized-power-facilities-2014