The African Cities Population Database (ACPD) was produced by Birkbeck College, University of London, in 1990 at the request of the United Nations Environment Programme (UNEP) in Nairobi, Kenya. The database contains head counts for 479 cities in Africa that either have a population of over 20,000 or are capitals of their nation state. It lists the geographical location of each city and its population size. The material is primarily derived from a 1988 report of the Economic Commission for Africa (ECA) and several issues of the United Nations Demographic Yearbook (1973-81). Severe problems were found with several countries such as Togo, Ghana and South Africa. For South Africa, the data were derived from the United Nations Demographic Yearbook 1987.
ACPD is an Arc/Info point coverage. It has no projection, as the cities are located by their latitude and longitude. Coordinates were assigned from gazetteers or maps of Africa. Each record in the database contains the city name, country name, latitude and longitude of the city, and its population at a defined time. The Arc/Info attribute table contains the following fields:
| Field | Description |
|---|---|
| AREA | Arc/Info item |
| PERIMETER | Arc/Info item |
| ACPD# | Arc/Info item |
| ACPD-ID | Arc/Info item |
| ID-NUM | Unique number for each city |
| CITY | City name |
| COUNTRY | Country name |
| CITY-POP | Population of city proper |
| YEAR | Latest available year of collection |
ACPD comes as an Arc/Info EXPORT file originally called "ACPD.E00" and contains 67 Kb of data. The file has a record length of 80 and a block size of 8000 (blocking factor = 100). The file can be read from tape using Arc/Info's TAPEREAD command or any other generic copy utility. If distributed on a diskette it can be read using the ordinary DOS 'COPY' command. The file has to be converted to Arc/Info internal format using its IMPORT command.
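The blocking figures above (8000-byte blocks, each holding 100 records of 80 bytes) can be illustrated with a minimal sketch of splitting one block into fixed-length records. The function name `unblock` and the sample block are hypothetical, and the sketch does not parse the Arc/Info EXPORT format itself; converting the file still requires the IMPORT command.

```python
RECORD_LENGTH = 80   # bytes per record, as stated in the description
BLOCK_SIZE = 8000    # bytes per block, as stated in the description


def unblock(block: bytes, record_length: int = RECORD_LENGTH) -> list[str]:
    """Split one fixed-length block into text records, dropping right padding."""
    if len(block) % record_length != 0:
        raise ValueError("block length is not a multiple of the record length")
    return [
        block[i:i + record_length].decode("ascii").rstrip()
        for i in range(0, len(block), record_length)
    ]


# Hypothetical block: 100 space-padded records of 80 bytes each
sample_block = b"".join(
    f"RECORD {n}".ljust(RECORD_LENGTH).encode("ascii") for n in range(100)
)
records = unblock(sample_block)

# Blocking factor = block size / record length
blocking_factor = BLOCK_SIZE // RECORD_LENGTH
```

Here `blocking_factor` evaluates to 100, matching the figure quoted for the file.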
References to the ACPD data set can be found in:
The source of the ACPD data set as held by GRID is the Department of Geography, Birkbeck College, University of London, London, UK.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
City population size is a crucial measure when trying to understand urban life. Many socio-economic indicators scale superlinearly with city size, whilst some infrastructure indicators scale sublinearly with city size. However, the impact of size also extends beyond the city’s limits. Here, we analyse the scaling behaviour of cities beyond their boundaries by considering the emergence and growth of nearby cities. Based on an urban network from African continental cities, we construct an algorithm to create the region of influence of cities. The number of cities and the population within a region of influence are then analysed in the context of urban scaling. Our results are compared against a random permutation of the network, showing that the observed scaling power of cities to enhance the emergence and growth of cities is not the result of randomness. By altering the radius of influence of cities, we observe three regimes. Large cities tend to be surrounded by many small towns for small distances. For medium distances (above 114 km), large cities are surrounded by many other cities containing large populations. Large cities boost urban emergence and growth (even more than 190 km away), but their scaling power decays with distance.
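Urban scaling is commonly expressed as a power law, Y = a · N^β, where N is city population and Y is an indicator; β > 1 is superlinear and β < 1 is sublinear. The following is a minimal sketch of estimating β by least squares in log-log space. It uses synthetic data and a hypothetical helper `fit_scaling_exponent`; it is not the authors' algorithm and does not construct regions of influence.

```python
import numpy as np


def fit_scaling_exponent(population, indicator):
    """Fit indicator = a * population**beta by linear regression on logs."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(indicator, dtype=float))
    # polyfit returns coefficients from highest degree down: slope, intercept
    beta, log_a = np.polyfit(log_n, log_y, deg=1)
    return float(np.exp(log_a)), float(beta)


# Synthetic, illustrative data only (not taken from the study)
rng = np.random.default_rng(0)
population = rng.uniform(2e4, 1e7, size=200)
true_beta = 1.15
noise = np.exp(rng.normal(0.0, 0.05, size=200))
indicator = 0.5 * population**true_beta * noise

a, beta = fit_scaling_exponent(population, indicator)
regime = "superlinear" if beta > 1 else "sublinear"
```

Repeating such a fit for each radius of influence would show how the exponent changes with distance, which is the kind of comparison the abstract describes.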
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Range |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases
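```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the data and drop the identifier columns before scaling
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
X_scaled = StandardScaler().fit_transform(X)  # standardize so no feature dominates distances

# Cluster into 5 groups; n_init is set explicitly so behavior is the same across versions
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)

# Analyze: mean of each numeric feature per cluster (text columns are skipped)
print(df.groupby('cluster').mean(numeric_only=True))
```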
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with:
- ✨ Realistic correlation structures based on urban research
- 🌍 Regional characteristics matching real-world patterns
- 🎯 Optimal cluster separability (validated via silhouette scores)
- 📚 Comprehensive documentation and starter code
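Since cluster separability is said to be validated via silhouette scores, here is a minimal sketch of how such a check might look. It runs on stand-in data from `make_blobs` (300 points, 10 features, 5 centers, all assumed for illustration), not on the dataset itself, and the range of k values tried is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (hypothetical): replace with the scaled dataset features
X, _ = make_blobs(n_samples=300, n_features=10, centers=5,
                  cluster_std=1.0, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# Silhouette score for each candidate number of clusters
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# The k with the highest mean silhouette is the best-separated partition
best_k = max(scores, key=scores.get)
```

Scores close to 1 indicate compact, well-separated clusters; scores near 0 indicate overlapping ones.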
✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering! 🎉
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To address the absence of comparable city-level water data, this dataset compiles data from 15 global South cities located in sub-Saharan Africa, South Asia, and Latin America, which are among the regions that are the focus of the World Resources Report (WRR) "Towards a More Equal City". The 15 cities are Kampala, Uganda; Lagos, Nigeria; Maputo, Mozambique; Mzuzu, Malawi; Nairobi, Kenya; Bengaluru, India; Colombo, Sri Lanka; Dhaka, Bangladesh; Karachi, Pakistan; Mumbai, India; Caracas, Venezuela; Cochabamba, Bolivia; Rio de Janeiro, Brazil; São Paulo, Brazil; and Santiago de Cali, Colombia. To compile a data set on each city, we collaborated with local researchers who had a minimum of seven years of experience in the water sector. Data were obtained from a combination of interviews, fieldwork in an informal settlement, publicly available data sets, administrative records, websites, and project documents. Researchers in each city conducted an average of seven key informant interviews. Data were collected about household water and sanitation access at the city level, and fieldwork was conducted in one informal settlement in each city. The dataset includes cost, % coverage, availability, and cost burdens on household water and sanitation practices; water intermittency; household treatment practices; access to facilities; citywide sanitation infrastructure; cost of on-site sanitation construction and fecal sludge removal; fees for piped sewage; the lining of pit latrines; and proximity of septic tanks and pit latrines to water sources. At the city level, data were collected on the water utility, the city’s sources of water, and the water utility’s legal and administrative status.
We augmented the city-level data with fieldwork and data from one informal settlement in each city, for two reasons: (1) city-level data are usually presented in averages and thus tend to mask extremes at both ends of the socioeconomic distribution; and (2) in many cities, informal settlements are excluded from formal city-level statistics because their land occupation is considered illegal. To select the informal settlement in each city, the researchers identified a centrally located, well-established settlement that did not represent either the city’s “best” or “worst” conditions but instead represented challenges to water access common in similar settlements in the city.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Data for Adam Storeygard; Farther on down the Road: Transport Costs, Trade and Urban Growth in Sub-Saharan Africa, The Review of Economic Studies, Volume 83, Issue 3, 1 July 2016, Pages 1263–1295, https://doi.org/10.1093/restud/rdw020. Abstract: How does isolation affect the economic activity of cities? Transport costs are widely considered an important barrier to local economic activity but their impact in developing countries is not well-studied. This paper investigates the role of inter-city transport costs in determining the income of sub-Saharan African cities. In particular, focusing on fifteen countries whose largest city is a port, I ask how important access to that city is for the income of hinterland cities. The lack of panel data on both local economic activity and transport costs has prevented rigorous empirical investigation of this question. I fill this gap with two new datasets. Satellite data on lights at night proxy for city economic activity, and new road network data allow me to calculate the shortest route between cities. Cost per unit distance is identified by plausibly exogenous world oil prices. The results show that an oil price increase of the magnitude experienced between 2002 and 2008 induces the income of cities near a major port to increase by 6.6 percent relative to otherwise identical cities one standard deviation farther away. Combined with external estimates, this implies an elasticity of city economic activity with respect to transport costs of -0.25 at that distance. Moreover, the effect differs by the surface of roads between cities. Cities connected to the port by paved roads are chiefly affected by transport costs to the port, while cities connected to the port by unpaved roads are more affected by connections to secondary centers. 
This dataset is part of the Global Research Program on Spatial Development of Cities funded by the Multi-Donor Trust Fund on Sustainable Urbanization of the World Bank and supported by the U.K. Department for International Development.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set lists juveniles in need of protection, juveniles in conflict with the law, and children beyond parental care, covering the period from 2009 to 30th May 2014.
The percentage of persons, out of the total number of persons living in an area, self-identifying as racially Black or African American (and ethnically non-Hispanic). “Black or African American” refers to a person having origins in any of the Black racial groups of Africa. This indicator includes people who identified their race as “Black”.
Source: U.S. Census Bureau, American Community Survey
Years Available: 2010, 2011-2015, 2012-2016, 2013-2017, 2014-2018, 2015-2019, 2020, 2017-2021, 2018-2022, 2019-2023
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
OpenStreetMap exports for use in GIS applications.
This theme includes all OpenStreetMap features in this area matching:
amenity IS NOT NULL OR man_made IS NOT NULL OR shop IS NOT NULL OR tourism IS NOT NULL
Features may have these attributes:
This dataset is one of many OpenStreetMap exports on HDX. See the Humanitarian OpenStreetMap Team website for more information.
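The tag filter quoted above (`amenity IS NOT NULL OR man_made IS NOT NULL OR shop IS NOT NULL OR tourism IS NOT NULL`) can be read as "keep a feature if any of these four keys has a value". A minimal sketch of that logic follows; the function `matches_theme` and the sample features are hypothetical, and this is not the export tool's own implementation.

```python
# Keys named in the filter expression
FILTER_KEYS = ("amenity", "man_made", "shop", "tourism")


def matches_theme(tags: dict) -> bool:
    """True if at least one filter key is present with a non-null value."""
    return any(tags.get(key) is not None for key in FILTER_KEYS)


# Hypothetical features, each represented by its tag dictionary
features = [
    {"amenity": "school", "name": "Example School"},  # kept: amenity is set
    {"highway": "residential"},                       # dropped: no filter key
    {"shop": None, "tourism": "hotel"},               # kept: tourism is set
    {"shop": None},                                   # dropped: shop is null
]

selected = [f for f in features if matches_theme(f)]
```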
The places we live affect our health status and the choices and opportunities we have (or do not have) to lead fulfilling lives. Over the past ten years, the African Population & Health Research Centre (APHRC) has led pioneering work in highlighting some of the major health and livelihood challenges associated with rapid urbanization in sub-Saharan Africa (SSA). In 2002, the Centre established the first longitudinal platform in urban Africa in the city of Nairobi in Kenya. The platform, known as the Nairobi Urban Health and Demographic Surveillance System, collects data on two informal settlements - Korogocho and Viwandani - in Nairobi City every four months on issues ranging from household dynamics to fertility and mortality, migration and livelihood as well as on causes of death, using a verbal autopsy technique. The dataset provided here contains key demographic and health indicators extracted from the longitudinal database. Researchers interested in accessing the micro-data can look at our data access policy and contact us.
The Demographic Surveillance Area (DSA), combining the Viwandani and Korogocho slum settlements, covers a land area of about 0.97 km², with the two informal settlements located about 7 km from each other. Korogocho is located 12 km from the Nairobi city center in Kasarani division (now Kasarani district), while Viwandani is about 7 km from the Nairobi city center in Makadara division (now Madaraka district). The DSA covers about seven villages each in Korogocho and Viwandani.
Individual
Between 1st January and 31st December 2015, the Nairobi HDSS observed 86,304 individuals living in 30,219 households distributed across two informal settlements (Korogocho and Viwandani). All persons who sleep in the household prior to the day of the survey are included in the survey, while non-resident household members are excluded.
The present universe started with an initial census, carried out on 1st August 2002, of the population living in the two informal settlements (Korogocho and Viwandani). Regular visits have since been made (3 times a year) to update information on births, deaths and migration that have occurred in the households observed at the initial census. New members join the population through a birth to a registered member or an in-migration, while existing members leave through a death or out-migration. The DSS adopts the concept of an open cohort that allows new members to join and regular members to leave and return to the system.
Event history data
Three rounds in a year
This dataset covers the whole demographic surveillance area population. The number of respondents has varied over the last 13 years (2002-2015), with variations observed at both household level and individual level. As at 31st December 2015, 66,848 individuals were being observed under the Nairobi HDSS, living in 25,812 households distributed across two informal settlements (Korogocho and Viwandani). The variable IndividualId uniquely identifies every respondent observed, while the variable LocationId uniquely identifies the room in which the individual was living at any point in time. To identify individuals who were living together at any one point in time (a household), the data can be split on location and observation dates.
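Splitting on location and observation dates can be sketched as follows: for a chosen date, group every individual whose residency episode covers that date by LocationId. Only `IndividualId` and `LocationId` are named in the description; the `StartDate` and `EndDate` fields, the function `households_on`, and the sample episodes are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical residency episodes; StartDate/EndDate are assumed field names
episodes = [
    {"IndividualId": 1, "LocationId": "A", "StartDate": date(2014, 1, 1), "EndDate": date(2015, 12, 31)},
    {"IndividualId": 2, "LocationId": "A", "StartDate": date(2015, 3, 1), "EndDate": date(2015, 12, 31)},
    {"IndividualId": 3, "LocationId": "B", "StartDate": date(2013, 1, 1), "EndDate": date(2015, 5, 31)},
    {"IndividualId": 3, "LocationId": "A", "StartDate": date(2015, 6, 1), "EndDate": date(2015, 12, 31)},
    {"IndividualId": 4, "LocationId": "B", "StartDate": date(2012, 1, 1), "EndDate": date(2015, 12, 31)},
]


def households_on(episodes, when):
    """Group individuals by the location they occupied on the given date."""
    groups = defaultdict(list)
    for ep in episodes:
        if ep["StartDate"] <= when <= ep["EndDate"]:
            groups[ep["LocationId"]].append(ep["IndividualId"])
    return {loc: sorted(ids) for loc, ids in groups.items()}


# Individual 3 moves from location B to location A on 1 June 2015
before_move = households_on(episodes, date(2015, 4, 1))
after_move = households_on(episodes, date(2015, 6, 30))
```

Comparing the two snapshots shows why the date matters: the same individual belongs to different co-resident groups before and after a move.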
None
Proxy Respondent [proxy]
Questionnaires are printed and administered in Swahili, the country's national language.
The questionnaires for the Nairobi HDSS were structured questionnaires based on the INDEPTH Model Questionnaire and were translated into Swahili with some modifications and additions. After an initial review, the questionnaires were translated back into English by an independent translator with no prior knowledge of the survey. The back translation from the Swahili version was independently reviewed and compared to the English original. Differences in translation were reviewed and resolved in collaboration with the original translators. The English and Swahili questionnaires were both piloted as part of the survey pretest.
At baseline, a household questionnaire was administered in each household, which collected various information on household members including sex, age, relationship, and orphanhood status. In later rounds, questionnaires to track the migration of the population observed at baseline, and additional questionnaires to capture demographic and health events happening to the population, have been introduced.
Data editing took place at a number of stages throughout the processing, including:
a) Office editing and coding
b) During data entry
c) Structure checking and completeness
d) Secondary editing
e) Structural checking of STATA data files
Where changes were made by the program, cold deck imputation was preferred: incorrect values were imputed using existing data from another dataset. If cold deck imputation was found to be insufficient, hot deck imputation was used, in which a missing value was imputed from a randomly selected similar record in the same dataset.
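Hot deck imputation as described above can be sketched in a few lines: for each record with a missing value, pick a donor at random from records in the same dataset that agree on a set of matching variables. The source does not say how "similar" is defined, so the matching keys, the function `hot_deck_impute`, and the sample records are all hypothetical.

```python
import random


def hot_deck_impute(records, target, match_keys, rng=None):
    """Fill missing `target` values from a random donor with equal match_keys.

    Donors are drawn only from originally observed values; records with no
    matching donor are left unchanged. The input list is not modified.
    """
    rng = rng or random.Random()
    imputed = []
    for rec in records:
        if rec[target] is None:
            donors = [
                d[target] for d in records
                if d[target] is not None
                and all(d[k] == rec[k] for k in match_keys)
            ]
            if donors:
                rec = {**rec, target: rng.choice(donors)}
        imputed.append(rec)
    return imputed


# Hypothetical records; None marks a missing value
records = [
    {"sex": "F", "village": "K1", "age": 30},
    {"sex": "F", "village": "K1", "age": None},  # one matching donor
    {"sex": "M", "village": "V2", "age": 41},
    {"sex": "M", "village": "K1", "age": None},  # no matching donor
]

result = hot_deck_impute(records, "age", ("sex", "village"), rng=random.Random(0))
```

Cold deck imputation differs only in where the donor comes from: the replacement value is taken from another dataset rather than from the one being edited.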
Some corrections are made automatically by the program (80%) and the rest by visual control of the questionnaires (20%).
Over the years, the response rate at household level has varied between 95% and 97%, and the response rate at individual level between 92% and 95%. Challenges to achieving a 100% response rate have included:
- high population mobility within the study area
- high population attrition
- respondent fatigue
- security in some areas
Not applicable for surveillance data
| CentreId | MetricTable | QMetric | Illegal | Legal | Total | Metric | RunDate |
|---|---|---|---|---|---|---|---|
| KE031 | MicroDataCleaned | Starts | | | 219285 | | 2017-05-16 18:25 |
| KE031 | MicroDataCleaned | Transitions | | 825036 | 825036 | 0 | 2017-05-16 18:25 |
| KE031 | MicroDataCleaned | Ends | | | 219285 | | 2017-05-16 18:25 |
| KE031 | MicroDataCleaned | SexValues | | | 825036 | | 2017-05-16 18:25 |
| KE031 | MicroDataCleaned | DoBValues | 42 | 824994 | 825036 | 0 | 2017-05-16 18:25 |
Cities ranking and mega cities
Tokyo is the world’s largest city with an agglomeration of 37 million inhabitants, followed by New Delhi with 29 million, Shanghai with 26 million, and Mexico City and São Paulo, each with around 22 million inhabitants. Today, Cairo, Mumbai, Beijing and Dhaka all have close to 20 million inhabitants. By 2020, Tokyo’s population is projected to begin to decline, while Delhi is projected to continue growing and to become the most populous city in the world around 2028.
By 2030, the world is projected to have 43 megacities with more than 10 million inhabitants, most of them in developing regions. However, some of the fastest-growing urban agglomerations are cities with fewer than 1 million inhabitants, many of them located in Asia and Africa. While one in eight people live in 33 megacities worldwide, close to half of the world’s urban dwellers reside in much smaller settlements with fewer than 500,000 inhabitants.
About the data
The 2018 Revision of the World Urbanization Prospects is published by the Population Division of the United Nations Department of Economic and Social Affairs (UN DESA). It has been issued regularly since 1988 with revised estimates and projections of the urban and rural populations for all countries of the world, and of their major urban agglomerations. The data set and related materials are available at: https://esa.un.org/unpd/wup/