South Asia is one of the most densely populated regions in the world. This dataset draws on historical materials and previous research on the population of South Asia (see the data description document and references for details) to examine and estimate the population of the region (present-day India, Pakistan, Nepal, and Bangladesh) from 640 to 1801 AD. These estimates are linked with the census data of British India from 1871 to 1941 (Nepal's figures come from Nepal's own census) and with the United Nations World Population Prospects data from 1950 to 2020, giving population totals for 22 periods between 640 and 2020 (640, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1595, 1750, 1801, 1871, 1901, 1921, 1941, 1960, 1980, 2000, 2010, 2020). The geographic detector method was then used to select the dominant environmental factors affecting the spatial distribution of population, historical data on the distribution of settlements were collected (see the data description document and references for details), and a random forest regression model was used to spatialize the population totals. After excluding uninhabited areas such as water bodies, glaciers, and bare or unused land, and determining the maximum historical extent of population distribution, a 1 km resolution population dataset for South Asia from 640 to 2020 was produced. The model was tested with the leave-one-out method; the variance explained was 0.81, indicating good model accuracy. Compared with the existing HYDE historical population dataset, this study incorporates more historical materials and the latest research results when estimating historical population. In the random forest spatial simulation, it also accounts for changes in South Asian settlements over the past millennium, whereas the HYDE dataset considers only natural factors and treats them as stable and unchanging.
Therefore, this dataset is more reliable than the HYDE dataset and can more reasonably reveal the spatiotemporal characteristics of population change in South Asia during historical periods. It provides foundational data for research on the long-term evolution of human-land relations, climate change attribution, and ecological protection in South Asia.
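As a rough illustration of two steps mentioned in the description, the sketch below shows one common way to turn per-cell model scores into a gridded population that sums to a period's estimated total, and how a variance-explained score can be computed from held-out predictions. It is a minimal sketch under stated assumptions, not the procedure used to build the dataset: the function names, the proportional allocation rule, and the toy numbers are all hypothetical, and the per-cell weights stand in for the output of a regression model such as a random forest.

```python
from typing import List, Sequence


def allocate_population(total: float,
                        weights: Sequence[float],
                        habitable: Sequence[bool]) -> List[float]:
    """Distribute a regional total across grid cells in proportion to weights.

    Assumption: weights are per-cell suitability scores from some model.
    Cells flagged as uninhabitable (water, glacier, bare land) receive zero.
    """
    masked = [max(w, 0.0) if ok else 0.0 for w, ok in zip(weights, habitable)]
    weight_sum = sum(masked)
    if weight_sum == 0:
        raise ValueError("no habitable cell has a positive weight")
    return [total * w / weight_sum for w in masked]


def variance_explained(observed: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Return 1 - SS_res / SS_tot for held-out predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy example: four cells, the third is masked out as uninhabitable.
cells = allocate_population(100.0, [1.0, 3.0, 5.0, 1.0],
                            [True, True, False, True])
```

Because the allocation is proportional, the cell values always sum to the period total supplied, so the gridded product stays consistent with the estimated regional population regardless of how the weights were produced.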