I always wanted a data set on the world's population, country by country, but I could not find one that was properly documented. So I created one myself.
I knew I wanted to create a data set, but I did not know how. I started by searching the internet for the population of countries. Wikipedia was my first stop, but the results were not what I needed, and as far as I remember it listed only about 190 countries. After searching for quite some time I stumbled upon a great website that you have probably heard of: Worldometer. It was exactly what I was looking for, with more detail than Wikipedia and more rows, that is, more countries with their populations.
Once I had found the data, the next hard task was to download it. The raw data was not available, and I did not email Worldometer to ask for it. Instead I learned a new skill that is very important for a data scientist. I had read somewhere that this technique is how you obtain data from websites. Any guesses? Keep reading and you will find out in the next paragraph.
You are right, it's web scraping. I learned it so that I could convert the data into CSV format. Here I share the scraper code I wrote; I also found a way to convert a pandas data frame directly to a CSV (comma-separated values) file and store it on my computer. Go through my code and you will see what I mean.
This is the code that I used to scrape the data from the website:
[Image: screenshot of the scraper code, https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media]
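The scraper itself survives only as a screenshot, so what follows is a minimal sketch of the general technique, not the author's code. It assumes the figures sit in an ordinary HTML table; the class `TableToRows`, the helper `table_to_csv`, and the sample markup are hypothetical, and it uses only the Python standard library rather than pandas.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html: str) -> str:
    """Return the table cells found in `html` as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample markup; the names and figures are placeholders, not real data.
SAMPLE = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Examplestan</td><td>1,234</td></tr>
  <tr><td>Samplonia</td><td>567</td></tr>
</table>
"""

csv_text = table_to_csv(SAMPLE)
print(csv_text)
```

In real use the HTML would first be downloaded from the page, subject to the site's terms of use. With pandas, the final step would be `DataFrame.to_csv`, which is the conversion the description mentions.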
I couldn't have got the data without Worldometer, so special thanks to the website. It is because of them that I was able to get the data.
As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.
Key Features
Country: Name of the country.
Density (P/Km2): Population density measured in persons per square kilometer.
Abbreviation: Abbreviation or code representing the country.
Agricultural Land (%): Percentage of land area used for agricultural purposes.
Land Area (Km2): Total land area of the country in square kilometers.
Armed Forces Size: Size of the armed forces in the country.
Birth Rate: Number of births per 1,000 population per year.
Calling Code: International calling code for the country.
Capital/Major City: Name of the capital or major city.
CO2 Emissions: Carbon dioxide emissions in tons.
CPI: Consumer Price Index, a measure of inflation and purchasing power.
CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
Currency_Code: Currency code used in the country.
Fertility Rate: Average number of children born to a woman during her lifetime.
Forested Area (%): Percentage of land area covered by forests.
Gasoline_Price: Price of gasoline per liter in local currency.
GDP: Gross Domestic Product, the total value of goods and services produced in the country.
Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
Largest City: Name of the country's largest city.
Life Expectancy: Average number of years a newborn is expected to live.
Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
Minimum Wage: Minimum wage level in local currency.
Official Language: Official language(s) spoken in the country.
Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
Physicians per Thousand: Number of physicians per thousand people.
Population: Total population of the country.
Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
Tax Revenue (%): Tax revenue as a percentage of GDP.
Total Tax Rate: Overall tax burden as a percentage of commercial profits.
Unemployment Rate: Percentage of the labor force that is unemployed.
Urban Population: Percentage of the population living in urban areas.
Latitude: Latitude coordinate of the country's location.
Longitude: Longitude coordinate of the country's location.
Potential Use Cases
Analyze population density and land area to study spatial distribution patterns.
Investigate the relationship between agricultural land and food security.
Examine carbon dioxide emissions and their impact on climate change.
Explore correlations between economic indicators such as GDP and various socio-economic factors.
Investigate educational enrollment rates and their implications for human capital development.
Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
Study labor market dynamics through indicators such as labor force participation and unemployment rates.
Investigate the role of taxation and its impact on economic development.
Explore urbanization trends and their social and environmental consequences.
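As a rough illustration of the first and fourth use cases, the sketch below derives population density from population and land area and computes a Pearson correlation between population and GDP. The column names follow the feature list above; the three rows, the country labels, and the helper names `density` and `pearson` are invented for illustration.

```python
from math import sqrt

# Hypothetical rows: the country labels and values are invented for illustration.
rows = [
    {"Country": "A", "Population": 1_000_000, "Land Area (Km2)": 50_000, "GDP": 2.0e10},
    {"Country": "B", "Population": 5_000_000, "Land Area (Km2)": 100_000, "GDP": 1.5e11},
    {"Country": "C", "Population": 200_000, "Land Area (Km2)": 400, "GDP": 8.0e9},
]


def density(row):
    """Persons per square kilometre."""
    return row["Population"] / row["Land Area (Km2)"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


densities = {r["Country"]: density(r) for r in rows}
r_value = pearson([r["Population"] for r in rows], [r["GDP"] for r in rows])
print(densities, r_value)
```

A correlation computed this way only describes association in the sample; it says nothing about cause.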
The United Nations Energy Statistics Database (UNSTAT) is a comprehensive collection of international energy and demographic statistics prepared by the United Nations Statistics Division. The 2004 version represents the latest in the series of annual compilations which commenced under the title World Energy Supplies in Selected Years, 1929-1950. Supplementary series of monthly and quarterly data on production of energy may be found in the Monthly Bulletin of Statistics. The database contains comprehensive energy statistics for more than 215 countries or areas for production, trade and intermediate and final consumption (end-use) for primary and secondary conventional, non-conventional and new and renewable sources of energy. Mid-year population estimates are included to enable the computation of per capita data. Annual questionnaires sent to national statistical offices serve as the primary source of information. Supplementary data are also compiled from national, regional and international statistical publications. The Statistics Division prepares estimates where official data are incomplete or inconsistent. The database is updated on a continuous basis as new information and revisions are received. This metadata file represents the population statistics during the expressed time. For more information about the country site codes, click this link to the United Nations "Standard country or area codes for statistical use": https://unstats.un.org/unsd/methodology/m49/overview/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for POPULATION reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a comprehensive list of countries and dependent territories worldwide, along with their most recent population estimates. The data is sourced from the Wikipedia page List of countries and dependencies by population, which compiles figures from national statistical offices and the United Nations Population Division.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 21 November 2021.
--- Dataset description provided by original source is as follows ---
I always wanted a data set on the world's population, country by country, but I could not find one that was properly documented. So I created one myself.
I knew I wanted to create a data set, but I did not know how. I started by searching the internet for the population of countries. Wikipedia was my first stop, but the results were not what I needed, and as far as I remember it listed only about 190 countries. After searching for quite some time I stumbled upon a great website that you have probably heard of: Worldometer. It was exactly what I was looking for, with more detail than Wikipedia and more rows, that is, more countries with their populations.
Once I had found the data, the next hard task was to download it. The raw data was not available, and I did not email Worldometer to ask for it. Instead I learned a new skill that is very important for a data scientist. I had read somewhere that this technique is how you obtain data from websites. Any guesses? Keep reading and you will find out in the next paragraph.
You are right, it's web scraping. I learned it so that I could convert the data into CSV format. Here I share the scraper code I wrote; I also found a way to convert a pandas data frame directly to a CSV (comma-separated values) file and store it on my computer. Go through my code and you will see what I mean.
This is the code that I used to scrape the data from the website:
[Image: screenshot of the scraper code, https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media]
I couldn't have got the data without Worldometer, so special thanks to the website. It is because of them that I was able to get the data.
As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.
--- Original source retains full ownership of the source dataset ---
Midyear population estimates and projections for all countries and areas of the world with a population of 5,000 or more // Source: U.S. Census Bureau, Population Division, International Programs Center // Note: Total population available from 1950 to 2100 for 227 countries and areas. Other demographic variables available from base year to 2100. Base year varies by country and therefore data are not available for all years for all countries. See methodology at https://www.census.gov/programs-surveys/international-programs/about/idb.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a hybrid gridded dataset of demographic data for the world, given as 5-year population bands at a 0.5 degree grid resolution.
This dataset combines the NASA SEDAC Gridded Population of the World version 4 (GPWv4) with the ISIMIP Histsoc gridded population data and the United Nations World Population Program (WPP) demographic modelling data.
Demographic fractions are given for the time period covered by the UN WPP model (1950-2050) while demographic totals are given for the time period covered by the combination of GPWv4 and Histsoc (1950-2020)
Method - demographic fractions
Demographic breakdown of country population by grid cell is calculated by combining the GPWv4 demographic data given for 2010 with the yearly country breakdowns from the UN WPP. This combines the spatial distribution of demographics from GPWv4 with the temporal trends from the UN WPP. This makes it possible to calculate exposure trends from 1980 to the present day.
To combine the UN WPP demographics with the GPWv4 demographics, we calculate for each country the proportional change in fraction of demographic in each age band relative to 2010 as:
\(\delta_{year,\ country,age}^{\text{wpp}} = f_{year,\ country,age}^{\text{wpp}} / f_{2010,\ country,age}^{\text{wpp}}\)
Where:
\(\delta_{year,\ country,age}^{\text{wpp}}\) is the ratio of change in demographic for a given age band and country from the UN WPP dataset.
\(f_{year,\ country,age}^{\text{wpp}}\) is the fraction of population in the UN WPP dataset for a given age band, country, and year.
\(f_{2010,\ country,age}^{\text{wpp}}\) is the fraction of population in the UN WPP dataset for a given age band and country for the year 2010.
The gridded demographic fraction is then calculated relative to the 2010 demographic data given by GPWv4.
For each subset of cells corresponding to a given country c, the fraction of population in a given age band is calculated as:
\(f_{year,c,age}^{\text{gpw}} = \delta_{year,\ country,age}^{\text{wpp}} \cdot f_{2010,c,age}^{\text{gpw}}\)
Where:
\(f_{year,c,age}^{\text{gpw}}\) is the fraction of the population in a given age band for a given year, for the grid cell c.
\(f_{2010,c,age}^{\text{gpw}}\) is the fraction of the population in a given age band for 2010, for the grid cell c.
The matching between grid cells and country codes is performed using the GPWv4 gridded country code lookup data and country name lookup table. The final dataset is assembled by combining the cells from all countries into a single gridded time series. This time series covers the whole period from 1950-2050, corresponding to the data available in the UN WPP model.
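The two formulas above can be sketched in a few lines. This is an illustration under stated assumptions, not the dataset's production code: the age bands, the cell names, every fraction, and the helper `scale_fractions` are hypothetical, and real inputs would be gridded arrays rather than small dictionaries.

```python
# Hypothetical UN WPP age-band fractions for one country, keyed by year.
f_wpp = {
    2010: {"0-4": 0.10, "5-9": 0.08},
    2020: {"0-4": 0.08, "5-9": 0.10},
}

# Hypothetical GPWv4 2010 fractions for two grid cells of that country.
f_gpw_2010 = {
    "cell_1": {"0-4": 0.12, "5-9": 0.06},
    "cell_2": {"0-4": 0.05, "5-9": 0.09},
}


def scale_fractions(f_wpp, f_gpw_2010, year, base_year=2010):
    """Scale each cell's 2010 fraction by delta = f_wpp[year] / f_wpp[base_year]."""
    delta = {
        age: f_wpp[year][age] / f_wpp[base_year][age]
        for age in f_wpp[base_year]
    }
    return {
        cell: {age: delta[age] * frac for age, frac in bands.items()}
        for cell, bands in f_gpw_2010.items()
    }


f_gpw_2020 = scale_fractions(f_wpp, f_gpw_2010, 2020)
print(f_gpw_2020)
```

Because every cell in a country is scaled by the same ratio, the scaled fractions of a cell need not sum to one, which is consistent with the outlier cells mentioned in the disclaimer.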
Method - demographic totals
Total population data from 1950 to 1999 is drawn from ISIMIP Histsoc, while data from 2000-2020 is drawn from GPWv4. These two gridded time series are simply joined at the cut-over date to give a single dataset covering 1950-2020.
The total population per age band per cell is calculated by multiplying the population fractions by the population totals per grid cell.
Note that as the total population data only covers until 2020, the time span covered by the demographic population totals data is 1950-2020 (not 1950-2050).
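The join at the cut-over date and the multiplication of fractions by totals can be sketched as follows for a single grid cell. All numbers are invented, and the names `join_totals`, `band_totals`, and `CUTOVER` are hypothetical.

```python
# Hypothetical total population for one grid cell, keyed by year.
histsoc = {1998: 950.0, 1999: 960.0, 2000: 999.0}  # used up to 1999
gpw = {1999: 111.0, 2000: 970.0, 2001: 980.0}      # used from 2000 onwards

CUTOVER = 2000  # first year taken from GPWv4


def join_totals(histsoc, gpw, cutover=CUTOVER):
    """Join the two series at the cut-over year without any blending."""
    joined = {y: v for y, v in histsoc.items() if y < cutover}
    joined.update({y: v for y, v in gpw.items() if y >= cutover})
    return dict(sorted(joined.items()))


def band_totals(fractions, total):
    """Population per age band = fraction in the band * total population of the cell."""
    return {age: frac * total for age, frac in fractions.items()}


totals = join_totals(histsoc, gpw)
# Hypothetical age-band fractions for the same cell in 2000.
bands_2000 = band_totals({"0-4": 0.1, "5-9": 0.09}, totals[2000])
print(totals, bands_2000)
```

A plain join like this can leave a step in the series at the cut-over year, which is one of the boundary inconsistencies the disclaimer warns about.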
Disclaimer
This dataset is a hybrid of different datasets with independent methodologies. No guarantees are made about the spatial or temporal consistency across dataset boundaries. The dataset may contain outlier points (e.g. single cells with demographic fractions >1). This dataset is produced on a 'best effort' basis and has been found to be broadly consistent with other approaches, but may contain inconsistencies which have not been identified.
Population of Urban Agglomerations with 300,000 Inhabitants or more in 2014, by city, 1950-2030 (thousands). Data for 1,692 cities contained in the Excel file. Note: Each country has its own definition of what is 'urban', so exercise caution when comparing cities in different countries. Data available from the United Nations, Department of Economic and Social Affairs, Population Division (2014). World Urbanization Prospects: The 2014 Revision, CD-ROM Edition. Further detail of population estimates, land area, and population density for world urban areas with over 500,000 people (924 areas) is available with Demographia's World Urban Areas report (2014). Much of this data is based on the UN urban agglomerations, though a range of other sources are also used.
The Africa Population Distribution Database provides decadal population density data for African administrative units for the period 1960-1990. The database was prepared for the United Nations Environment Programme / Global Resource Information Database (UNEP/GRID) project as part of an ongoing effort to improve global, spatially referenced demographic data holdings. The database is useful for a variety of applications including strategic-level agricultural research and applications in the analysis of the human dimensions of global change.
This documentation describes the third version of a database of administrative units and associated population density data for Africa. The first version was compiled for UNEP's Global Desertification Atlas (UNEP, 1997; Deichmann and Eklundh, 1991), while the second version represented an update and expansion of this first product (Deichmann, 1994; WRI, 1995). The current work is also related to National Center for Geographic Information and Analysis (NCGIA) activities to produce a global database of subnational population estimates (Tobler et al., 1995), and an improved database for the Asian continent (Deichmann, 1996). The new version for Africa provides considerably more detail: more than 4700 administrative units, compared to about 800 in the first and 2200 in the second version. In addition, for each of these units a population estimate was compiled for 1960, 70, 80 and 90 which provides an indication of past population dynamics in Africa. Forthcoming are population count data files as download options.
African population density data were compiled from a large number of heterogeneous sources, including official government censuses and estimates/projections derived from yearbooks, gazetteers, area handbooks, and other country studies. The political boundaries template (PONET) of the Digital Chart of the World (DCW) was used to delineate national boundaries and coastlines for African countries.
For more information on African population density and administrative boundary data sets, see metadata files at [http://na.unep.net/datasets/datalist.php3] which provide information on file identification, format, spatial data organization, distribution, and metadata reference.
References:
Deichmann, U. 1994. A medium resolution population database for Africa, Database documentation and digital database, National Center for Geographic Information and Analysis, University of California, Santa Barbara.
Deichmann, U. and L. Eklundh. 1991. Global digital datasets for land degradation studies: A GIS approach, GRID Case Study Series No. 4, Global Resource Information Database, United Nations Environment Programme, Nairobi.
UNEP. 1997. World Atlas of Desertification, 2nd Ed., United Nations Environment Programme, Edward Arnold Publishers, London.
WRI. 1995. Africa data sampler, Digital database and documentation, World Resources Institute, Washington, D.C.
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset contains details of the world population by country. According to Worldometer, the current population of the world is 8.2 billion people. The most populous country is India, followed by China and the USA.
Attribute Information
Acknowledgements
https://www.worldometers.info/world-population/population-by-country/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Town And Country population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Town And Country across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2023, the population of Town And Country was 11,553, a 0.28% decrease year-by-year from 2022. Previously, in 2022, Town And Country population was 11,585, a decline of 0.46% compared to a population of 11,638 in 2021. Over the last 20 plus years, between 2000 and 2023, population of Town And Country increased by 600. In this period, the peak population was 11,644 in the year 2020. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
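The year-on-year figures in the key observations can be reproduced with a short calculation. The sketch below uses only the four population values quoted above (2020 to 2023), so the peak it finds is the peak among those years; the helper name `yearly_changes` is hypothetical.

```python
# Population figures quoted in the key observations above.
population = {2020: 11_644, 2021: 11_638, 2022: 11_585, 2023: 11_553}


def yearly_changes(series):
    """Return {year: (absolute change, percent change)} versus the previous year."""
    years = sorted(series)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        diff = series[cur] - series[prev]
        changes[cur] = (diff, round(100 * diff / series[prev], 2))
    return changes


changes = yearly_changes(population)
peak_year = max(population, key=population.get)
print(changes[2023], changes[2022], peak_year)
```

The percent change is always taken relative to the earlier year, which is why the 2023 decline of 32 people works out to 0.28% of the 2022 population.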
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Town And Country Population by Year. You can refer the same here
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: country name added.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for POPULATION reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
This dataset shows different breakdowns of London's resident population by their country of birth. Data used comes from ONS' Annual Population Survey (APS). The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. 95% confidence interval levels are provided. Numbers have been rounded to the nearest thousand and figures for smaller populations have been suppressed.
Four files are available for download:
Country of Birth - Borough: Shows country of birth estimates in their broad groups such as European Union, South East Asia, North Africa, etc. broken down to borough level.
Detailed Country of Birth - London: Shows country of birth estimates for specific countries such as France, Bangladesh, Nigeria, etc. available for London as a whole.
Demography Update 09-2015: A GLA Demography report that uses APS data to analyse the trends in London for the period 2004 to 2014. A supporting data file is also provided.
Country of Birth Borough 2004-2016 Analysis Tool: A tool produced by GLA Demography that allows users to explore different breakdowns of country of birth data.
An accompanying Tableau visualisation tool has also been produced which maps data from 2004 to 2015.
Nationality data can be found here: https://data.london.gov.uk/dataset/nationality
Nationality refers to that stated by the respondent during the interview. Country of birth is the country in which they were born. It is possible that an individual’s nationality may change, but the respondent’s country of birth cannot change. This means that country of birth gives a more robust estimate of change over time.
Data and Resources
Country of Birth - Borough: Shows estimates of the population by their country/region of birth by Borough
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Hill Country Village population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Hill Country Village across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2022, the population of Hill Country Village was 946, a 0.21% increase year-by-year from 2021. Previously, in 2021, Hill Country Village population was 944, a decline of 0.11% compared to a population of 945 in 2020. Over the last 20 plus years, between 2000 and 2022, population of Hill Country Village decreased by 78. In this period, the peak population was 1,130 in the year 2009. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Hill Country Village Population by Year. You can refer the same here
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population is a key indicator of the size and growth of a country's economy and society. The population of a country can influence a range of economic, social, and political factors, including resource availability, demographic trends, and political representation. Accurate and up-to-date population data is essential for effective policy planning and decision-making.
The population numbers per country dataset provides a comprehensive overview of the population of each country. The dataset includes information on the total population, population density, population growth rates, and other related metrics, covering all countries in the world. It is compiled from various sources, including national statistical agencies, the United Nations Population Division, and other relevant data sources.
The population numbers per country dataset can be used by researchers, policymakers, and the general public to gain insight into the size and growth of different populations and to compare the relative levels of population across the world. It can also be used to monitor changes in population size and demographic trends over time and to evaluate the effectiveness of policies and strategies aimed at managing population growth and promoting sustainable development.
Overall, the population numbers per country dataset is an important resource for understanding the dynamics of population growth and for developing policies and strategies that promote sustainable economic and social development for all.
This dataset contains population and population density data from the World Bank. The World Bank has accurate data from the year 1950, and this data set contains projections from the year 2021 onwards (see my notebook for more). This dataset also contains the female and male population splits.
Thanks to the World Bank: https://data.worldbank.org/indicator/SP.POP.TOTL
This is a very simple data set aimed at users who want to get involved with cleaning and visualising data in Python/pandas. See my code for inspiration.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Brazos Country population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Brazos Country across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2022, the population of Brazos Country was 518, a 1.57% increase year-by-year from 2021. Previously, in 2021, Brazos Country population was 510, a decline of 1.16% compared to a population of 516 in 2020. Over the last 20 plus years, between 2000 and 2022, population of Brazos Country increased by 181. In this period, the peak population was 518 in the year 2022. The numbers suggest that the population has not reached its peak yet and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Brazos Country Population by Year. You can refer the same here
This dataset shows different breakdowns of London's resident population by their nationality. Data used comes from ONS' Annual Population Survey (APS). The APS has a sample of around 320,000 people in the UK (around 28,000 in London). As such all figures must be treated with some caution. 95% confidence interval levels are provided. Numbers have been rounded to the nearest thousand and figures for smaller populations have been suppressed.
Two files are available to download:
Nationality - Borough: Shows nationality estimates in their broad groups such as European Union, South East Asia, North Africa, etc. broken down to borough level.
Detailed Nationality - London: Shows nationality estimates for specific countries such as France, Bangladesh, Nigeria, etc. available for London as a whole.
A Tableau visualisation tool is also available.
Country of Birth data can be found here: https://data.london.gov.uk/dataset/country-of-birth
Nationality refers to that stated by the respondent during the interview. Country of birth is the country in which they were born. It is possible that an individual’s nationality may change, but the respondent’s country of birth cannot change. This means that country of birth gives a more robust estimate of change over time.