100+ datasets found

Country metadata
kaggle.com
Updated May 26, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Treich (2020). Country metadata [Dataset]. https://www.kaggle.com/datasets/treich/country-metadata/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 26, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Treich
Description
Context

This dataset simply combines publicly available data to characterise a country based on healthcare factors, economy, government and demographics.

Content

All data are given per 100.000 inhabitants where this is appropriate scores are given as absolute values and so are spending and demographics. Each row represents one country. Data that is included covers the following topics:

Healthcare: - Staff including: Nurses and Physicians per 100.000 inhabitants - Infrastructure including: Beds, Chnage of beds between 2018 and 2019 and the change of bed numbers since 2013, Intensive Care Unit (ICU) beds, ventilators and Extra Corporal Membrane Oxygenation (ECMO), machines per 100.000 inhabitants - Total spending on healthcare in US dollars per capita.

Demographics: - The median age for entire population and each gender - The percentage of the population within age brackets - Total population - Population per km2 - Population change between 2018 and 2019

Government The used scores are from the Economist intelligence unit and describe how democratic a country is and how the government works. These can be used to compare countries based on their government type.

Acknowledgements

All data is publicly available and just has been brought together in one place. The sources are:

Population densities

Populations

ECMO machines

Hospital beds

Number of physicians

Health expenditure per capita

Democracy indexes

Median age

OECD nurses per capita

Inspiration

These data are meant as metadata to decide which countries are comparable. I am working on healthcare data so the inspiration is to compare health statistics between countries and make an informed decision about how comparable they are. Could be used for any non healthcare related task as well.
COVID-19 cases and deaths per million in 210 countries as of July 13, 2022
statista.com
ai-chatbox.pro
Updated Nov 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). COVID-19 cases and deaths per million in 210 countries as of July 13, 2022 [Dataset]. https://www.statista.com/statistics/1104709/coronavirus-deaths-worldwide-per-million-inhabitants/
Explore at:
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
Based on a comparison of coronavirus deaths in 210 countries relative to their population, Peru had the most losses to COVID-19 up until July 13, 2022. As of the same date, the virus had infected over 557.8 million people worldwide, and the number of deaths had totaled more than 6.3 million. Note, however, that COVID-19 test rates can vary per country. Additionally, big differences show up between countries when combining the number of deaths against confirmed COVID-19 cases. The source seemingly does not differentiate between "the Wuhan strain" (2019-nCOV) of COVID-19, "the Kent mutation" (B.1.1.7) that appeared in the UK in late 2020, the 2021 Delta variant (B.1.617.2) from India or the Omicron variant (B.1.1.529) from South Africa.

The difficulties of death figures

This table aims to provide a complete picture on the topic, but it very much relies on data that has become more difficult to compare. As the coronavirus pandemic developed across the world, countries already used different methods to count fatalities, and they sometimes changed them during the course of the pandemic. On April 16, for example, the Chinese city of Wuhan added a 50 percent increase in their death figures to account for community deaths. These deaths occurred outside of hospitals and went unaccounted for so far. The state of New York did something similar two days before, revising their figures with 3,700 new deaths as they started to include “assumed” coronavirus victims. The United Kingdom started counting deaths in care homes and private households on April 29, adjusting their number with about 5,000 new deaths (which were corrected lowered again by the same amount on August 18). This makes an already difficult comparison even more difficult. Belgium, for example, counts suspected coronavirus deaths in their figures, whereas other countries have not done that (yet). This means two things. First, it could have a big impact on both current as well as future figures. On April 16 already, UK health experts stated that if their numbers were corrected for community deaths like in Wuhan, the UK number would change from 205 to “above 300”. This is exactly what happened two weeks later. Second, it is difficult to pinpoint exactly which countries already have “revised” numbers (like Belgium, Wuhan or New York) and which ones do not. One work-around could be to look at (freely accessible) timelines that track the reported daily increase of deaths in certain countries. Several of these are available on our platform, such as for Belgium, Italy and Sweden. A sudden large increase might be an indicator that the domestic sources changed their methodology.

Where are these numbers coming from?

The numbers shown here were collected by Johns Hopkins University, a source that manually checks the data with domestic health authorities. For the majority of countries, this is from national authorities. In some cases, like China, the United States, Canada or Australia, city reports or other various state authorities were consulted. In this statistic, these separately reported numbers were put together. For more information or other freely accessible content, please visit our dedicated Facts and Figures page.
o
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+1more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
csv, json, geojson, excelAvailable download formats
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)Sources and ContributionsSources : GeoNames is aggregating over hundred different data sources. Ambassadors : GeoNames Ambassadors help in many countries. Wiki : A wiki allows to view the data and quickly fix error and add missing places. Donations and Sponsoring : Costs for running GeoNames are covered by donations and sponsoring.Enrichment:add country name
Most surveilled countries worldwide 2022, by number of people affected
statista.com
Updated Dec 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Most surveilled countries worldwide 2022, by number of people affected [Dataset]. https://www.statista.com/statistics/1290708/top-surveilled-countries-worldwide/
Explore at:
Dataset updated
Dec 10, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2022
Area covered
Worldwide
Description
In 2022, China topped the list of surveilled countries worldwide. Nearly the whole population in the country was affected by the internet usage restrictions set by the government. India and Pakistan followed as the governmental authorities in these countries also put limitations on internet usage for their citizens.
World Population Statistics - 2023
kaggle.com
Updated Jan 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 9, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
The current US Census Bureau world population estimate in June 2019 shows that the current global population is 7,577,130,400 people on Earth, which far exceeds the world population of 7.2 billion in 2015. Our estimate based on UN data shows the world's population surpassing 7.7 billion.

China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.

The following 11 countries that are the most populous in the world each have populations exceeding 100 million. These include the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. Of these nations, all are expected to continue to grow except Russia and Japan, which will see their populations drop by 2030 before falling again significantly by 2050.

Many other nations have populations of at least one million, while there are also countries that have just thousands. The smallest population in the world can be found in Vatican City, where only 801 people reside.

In 2018, the world’s population growth rate was 1.12%. Every five years since the 1970s, the population growth rate has continued to fall. The world’s population is expected to continue to grow larger but at a much slower pace. By 2030, the population will exceed 8 billion. In 2040, this number will grow to more than 9 billion. In 2055, the number will rise to over 10 billion, and another billion people won’t be added until near the end of the century. The current annual population growth estimates from the United Nations are in the millions - estimating that over 80 million new lives are added yearly.

This population growth will be significantly impacted by nine specific countries which are situated to contribute to the population growth more quickly than other nations. These nations include the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Particularly of interest, India is on track to overtake China's position as the most populous country by 2030. Additionally, multiple nations within Africa are expected to double their populations before fertility rates begin to slow entirely.

Content

In this Dataset, we have Historical Population data for every Country/Territory in the world by different parameters like Area Size of the Country/Territory, Name of the Continent, Name of the Capital, Density, Population Growth Rate, Ranking based on Population, World Population Percentage, etc. >Dataset Glossary (Column-Wise):

Rank: Rank by Population.

CCA3: 3 Digit Country/Territories Code.

Country/Territories: Name of the Country/Territories.

Capital: Name of the Capital.

Continent: Name of the Continent.

2022 Population: Population of the Country/Territories in the year 2022.

2020 Population: Population of the Country/Territories in the year 2020.

2015 Population: Population of the Country/Territories in the year 2015.

2010 Population: Population of the Country/Territories in the year 2010.

2000 Population: Population of the Country/Territories in the year 2000.

1990 Population: Population of the Country/Territories in the year 1990.

1980 Population: Population of the Country/Territories in the year 1980.

1970 Population: Population of the Country/Territories in the year 1970.

Area (km²): Area size of the Country/Territories in square kilometers.

Density (per km²): Population Density per square kilometer.

Growth Rate: Population Growth Rate by Country/Territories.

World Population Percentage: The population percentage by each Country/Territories.
Worldwide COVID-19 Data from WHO (2025 Edition)
kaggle.com
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Adil Shamim (2025). Worldwide COVID-19 Data from WHO (2025 Edition) [Dataset]. https://www.kaggle.com/datasets/adilshamim8/worldwide-covid-19-data-from-who
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 3, 2025
Dataset provided by
Kaggle
Authors
Adil Shamim
Description
Dataset Overview

This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.

Source Information

Website: WHO COVID-19 Dashboard

Organization: World Health Organization (WHO)

Data Coverage: Global (by country/territory)

Time Period: Up to 2025

The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.

Dataset Contents

Country/Region: The name of the country or territory.

Date: Reporting date.

New Cases: Number of new confirmed COVID-19 cases.

Cumulative Cases: Total confirmed COVID-19 cases to date.

New Deaths: Number of new confirmed deaths due to COVID-19.

Cumulative Deaths: Total deaths reported to date.

Additional fields may include population, rates per 100,000, and more (see data files for details).

How to Use

This dataset can be used for: - Tracking the spread and trends of COVID-19 globally and by country - Modeling and forecasting pandemic progression - Comparative analysis of the pandemic’s impact across countries and regions - Visualization and reporting

Data Reliability

The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.

Acknowledgements

Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
d
The Marshall Project: COVID Cases in Prisons
data.world
csv, zip
Updated Apr 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2023). The Marshall Project: COVID Cases in Prisons [Dataset]. https://data.world/associatedpress/marshall-project-covid-cases-in-prisons
Explore at:
csv, zipAvailable download formats
Dataset updated
Apr 6, 2023
Authors
The Associated Press
Time period covered
Jul 31, 2019 - Aug 1, 2021
Description
Overview

The Marshall Project, the nonprofit investigative newsroom dedicated to the U.S. criminal justice system, has partnered with The Associated Press to compile data on the prevalence of COVID-19 infection in prisons across the country. The Associated Press is sharing this data as the most comprehensive current national source of COVID-19 outbreaks in state and federal prisons.

Lawyers, criminal justice reform advocates and families of the incarcerated have worried about what was happening in prisons across the nation as coronavirus began to take hold in the communities outside. Data collected by The Marshall Project and AP shows that hundreds of thousands of prisoners, workers, correctional officers and staff have caught the illness as prisons became the center of some of the country’s largest outbreaks. And thousands of people — most of them incarcerated — have died.

In December, as COVID-19 cases spiked across the U.S., the news organizations also shared cumulative rates of infection among prison populations, to better gauge the total effects of the pandemic on prison populations. The analysis found that by mid-December, one in five state and federal prisoners in the United States had tested positive for the coronavirus -- a rate more than four times higher than the general population.

This data, which is updated weekly, is an effort to track how those people have been affected and where the crisis has hit the hardest.

Methodology and Caveats

The data tracks the number of COVID-19 tests administered to people incarcerated in all state and federal prisons, as well as the staff in those facilities. It is collected on a weekly basis by Marshall Project and AP reporters who contact each prison agency directly and verify published figures with officials.

Each week, the reporters ask every prison agency for the total number of coronavirus tests administered to its staff members and prisoners, the cumulative number who tested positive among staff and prisoners, and the numbers of deaths for each group.

The time series data is aggregated to the system level; there is one record for each prison agency on each date of collection. Not all departments could provide data for the exact date requested, and the data indicates the date for the figures.

To estimate the rate of infection among prisoners, we collected population data for each prison system before the pandemic, roughly in mid-March, in April, June, July, August, September and October. Beginning the week of July 28, we updated all prisoner population numbers, reflecting the number of incarcerated adults in state or federal prisons. Prior to that, population figures may have included additional populations, such as prisoners housed in other facilities, which were not captured in our COVID-19 data. In states with unified prison and jail systems, we include both detainees awaiting trial and sentenced prisoners.

To estimate the rate of infection among prison employees, we collected staffing numbers for each system. Where current data was not publicly available, we acquired other numbers through our reporting, including calling agencies or from state budget documents. In six states, we were unable to find recent staffing figures: Alaska, Hawaii, Kentucky, Maryland, Montana, Utah.

To calculate the cumulative COVID-19 impact on prisoner and prison worker populations, we aggregated prisoner and staff COVID case and death data up through Dec. 15. Because population snapshots do not account for movement in and out of prisons since March, and because many systems have significantly slowed the number of new people being sent to prison, it’s difficult to estimate the total number of people who have been held in a state system since March. To be conservative, we calculated our rates of infection using the largest prisoner population snapshots we had during this time period.

As with all COVID-19 data, our understanding of the spread and impact of the virus is limited by the availability of testing. Epidemiology and public health experts say that aside from a few states that have recently begun aggressively testing in prisons, it is likely that there are more cases of COVID-19 circulating undetected in facilities. Sixteen prison systems, including the Federal Bureau of Prisons, would not release information about how many prisoners they are testing.

Corrections departments in Indiana, Kansas, Montana, North Dakota and Wisconsin report coronavirus testing and case data for juvenile facilities; West Virginia reports figures for juvenile facilities and jails. For consistency of comparison with other state prison systems, we removed those facilities from our data that had been included prior to July 28. For these states we have also removed staff data. Similarly, Pennsylvania’s coronavirus data includes testing and cases for those who have been released on parole. We removed these tests and cases for prisoners from the data prior to July 28. The staff cases remain.

About the Data

There are four tables in this data:

covid_prison_cases.csv contains weekly time series data on tests, infections and deaths in prisons. The first dates in the table are on March 26. Any questions that a prison agency could not or would not answer are left blank.

prison_populations.csv contains snapshots of the population of people incarcerated in each of these prison systems for whom data on COVID testing and cases are available. This varies by state and may not always be the entire number of people incarcerated in each system. In some states, it may include other populations, such as those on parole or held in state-run jails. This data is primarily for use in calculating rates of testing and infection, and we would not recommend using these numbers to compare the change in how many people are being held in each prison system.

staff_populations.csv contains a one-time, recent snapshot of the headcount of workers for each prison agency, collected as close to April 15 as possible.

covid_prison_rates.csv contains the rates of cases and deaths for prisoners. There is one row for every state and federal prison system and an additional row with the National totals.

Queries

The Associated Press and The Marshall Project have created several queries to help you use this data:

Get your state's prison COVID data: Provides each week's data from just your state and calculates a cases-per-100000-prisoners rate, a deaths-per-100000-prisoners rate, a cases-per-100000-workers rate and a deaths-per-100000-workers rate here

Rank all systems' most recent data by cases per 100,000 prisoners here

Find what percentage of your state's total cases and deaths -- as reported by Johns Hopkins University -- occurred within the prison system here

Attribution

In stories, attribute this data to: “According to an analysis of state prison cases by The Marshall Project, a nonprofit investigative newsroom dedicated to the U.S. criminal justice system, and The Associated Press.”

Contributors

Many reporters and editors at The Marshall Project and The Associated Press contributed to this data, including: Katie Park, Tom Meagher, Weihua Li, Gabe Isman, Cary Aspinwall, Keri Blakinger, Jake Bleiberg, Andrew R. Calderón, Maurice Chammah, Andrew DeMillo, Eli Hager, Jamiles Lartey, Claudia Lauer, Nicole Lewis, Humera Lodhi, Colleen Long, Joseph Neff, Michelle Pitcher, Alysia Santo, Beth Schwartzapfel, Damini Sharma, Colleen Slevin, Christie Thompson, Abbie VanSickle, Adria Watson, Andrew Welsh-Huggins.

Questions

If you have questions about the data, please email The Marshall Project at info+covidtracker@themarshallproject.org or file a Github issue.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Leading countries by number of data centers 2025
statista.com
ai-chatbox.pro
Updated Mar 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Leading countries by number of data centers 2025 [Dataset]. https://www.statista.com/statistics/1228433/data-centers-worldwide-by-country/
Explore at:
Dataset updated
Mar 21, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2025
Area covered
Worldwide
Description
As of March 2025, there were a reported 5,426 data centers in the United States, the most of any country worldwide. A further 529 were located in Germany, while 523 were located in the United Kingdom. What is a data center? A data center is a network of computing and storage resources that enables the delivery of shared software applications and data. These facilities can house large amounts of critical and important data, and therefore are vital to the daily functions of companies and consumers alike. As a result, whether it is a cloud, colocation, or managed service, data center real estate will have increasing importance worldwide. Hyperscale data centers In the past, data centers were highly controlled physical infrastructures, but the cloud has since changed that model. A cloud data service is a remote version of a data center – located somewhere away from a company's physical premises. Cloud IT infrastructure spending has grown and is forecast to rise further in the coming years. The evolution of technology, along with the rapid growth in demand for data across the globe, is largely driven by the leading hyperscale data center providers.
P
Data from: Coastal proximity of populations in 22 Pacific Island Countries...
pacificdata.org
kiribati-data.sprep.org
+14more
html, pdf, xlsx
Updated Feb 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pacific Data Hub (2022). Coastal proximity of populations in 22 Pacific Island Countries and Territories [Dataset]. https://pacificdata.org/data/dataset/coastal-proximity-of-populations-in-22-pacific-island-countries9f18b716-d636-4e9a-b628-0e5cd9757dec
Explore at:
pdf, xlsx, htmlAvailable download formats
Dataset updated
Feb 11, 2022
Dataset provided by
Pacific Data Hub
License
https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588
Description
A recently published paper, titled “Coastal proximity of populations in 22 Pacific Island Countries and Territories” details the methodology used to undertake the analysis and presents the findings. Purpose * This analysis aims to estimate populations settled in coastal areas in 22 Pacific Island Countries and Territories (PICTS) using the data currently available. In addition to the coastal population estimates, the study compares the results obtained from the use of national population datasets (census) with those derived from the use of global population grids. * Accuracy and reliability from national and global datasets derived results have been evaluated to identify the most suitable options to estimate size and location of coastal populations in the region. A collaborative project between the Pacific Community (SPC), WorldFish and the University of Wollongong has produced the first detailed population estimates of people living close to the coast in the 22 Pacific Island Countries and Territories (PICTs).
d
Geo-Refugee: A Refugee Location Dataset
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fisk, Kerstin C. (2023). Geo-Refugee: A Refugee Location Dataset [Dataset]. http://doi.org/10.7910/DVN/25952
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/25952
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Fisk, Kerstin C.
Time period covered
Jan 1, 2000 - Jan 1, 2010
Description
The refugee location data (Geo-Refugee) provides information on the geographical locations, population sizes and accommodation types of refugees and people in refugee-like situations throughout Africa. Based on the United Nations High Commissioner for Refugees' Location and Demographic Composition data as well as information contained in supplemental UNHCR resources, Geo-Refugee assigns administrative unit names and geographic coordinates to refugee camps/ centers, and locations hosting dispersed (self-settled) refugees. Geo-Refugee was collected for the purpose of investigating the relationship between refugees and armed conflict, but can be used for a number of refugee-related studies. The original data for the category refugees and people in a refugee-like situation by accommodation type and location name comes directly from the UNHCR. The category refugees includes: "individuals recognized under the 1951 Convention relating to the Status of Refugees and its 1967 Protocol; the 1969 OAU Convention Governing the Specific Aspects of Refugee Problems in Africa; those recognized in accordance with the UNHCR statute; individuals granted complementary forms of protection and those enjoying temporary protection.The category people in a refugee-like situation "is descriptive in nature and includes groups of people who are outside their country of origin and who face protection risks similar to those of refugees, but for whom refugee status has, for practical or other reasons, not been ascertained" (UNHCR http://www.unhcr.org/45c06c662.html). The unit of the data is the first-level administrative unit (province, region or state). A refugee location is defined as a unit with a known refugee population, as established by UNHCR country offices. The locations data was compiled using statistics provided by the UNHCR Division of Programme Support and Management. Several of the refugee sites in the original UNHCR data are camp names or other lo cations which are not immediately traceable to a particular location using even the most established geographical databases like that of the National Geospatial Intelligence Agency (NGA). Thus, unit-level location of refugees was established and confirmed using supplementary resources including reports, maps, and policy documents compiled by the UNHCR and contained in the Refworld database (see http://www.unhcr.org/cgi-bin/texis/vtx/refworld/rwmain). Refworld was the primary database used for this project. Geographic coordinates were assigned using the database of the National Geospatial-Intelligence Agency. See https://www1.nga.mil/Pages/default.aspx for more information. All attempts were made to find precise coordinates, including cross-referencing with Google Maps. The current version of the data covers 43 African countries and encompasses the period 2000 to 2010. The UNHCR began systematically collecting information on the locations and demographic compositions of refugee populations in 2000.
Population and Population Density Dataset.
kaggle.com
Updated Jul 28, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zoraiz Azeem (2021). Population and Population Density Dataset. [Dataset]. https://www.kaggle.com/zoraizazeem/population-and-population-density-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 28, 2021
Dataset provided by
Kaggle
Authors
Zoraiz Azeem
Description
Content

This dataset contains population and population density data from the world bank. The world bank has accurate data from the year 1950, and this data set contains projections from the year 2021 onwards. (see my notebook for more) This dataset also contains the female and male population spilts.

Acknowledgements

Thanks to the world bank: https://data.worldbank.org/indicator/SP.POP.TOTL

Inspiration

This is a very simple data set aimed at users who wan to get involved with cleaning and visualisations data in python/pandas. See my code for inspiration.
d
Africa Population Distribution Database
search.dataone.org
Updated Nov 17, 2014
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Deichmann, Uwe; Nelson, Andy (2014). Africa Population Distribution Database [Dataset]. https://search.dataone.org/view/Africa_Population_Distribution_Database.xml
Explore at:
Dataset updated
Nov 17, 2014
Dataset provided by
Regional and Global Biogeochemical Dynamics Data (RGD)
Authors
Deichmann, Uwe; Nelson, Andy
Time period covered
Jan 1, 1960 - Dec 31, 1997
Area covered

Description
The Africa Population Distribution Database provides decadal population density data for African administrative units for the period 1960-1990. The databsae was prepared for the United Nations Environment Programme / Global Resource Information Database (UNEP/GRID) project as part of an ongoing effort to improve global, spatially referenced demographic data holdings. The database is useful for a variety of applications including strategic-level agricultural research and applications in the analysis of the human dimensions of global change.

This documentation describes the third version of a database of administrative units and associated population density data for Africa. The first version was compiled for UNEP's Global Desertification Atlas (UNEP, 1997; Deichmann and Eklundh, 1991), while the second version represented an update and expansion of this first product (Deichmann, 1994; WRI, 1995). The current work is also related to National Center for Geographic Information and Analysis (NCGIA) activities to produce a global database of subnational population estimates (Tobler et al., 1995), and an improved database for the Asian continent (Deichmann, 1996). The new version for Africa provides considerably more detail: more than 4700 administrative units, compared to about 800 in the first and 2200 in the second version. In addition, for each of these units a population estimate was compiled for 1960, 70, 80 and 90 which provides an indication of past population dynamics in Africa. Forthcoming are population count data files as download options.

African population density data were compiled from a large number of heterogeneous sources, including official government censuses and estimates/projections derived from yearbooks, gazetteers, area handbooks, and other country studies. The political boundaries template (PONET) of the Digital Chart of the World (DCW) was used delineate national boundaries and coastlines for African countries.

For more information on African population density and administrative boundary data sets, see metadata files at [http://na.unep.net/datasets/datalist.php3] which provide information on file identification, format, spatial data organization, distribution, and metadata reference.

References:

Deichmann, U. 1994. A medium resolution population database for Africa, Database documentation and digital database, National Center for Geographic Information and Analysis, University of California, Santa Barbara.

Deichmann, U. and L. Eklundh. 1991. Global digital datasets for land degradation studies: A GIS approach, GRID Case Study Series No. 4, Global Resource Information Database, United Nations Environment Programme, Nairobi.

UNEP. 1997. World Atlas of Desertification, 2nd Ed., United Nations Environment Programme, Edward Arnold Publishers, London.

WRI. 1995. Africa data sampler, Digital database and documentation, World Resources Institute, Washington, D.C.
w
Afrobarometer Survey 2002-2004, Merged Round 2 Data (16 Countries) -...
microdata.worldbank.org
catalog.ihsn.org
+1more
Updated Apr 27, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institute for Democracy in South Africa (IDASA) (2021). Afrobarometer Survey 2002-2004, Merged Round 2 Data (16 Countries) - Botswana, Cabo Verde, Ghana, Kenya, Lesotho, Mali, Mozambique, Malawi, Namibia, Nigeria, Senegal, Tanzania, Uganda, South Africa, Zambia, Zimbabwe [Dataset]. https://microdata.worldbank.org/index.php/catalog/886
Explore at:
Dataset updated
Apr 27, 2021
Dataset provided by
Institute for Democracy in South Africa (IDASA)
Ghana Centre for Democratic Development (CDD-Ghana)
Michigan State University (MSU)
Time period covered
2002 - 2004
Area covered
Senegal, Nigeria, Ghana, Cabo Verde, Mali, Mozambique, Malawi, Namibia, Lesotho, Botswana
Description
Abstract

The Afrobarometer project assesses attitudes and public opinion on democracy, markets, and civil society in several sub-Saharan African.This dataset was compiled from the studies in Round 2 of the Afrobarometer, conducted from 2002-2004 in 16 countries, including Botswana, Cape Verde, Ghana, Kenya, Lesotho, Malawi, Mali, Mozambique, Namibia, Nigeria, Senegal, South Africa, Tanzania, Uganda, Zambia, and Zimbabwe

Geographic coverage

The Round 2 Afrobarometer surveys have national coverage for the following countries: Botswana, Ghana, Kenya, Lesotho, Malawi, Mali, Mozambique, Namibia, Nigeria, Republic of Cabo Verde, Senegal, South Africa, Tanzania, Uganda, Zambia, Zimbabwe.

Analysis unit

Individuals

Universe

The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.

What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.

Kind of data

Sample survey data [ssd]

Sampling procedure

Afrobarometer uses national probability samples designed to meet the following criteria. Samples are designed to generate a sample that is a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. They achieve this by:

• using random selection methods at every stage of sampling; • sampling at all stages with probability proportionate to population size wherever possible to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.

The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.

Sample size and design Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.

The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.

Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversamples is noted in the TIR.

Sample stages Samples are drawn in either four or five stages:

Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country. Stage 2: We randomly select primary sampling units (PSU). Stage 3: We then randomly select sampling start points. Stage 4: Interviewers then randomly select households. Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.

To keep the costs and logistics of fieldwork within manageable limits, eight interviews are clustered within each selected PSU.

Data weights For some national surveys, data are weighted to correct for over or under-sampling or for household size. "Withinwt" should be turned on for all national -level descriptive statistics in countries that contain this weighting variable. It is included as the last variable in the data set, with details described in the codebook. For merged data sets, "Combinwt" should be turned on for cross-national comparisons of descriptive statistics. Note: this weighting variable standardizes each national sample as if it were equal in size.

Further information on sampling protocols, including full details of the methodologies used for each stage of sample selection, can be found at https://afrobarometer.org/surveys-and-methods/sampling-principles

Mode of data collection

Face-to-face [f2f]

Research instrument

Certain questions in the questionnaires for the Afrobarometer 2 survey addressed country-specific issues, but many of the same questions were asked across surveys. Citizens of the 16 countries were asked questions about their economic and social situations, and their opinions were elicited on recent political and economic changes within their country.
World Bank: International Debt Data
kaggle.com
zip
Updated Mar 20, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank (2019). World Bank: International Debt Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-debt
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Mar 20, 2019
Dataset authored and provided by
World Bankhttp://worldbank.org/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

Content

This dataset contains both national and regional debt statistics captured by over 200 economic indicators. Time series data is available for those indicators from 1970 to 2015 for reporting countries.

For more information, see the World Bank website.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_intl_debt

https://cloud.google.com/bigquery/public-data/world-bank-international-debt

Citation: The World Bank: International Debt Statistics

Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @till_indeman from Unplash.

Inspiration

What countries have the largest outstanding debt?

https://cloud.google.com/bigquery/images/outstanding-debt.png" alt="enter image description here"> https://cloud.google.com/bigquery/images/outstanding-debt.png
T
PERSONAL SAVINGS by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated May 28, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2017). PERSONAL SAVINGS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/personal-savings
Explore at:
json, excel, xml, csvAvailable download formats
Dataset updated
May 28, 2017
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for PERSONAL SAVINGS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
COVID-19 Trends in Each Country
hub.arcgis.com
coronavirus-resources.esri.com
+2more
Updated Mar 27, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Urban Observatory by Esri (2020). COVID-19 Trends in Each Country [Dataset]. https://hub.arcgis.com/maps/a16bb8b137ba4d8bbe645301b80e5740
Explore at:
Dataset updated
Mar 27, 2020
Dataset provided by
Esrihttp://esri.com/
Authors
Urban Observatory by Esri
Area covered
Earth
Description
On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data. For updated cases, deaths, and vaccine data please visit: World Health Organization (WHO)For more information, visit the Johns Hopkins Coronavirus Resource Center.COVID-19 Trends MethodologyOur goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.DOI: https://doi.org/10.6084/m9.figshare.125529863/7/2022 - Adjusted the rate of active cases calculation in the U.S. to reflect the rates of serious and severe cases due nearly completely dominant Omicron variant.6/24/2020 - Expanded Case Rates discussion to include fix on 6/23 for calculating active cases.6/22/2020 - Added Executive Summary and Subsequent Outbreaks sectionsRevisions on 6/10/2020 based on updated CDC reporting. This affects the estimate of active cases by revising the average duration of cases with hospital stays downward from 30 days to 25 days. The result shifted 76 U.S. counties out of Epidemic to Spreading trend and no change for national level trends.Methodology update on 6/2/2020: This sets the length of the tail of new cases to 6 to a maximum of 14 days, rather than 21 days as determined by the last 1/3 of cases. This was done to align trends and criteria for them with U.S. CDC guidance. The impact is areas transition into Controlled trend sooner for not bearing the burden of new case 15-21 days earlier.Correction on 6/1/2020Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020. Revisions added on 4/30/2020 are highlighted.Revisions added on 4/23/2020 are highlighted.Executive SummaryCOVID-19 Trends is a methodology for characterizing the current trend for places during the COVID-19 global pandemic. Each day we assign one of five trends: Emergent, Spreading, Epidemic, Controlled, or End Stage to geographic areas to geographic areas based on the number of new cases, the number of active cases, the total population, and an algorithm (described below) that contextualize the most recent fourteen days with the overall COVID-19 case history. Currently we analyze the countries of the world and the U.S. Counties. The purpose is to give policymakers, citizens, and analysts a fact-based data driven sense for the direction each place is currently going. When a place has the initial cases, they are assigned Emergent, and if that place controls the rate of new cases, they can move directly to Controlled, and even to End Stage in a short time. However, if the reporting or measures to curtail spread are not adequate and significant numbers of new cases continue, they are assigned to Spreading, and in cases where the spread is clearly uncontrolled, Epidemic trend.We analyze the data reported by Johns Hopkins University to produce the trends, and we report the rates of cases, spikes of new cases, the number of days since the last reported case, and number of deaths. We also make adjustments to the assignments based on population so rural areas are not assigned trends based solely on case rates, which can be quite high relative to local populations.Two key factors are not consistently known or available and should be taken into consideration with the assigned trend. First is the amount of resources, e.g., hospital beds, physicians, etc.that are currently available in each area. Second is the number of recoveries, which are often not tested or reported. On the latter, we provide a probable number of active cases based on CDC guidance for the typical duration of mild to severe cases.Reasons for undertaking this work in March of 2020:The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50, and for another country 100,000, which potentially misled people interpreting the slope of the curve. Such misleading circumstances affected comparing large population countries to small population counties or countries with low numbers of cases to China which had a large count of cases in the early part of the outbreak. These challenges for interpreting and comparing these graphs represent work each reader must do based on their experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and provide readers with an a resulting synthesis to characterize the state of the pandemic in that country, state, or county.The lack of reliable data for confirmed recoveries and therefore active cases. Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks. The reason is 81% of cases recover after experiencing mild symptoms in 10 to 14 days. Severe cases are 14% and last 15-30 days (based on average days with symptoms of 11 when admitted to hospital plus 12 days median stay, and plus of one week to include a full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources:U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online. Initial older guidance was also obtained online. Additionally, many people who recover may not be tested, and many who are, may not be tracked due to privacy laws. Thus, the formula used to compute an estimate of active cases is: Active Cases = 100% of new cases in past 14 days + 19% from past 15-25 days + 5% from past 26-49 days - total deaths. On 3/17/2022, the U.S. calculation was adjusted to: Active Cases = 100% of new cases in past 14 days + 6% from past 15-25 days + 3% from past 26-49 days - total deaths. Sources: https://www.cdc.gov/mmwr/volumes/71/wr/mm7104e4.htm https://covid.cdc.gov/covid-data-tracker/#variant-proportions If a new variant arrives and appears to cause higher rates of serious cases, we will roll back this adjustment. We’ve never been inside a pandemic with the ability to learn of new cases as they are confirmed anywhere in the world. After reviewing epidemiological and pandemic scientific literature, three needs arose. We need to specify which portions of the pandemic lifecycle this map cover. The World Health Organization (WHO) specifies six phases. The source data for this map begins just after the beginning of Phase 5: human to human spread and encompasses Phase 6: pandemic phase. Phase six is only characterized in terms of pre- and post-peak. However, these two phases are after-the-fact analyses and cannot ascertained during the event. Instead, we describe (below) a series of five trends for Phase 6 of the COVID-19 pandemic.Choosing terms to describe the five trends was informed by the scientific literature, particularly the use of epidemic, which signifies uncontrolled spread. The five trends are: Emergent, Spreading, Epidemic, Controlled, and End Stage. Not every locale will experience all five, but all will experience at least three: emergent, controlled, and end stage.This layer presents the current trends for the COVID-19 pandemic by country (or appropriate level). There are five trends:Emergent: Early stages of outbreak. Spreading: Early stages and depending on an administrative area’s capacity, this may represent a manageable rate of spread. Epidemic: Uncontrolled spread. Controlled: Very low levels of new casesEnd Stage: No New cases These trends can be applied at several levels of administration: Local: Ex., City, District or County – a.k.a. Admin level 2State: Ex., State or Province – a.k.a. Admin level 1National: Country – a.k.a. Admin level 0Recommend that at least 100,000 persons be represented by a unit; granted this may not be possible, and then the case rate per 100,000 will become more important.Key Concepts and Basis for Methodology: 10 Total Cases minimum threshold: Empirically, there must be enough cases to constitute an outbreak. Ideally, this would be 5.0 per 100,000, but not every area has a population of 100,000 or more. Ten, or fewer, cases are also relatively less difficult to track and trace to sources. 21 Days of Cases minimum threshold: Empirically based on COVID-19 and would need to be adjusted for any other event. 21 days is also the minimum threshold for analyzing the “tail” of the new cases curve, providing seven cases as the basis for a likely trend (note that 21 days in the tail is preferred). This is the minimum needed to encompass the onset and duration of a normal case (5-7 days plus 10-14 days). Specifically, a median of 5.1 days incubation time, and 11.2 days for 97.5% of cases to incubate. This is also driven by pressure to understand trends and could easily be adjusted to 28 days. Source
VOTP Dataset
kaggle.com
Updated Apr 10, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
sdorius (2017). VOTP Dataset [Dataset]. https://www.kaggle.com/datasets/sdorius/votpharm/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 10, 2017
Dataset provided by
Kagglehttp://kaggle.com/
Authors
sdorius
Description
This is an integration of 10 independent multi-country, multi-region, multi-cultural social surveys fielded by Gallup International between 2000 and 2013. The integrated data file contains responses from 535,159 adults living in 103 countries. In total, the harmonization project combined 571 social surveys.

These data have value in a number of longitudinal multi-country, multi-regional, and multi-cultural (L3M) research designs. Understood as independent, though non-random, L3M samples containing a number of multiple indicator ASQ (ask same questions) and ADQ (ask different questions) measures of human development, the environment, international relations, gender equality, security, international organizations, and democracy, to name a few [see full list below].

The data can be used for exploratory and descriptive analysis, with greatest utility at low levels of resolution (e.g. nation-states, supranational groupings). Level of resolution in analysis of these data should be sufficiently low to approximate confidence intervals.

These data can be used for teaching 3M methods, including data harmonization in L3M, 3M research design, survey design, 3M measurement invariance, analysis, and visualization, and reporting. Opportunities to teach about para data, meta data, and data management in L3M designs.

The country units are an unbalanced panel derived from non-probability samples of countries and respondents> Panels (countries) have left and right censorship and are thusly unbalanced. This design limitation can be overcome to the extent that VOTP panels are harmonized with public measurements from other 3M surveys to establish balance in terms of panels and occasions of measurement. Should L3M harmonization occur, these data can be assigned confidence weights to reflect the amount of error in these surveys.

Pooled public opinion surveys (country means), when combine with higher quality country measurements of the same concepts (ASQ, ADQ), can be leveraged to increase the statistical power of pooled publics opinion research designs (multiple L3M datasets)…that is, in studies of public, rather than personal, beliefs.

The Gallup Voice of the People survey data are based on uncertain sampling methods based on underspecified methods. Country sampling is non-random. The sampling method appears be primarily probability and quota sampling, with occasional oversample of urban populations in difficult to survey populations. The sampling units (countries and individuals) are poorly defined, suggesting these data have more value in research designs calling for independent samples replication and repeated-measures frameworks.

**The Voice of the People Survey Series is WIN/Gallup International Association's End of Year survey and is a global study that collects the public's view on the challenges that the world faces today. Ongoing since 1977, the purpose of WIN/Gallup International's End of Year survey is to provide a platform for respondents to speak out concerning government and corporate policies. The Voice of the People, End of Year Surveys for 2012, fielded June 2012 to February 2013, were conducted in 56 countries to solicit public opinion on social and political issues. Respondents were asked whether their country was governed by the will of the people, as well as their attitudes about their society. Additional questions addressed respondents' living conditions and feelings of safety around their living area, as well as personal happiness. Respondents' opinions were also gathered in relation to business development and their views on the effectiveness of the World Health Organization. Respondents were also surveyed on ownership and use of mobile devices. Demographic information includes sex, age, income, education level, employment status, and type of living area.
f
People Data | Global |Reach - 900 Million Records for Comprehensive Consumer...
factori.ai
Updated Dec 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). People Data | Global |Reach - 900 Million Records for Comprehensive Consumer Insights & Data Enrichment [Dataset]. https://www.factori.ai/datasets/people-data/
Explore at:
Dataset updated
Dec 24, 2024
License
https://www.factori.ai/privacy-policyhttps://www.factori.ai/privacy-policy
Area covered
Global
Description
Our proprietary People Data is a mobile user dataset that connects anonymous IDs to a wide range of attributes, including demographics, device ownership, audience segments, key locations, and more. This rich dataset allows our partner brands to gain a comprehensive view of consumers based on their personas, enabling them to derive actionable insights swiftly.

People Data Graph

Record Count: 900 Million

Capturing Frequency: Once per Event

Delivering Frequency: Once per Month

Updated: Monthly

People Data

Reach Our extensive data reach covers a variety of categories, encompassing user demographics, Mobile Advertising IDs (MAID), device details, locations, affluence, interests, traveled countries, and more. Data Export Methodology We dynamically collect and provide the most updated data and insights through the best-suited method at appropriate intervals, whether daily, weekly, monthly, or quarterly.

Business Needs

Our People Data caters to various business needs, offering valuable insights for consumer analysis, data enrichment, sales forecasting, and retail analytics, empowering brands to make informed decisions and optimize their strategies.
w
Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho,...
microdata.worldbank.org
catalog.ihsn.org
+1more
Updated Apr 27, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institute for Democracy in South Africa (IDASA) (2021). Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe [Dataset]. https://microdata.worldbank.org/index.php/catalog/889
Explore at:
Dataset updated
Apr 27, 2021
Dataset provided by
Institute for Democracy in South Africa (IDASA)
Ghana Centre for Democratic Development (CDD-Ghana)
Michigan State University (MSU)
Time period covered
1999 - 2000
Area covered
Malawi, Zambia, Namibia, Africa, Lesotho, Botswana, South Africa, Zimbabwe
Description
Abstract

Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12 country dataset released was pieced together out of different projects, Round 1 of the Afrobarometer survey,the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.

The 7 country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12 country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire

Geographic coverage

Botswana Lesotho Malawi Namibia South Africa Zambia Zimbabwe

Analysis unit

Basic units of analysis that the study investigates include: individuals and groups

Kind of data

Sample survey data [ssd]

Sampling procedure

A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.

The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.

Sample Universe

The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.

What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.

Sample Design

The sample design is a clustered, stratified, multi-stage, area probability sample.

To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.

In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:

The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages

A first-stage to stratify and randomly select primary sampling units;

A second-stage to randomly select sampling start-points;

A third stage to randomly choose households;

A final-stage involving the random selection of individual respondents

We shall deal with each of these stages in turn.

STAGE ONE: Selection of Primary Sampling Units (PSUs)

The primary sampling units (PSU's) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.

We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.

Indeed, we urge NIs to establish collegial working relationships within professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.

Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSU's in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.

Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.

Having stratified the sample, it is then possible to determine how many PSU's should be selected for the country as a whole, for each region, and for each urban or rural locality.

The total number of PSU's to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogenous we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.

These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSU's.

The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
Countries with the highest number of internet users 2025
statista.com
Updated Feb 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Countries with the highest number of internet users 2025 [Dataset]. https://www.statista.com/statistics/262966/number-of-internet-users-in-selected-countries/
Explore at:
Dataset updated
Feb 10, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 2025
Area covered
World
Description
As of February 2025, China ranked first among the countries with the most internet users worldwide. The world's most populated country had 1.11 billion internet users, more than triple the third-ranked United States, with just around 322 million internet users. Overall, all BRIC markets had over two billion internet users, accounting for four of the ten countries with more than 100 million internet users. Worldwide internet usage As of October 2024, there were more than five billion internet users worldwide. There are, however, stark differences in user distribution according to region. Eastern Asia is home to 1.34 billion internet users, while African and Middle Eastern regions had lower user figures. Moreover, the urban areas showed a higher percentage of internet access than rural areas. Internet use in China China ranks first in the list of countries with the most internet users. Due to its ongoing and fast-paced economic development and a cultural inclination towards technology, more than a billion of the estimated 1.4 billion population in China are online. As of the third quarter of 2023, around 87 percent of Chinese internet users stated using WeChat, the most popular social network in the country. On average, Chinese internet users spent five hours and 33 minutes online daily.

Facebook

Twitter

Click to copy link

Link copied

Cite

Treich (2020). Country metadata [Dataset]. https://www.kaggle.com/datasets/treich/country-metadata/discussion

Country metadata

Collection of public data for most countries

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 26, 2020

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Treich

Description

Context

This dataset simply combines publicly available data to characterise a country based on healthcare factors, economy, government and demographics.

Content

All data are given per 100.000 inhabitants where this is appropriate scores are given as absolute values and so are spending and demographics. Each row represents one country. Data that is included covers the following topics:

Healthcare: - Staff including: Nurses and Physicians per 100.000 inhabitants - Infrastructure including: Beds, Chnage of beds between 2018 and 2019 and the change of bed numbers since 2013, Intensive Care Unit (ICU) beds, ventilators and Extra Corporal Membrane Oxygenation (ECMO), machines per 100.000 inhabitants - Total spending on healthcare in US dollars per capita.

Demographics: - The median age for entire population and each gender - The percentage of the population within age brackets - Total population - Population per km2 - Population change between 2018 and 2019

Government The used scores are from the Economist intelligence unit and describe how democratic a country is and how the government works. These can be used to compare countries based on their government type.

Acknowledgements

All data is publicly available and just has been brought together in one place. The sources are:

Inspiration

These data are meant as metadata to decide which countries are comparable. I am working on healthcare data so the inspiration is to compare health statistics between countries and make an informed decision about how comparable they are. Could be used for any non healthcare related task as well.

Clear search

Close search

Google apps

Main menu

Country metadata

Context

Content

Acknowledgements

Inspiration

COVID-19 cases and deaths per million in 210 countries as of July 13, 2022

Geonames - All Cities with a population > 1000

Most surveilled countries worldwide 2022, by number of people affected

World Population Statistics - 2023

Content

Worldwide COVID-19 Data from WHO (2025 Edition)

Dataset Overview

Source Information

Dataset Contents

How to Use

Data Reliability

Acknowledgements

The Marshall Project: COVID Cases in Prisons

Overview

Methodology and Caveats

About the Data

Queries

Attribution

Contributors

Questions

Leading countries by number of data centers 2025

Data from: Coastal proximity of populations in 22 Pacific Island Countries...

Geo-Refugee: A Refugee Location Dataset

Population and Population Density Dataset.

Content

Acknowledgements

Inspiration

Africa Population Distribution Database

Afrobarometer Survey 2002-2004, Merged Round 2 Data (16 Countries) -...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

World Bank: International Debt Data

Context

Content

Acknowledgements

Inspiration

PERSONAL SAVINGS by Country Dataset

COVID-19 Trends in Each Country

VOTP Dataset

People Data | Global |Reach - 900 Million Records for Comprehensive Consumer...

People Data Graph

People Data

Business Needs

Afrobarometer Survey 1 1999-2000, Merged 7 Country - Botswana, Lesotho,...

Abstract

Geographic coverage

Analysis unit

Kind of data

Sampling procedure

Countries with the highest number of internet users 2025

Country metadata

Collection of public data for most countries

Context

Content

Acknowledgements

Inspiration