2 datasets found

w
Synthetic Data for an Imaginary Country, Full Population, 2023 - World
microdata.worldbank.org
Updated Jul 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Full Population, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5908
Explore at:
Dataset updated
Jul 3, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World, World
Description
Abstract

The dataset is a relational dataset of 10,003,891 individuals (2,501,755 households), representing the entire population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

A sample dataset of 8000 households was created out of this full-population dataset, and is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the full national population of this country, by province and district (equivalent to admin1 and admin2 levels) and by urban/rural areas of residence.

Analysis unit

household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

cen

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.
w
Synthetic Data for an Imaginary Country, Sample, 2023 - World
microdata.worldbank.org
Updated Jul 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
Explore at:
Dataset updated
Jul 7, 2023
Dataset authored and provided by
Development Data Group, Data Analytics Unit
Time period covered
2023
Area covered
World, World
Description
Abstract

The dataset is a relational dataset of 8,000 households households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

The full-population dataset (with about 10 million individuals) is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

Analysis unit

Household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

ssd

Sampling procedure

The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Response rate

This is a synthetic dataset; the "response rate" is 100%.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Full Population, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5908

Synthetic Data for an Imaginary Country, Full Population, 2023 - World

Explore at:

Dataset updated

Jul 3, 2023

Dataset authored and provided by

Development Data Group, Data Analytics Unit

Time period covered

2023

Area covered

World, World

Description

Abstract

The dataset is a relational dataset of 10,003,891 individuals (2,501,755 households), representing the entire population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

A sample dataset of 8000 households was created out of this full-population dataset, and is also distributed as open data.

Geographic coverage

The dataset is a synthetic dataset for an imaginary country. It was created to represent the full national population of this country, by province and district (equivalent to admin1 and admin2 levels) and by urban/rural areas of residence.

Analysis unit

household, Individual

Universe

The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

Kind of data

cen

Mode of data collection

other

Research instrument

The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

Cleaning operations

The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observation were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

Clear search

Close search

Google apps

Main menu

Synthetic Data for an Imaginary Country, Full Population, 2023 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Mode of data collection

Research instrument

Cleaning operations

Synthetic Data for an Imaginary Country, Sample, 2023 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

Response rate

Synthetic Data for an Imaginary Country, Full Population, 2023 - World

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Mode of data collection

Research instrument

Cleaning operations