100+ datasets found

Our World in Data - COVID-19
kaggle.com
zip
Updated Oct 25, 2023
Cite
Mario Caesar (2023). Our World in Data - COVID-19 [Dataset]. https://www.kaggle.com/datasets/caesarmario/our-world-in-data-covid19-dataset/code
Available download formats: zip (14235238 bytes)
Dataset updated
Oct 25, 2023
Authors
Mario Caesar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Our World in Data - COVID-19

▶ About Our World in Data 🏢

Our World in Data website

Our World in Data GitHub

▶ Similar Datasets 📄

COVID-19 World Vaccination Progress

The Our World in Data COVID Vaccination Data

Data on COVID-19 (coronavirus)

COVID-19 dataset by Our World in Data

▶ Context 📝

The complete COVID-19 dataset is a collection of COVID-19 data maintained and provided by Our World in Data. The Our World in Data team will update it daily throughout the duration of the COVID-19 pandemic.

▶ Content 📃

The dataset includes the following information:

| Metrics | Source | Updated | Countries |
| --- | --- | --- | --- |
| Vaccinations | Official data collated by the Our World in Data team | Daily | 218 |
| Tests & positivity | Official data collated by the Our World in Data team | Weekly | 139 |
| Hospital & ICU | Official data collated by the Our World in Data team | Weekly | 39 |
| Confirmed cases | JHU CSSE COVID-19 Data | Daily | 196 |
| Confirmed deaths | JHU CSSE COVID-19 Data | Daily | 196 |
| Reproduction rate | Arroyo-Marioli F, Bullano F, Kucinskas S, Rondón-Moreno C | Daily | 185 |
| Policy responses | Oxford COVID-19 Government Response Tracker | Daily | 186 |
| Other variables of interest | International organizations (UN, World Bank, OECD, IHME…) | Fixed | |

Data dictionary is available below ⤵

▶ Acknowledgements 🙏

I'd like to clarify that I'm only making data about vaccines collected by Our World in Data available to the Kaggle community. This dataset is gathered, integrated, and posted as a new version on a daily basis, as maintained by Our World in Data on their GitHub repository.

▶ Inspiration 💭

Forecast daily new confirmed cases of COVID-19 in a specific country.

Perform data analysis or data visualization of COVID-19 cases, deaths, etc.

📷 Images by Fusion Medical Animation.
Data generation volume worldwide 2010-2029
statista.com
Updated Nov 19, 2025
Cite
Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Nov 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
The World Bank Listening to LAC (L2L) Pilot 2012 - Honduras
microdata.worldbank.org
catalog.ihsn.org
Updated Jul 8, 2014
Cite
World Bank (2014). The World Bank Listening to LAC (L2L) Pilot 2012 - Honduras [Dataset]. https://microdata.worldbank.org/index.php/catalog/2021
Dataset updated
Jul 8, 2014
Dataset provided by
World Bank Group (http://www.worldbank.org/)
Authors
World Bank
Time period covered
2012
Area covered
Honduras
Description
Abstract

The rapid and massive dissemination of mobile phones in the developing world is creating new opportunities for the discipline of survey research. The World Bank is interested in leveraging mobile phone technology as a means of direct communication with poor households in the developing world in order to gather rapid feedback on the impact of economic crises and other events on the economy of such households.

The World Bank commissioned Gallup to conduct the Listening to LAC (L2L) pilot program, a research project aimed at testing the feasibility of mobile phone technology as a way of data collection for conducting quick turnaround, self-administered, longitudinal surveys among households in Peru and Honduras.

The project used face-to-face interviews as its benchmark, and included Short Message Service (SMS), Interactive Voice Response (IVR) and Computer Assisted Telephone Interviews (CATI) as test methods of data collection.

The pilot was designed in a way that allowed testing the response rates and the quality of data, while also providing information on the cost of collecting data using mobile phones. Researchers also evaluated if providing incentives affected panel attrition rates. The Honduras design was a test-retest design, which is closely related to the difference-in-difference methodology of experimental evaluation.

A random stratified multistage sampling technique was used to select a nationally representative sample of 1,500 households. During the initial face-to-face interviews, researchers gathered information on the socio-economic characteristics of households and recruited participants for follow-up research. Question wording was the same in all modes of data collection.

In Honduras, after the initial face-to-face interviews, respondents were exposed to the remaining three methodologies according to a randomized scheme (three rotations, one methodology per week). Panelists in Honduras were surveyed for four and a half months, starting in February 2012.

Geographic coverage

Includes the entire national territory, with the exception of neighborhoods where access of interviewers is extremely difficult, due to lack of transportation infrastructure or to situations that threaten the physical integrity of the interviewers and supervisors (e.g. extremely high crime rate or warfare).

Analysis unit

Households

Universe

All the households that exist in the neighborhoods of Honduras, as reported by the 2001 Census. Institutions such as military, religious or educational living quarters are not included in the universe.

Kind of data

Sample survey data [ssd]

Sampling procedure

Honduras did not have an income oversample because the poverty rate is 60 percent, so oversampling 20 percent above the poverty rate would include a large portion of the middle class, which are not the most vulnerable in times of crisis.

The Honduras panel was built on a nationally representative sample of 1,500 households. The sample was drawn by means of a random, stratified, multistage design. The pilot used the Gallup World Poll sampling frame.

Census-defined municipalities were classified into five strata according to population size:

I. Municipalities with 500,000 to 999,000 inhabitants
II. Municipalities with 100,000 to 499,000 inhabitants
III. Municipalities with 50,000 to 99,000 inhabitants
IV. Municipalities with 10,000 to 49,000 inhabitants
V. Municipalities with fewer than 10,000 inhabitants

Interviews were then proportionally allocated to these five strata according to their share among the country's population.
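As a rough illustration of proportional allocation, the sketch below splits the 1,500 interviews across the five strata with the largest-remainder rule. The population figures are invented placeholders rather than Honduran census values, and the rounding rule is an assumption; the source does not say how fractional interviews were handled.

```python
def allocate_proportionally(total_interviews, stratum_populations):
    """Split interviews across strata in proportion to population (largest remainder)."""
    total_pop = sum(stratum_populations.values())
    # Exact (fractional) share of interviews for each stratum.
    quotas = {s: total_interviews * p / total_pop for s, p in stratum_populations.items()}
    # Start from the integer part of every quota.
    allocation = {s: int(q) for s, q in quotas.items()}
    # Give the leftover interviews to the strata with the largest fractional remainders.
    leftover = total_interviews - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:leftover]:
        allocation[s] += 1
    return allocation


# Hypothetical stratum populations (placeholders, not census figures).
example_populations = {
    "I": 1_900_000,
    "II": 1_400_000,
    "III": 1_100_000,
    "IV": 2_600_000,
    "V": 1_000_000,
}
allocation = allocate_proportionally(1500, example_populations)
```

Whatever populations are supplied, the allocated counts always sum to the requested total, which is the property a proportional design needs before PSUs are drawn within each stratum.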

The first stage of the design consisted of a random selection of Primary Sampling Units (PSU's) within each of the five strata previously defined.

In the second stage, in each PSU, one or more Secondary Sampling Units (SSU's) were then selected.

Once SSU's were selected, interviewers were sent to the field to proceed with the third stage of the sample design, which consisted of selecting households using a systematic "random route" procedure. Interviewers started from the previously selected "random origin" and walked around the block in clockwise direction, selecting every third household on their right hand side. They were also trained to handle vacant, nonresponsive, non-cooperative households, as well as other failed attempts, in a systematic manner.
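The "every third household" rule can be sketched as a simple systematic selection along the walking route. This is only an illustration: the route list and household labels are hypothetical, and the handling of vacant or non-cooperative households mentioned above is not modelled.

```python
def select_households(route, interval=3, target=None):
    """Pick every `interval`-th household along a walking route.

    `route` is the ordered list of households the interviewer passes,
    starting from the random origin and walking clockwise.
    """
    selected = route[interval - 1::interval]
    return selected if target is None else selected[:target]


# Hypothetical block with 12 households, numbered in walking order.
route = [f"house_{i}" for i in range(1, 13)]
picked = select_households(route)
```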

Mode of data collection

Other [oth]

Research instrument

The following survey instruments were used in the project:

1) Initial face-to-face questionnaire

In Peru, the starting point was the ENAHO (National Household Survey) questionnaire. Step-wise regressions were done to select the set of questions that best predicted consumption. For the purposes of robustness, the regressions were also done with questions that best predicted income, which yielded the same results. A similar procedure was done in Honduras, using the latest household survey deployed by the Honduran Statistics Institute, except that only best predictors of income were chosen, because Honduras did not have a recent consumption aggregate.

The survey gathered information on households' demographics, household infrastructure, employment, remittances, income, accidents, food security, self-perceptions on poverty, Internet access and cellphones use.

2) Monthly questionnaires (SMS, IVR, CATI)

The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters. A maximum of 10 questions had to be chosen for the monthly questionnaire. In addition, two questions sought to ensure the validity of the responses by testing if the respondent was a member of the household. Most questions were time-variant and each questionnaire was repeated to observe if answers changed over time. All questions related to variables that strongly affect household welfare and are likely to change in times of crisis.

3) Final face-to-face questionnaire

Gallup conducted face-to-face closing surveys among 700 panelists. The researchers asked about issues the respondents had with mobile phones and coverage during the test. Panelists were also asked what would motivate them to keep on participating in a project like this in the future.

The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters, unlike IVR and CATI.

Response rate

In Honduras, 41% of recruited households failed to answer the first round of follow-up surveys. The attrition rate from the initial face-to-face interview to the end of panel study was 50%.

As part of the survey administration process, Gallup implemented a number of mechanisms to maximize the response rate and panelist retention. The following strategies were applied to respondents who did not reply the first time:

- The surveys were left open for responses for up to 2 weeks after the original transmission of the survey (from the original call in the case of IVR and CATI).
- A first reminder was sent within 72 hours of the first attempt (SMS and IVR).
- A second reminder was sent within 144 hours of the first attempt (SMS and IVR).
- Call backs were made within 72 and 144 hours of the first attempt (CATI); or
- Up to 2 call backs were made per appointment with the respondent (CATI).

Also, in order to minimize non-response, three types of incentives were given. First, households that did not own a mobile phone were provided one for free. Approximately 127 phones were donated in Honduras. Second, all communications between the interviewers and the households were free to the respondents. Finally, households were randomly assigned to one of three incentive levels: one-third of households received US$1 in free airtime for each questionnaire they answered, one-third received US$5 in free airtime, and one-third received no financial incentive (the control group).
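A minimal sketch of the three-way random assignment described above, assuming a simple shuffle followed by round-robin dealing. The function name, arm labels, seed, and household IDs are hypothetical; the source does not document the actual randomization procedure.

```python
import random


def assign_incentives(household_ids, seed=None):
    """Randomly split households into three incentive arms of near-equal size."""
    arms = ["US$1 airtime", "US$5 airtime", "no incentive (control)"]
    rng = random.Random(seed)
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled households round-robin so arm sizes differ by at most one.
    return {hh: arms[i % len(arms)] for i, hh in enumerate(shuffled)}


# Hypothetical household IDs 1..1500 for the recruited panel.
assignment = assign_incentives(range(1, 1501), seed=2012)
```

Fixing the seed makes the assignment reproducible, which helps when attrition is later compared across the three arms.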
World Bank Indicators (1960‑Present)
kaggle.com
zip
Updated May 29, 2025
Cite
George DiNicola (2025). World Bank Indicators (1960‑Present) [Dataset]. https://www.kaggle.com/datasets/georgejdinicola/world-bank-indicators
Available download formats: zip (52559856 bytes)
Dataset updated
May 29, 2025
Authors
George DiNicola
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Overview

This dataset provides a comprehensive collection of time series data sourced from the World Bank Open Data Platform, covering a wide range of global indicators from 1960 to the most recently published year. It includes economic, social, environmental, and demographic metrics, making it an ideal resource for researchers, data scientists, and policymakers interested in global development trends, economic forecasting, or socio-economic analysis.

A tutorial on how to combine the dataset topics into one large dataset can be found here

Why this Dataset?

My motivation for this project was to curate a high-quality collection of datasets for World Bank indicators organized by topics and structured in time-series, making them more accessible for data science projects. Since the World Bank’s Kaggle datasets have not been updated since 2019 https://www.kaggle.com/organizations/theworldbank, I saw an opportunity to provide more current data for the data analysis community.

Dataset Collection Contents

This collection brings together more than 800 World Bank indicators organized into 18 topic‑specific CSV files. Each file is structured as a country‑year panel: every row represents a unique combination of year (1960‑present) and ISO‑3 country code, while the columns hold the topic’s indicators.

The collection includes datasets with a variety of indicators, such as:
- Economic Metrics: GDP growth (%), GDP per capita, consumer price inflation, merchandise trade, gross capital formation, and more.
- Social Metrics: School enrollment (primary, secondary, tertiary), infant mortality rate, maternal mortality rate, poverty headcount, and more.
- Environmental Metrics: Forest area, renewable energy consumption, food production indices, and more.
- Demographic Metrics: Urban population, life expectancy, net migration, and more.

Usage

This dataset is ideal for a variety of applications, including:
- Economic forecasting and trend analysis (e.g., GDP growth, inflation).
- Socio-economic studies (e.g., education, health, poverty).
- Environmental impact analysis (e.g., renewable energy adoption).
- Demographic research (e.g., population trends, migration).

Topic datasets can be merged with each other using year and country code. This tutorial with notebook code can help you get started quickly.
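As a sketch of what merging on year and country code means for a country-year panel, the example below outer-joins two small tables using only the standard library. The column names (`country_code`, `year`, `gdp_growth`, `life_expectancy`) and all values are made-up placeholders, not taken from the actual CSV files.

```python
def merge_topics(left_rows, right_rows, keys=("country_code", "year")):
    """Outer-join two lists of row dicts on the shared country-year keys."""
    merged = {}
    for row in list(left_rows) + list(right_rows):
        key = tuple(row[k] for k in keys)
        # Rows with the same country and year are combined into one record.
        merged.setdefault(key, {}).update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical rows standing in for two topic files (values are invented).
economy = [{"country_code": "HND", "year": 2012, "gdp_growth": 1.5}]
health = [
    {"country_code": "HND", "year": 2012, "life_expectancy": 70.0},
    {"country_code": "PER", "year": 2012, "life_expectancy": 71.0},
]
panel = merge_topics(economy, health)
```

In a notebook, the same join would normally be written with pandas, for example `DataFrame.merge` with `on=[...]` and `how="outer"`, using whatever key column names the files actually carry.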

Collection Methodology

The data is collected via a custom software application that discovers and groups high-quality indicators with rules-based logic & artificial intelligence, generates metadata, and performs ETL for the data from the World Bank API. The result is a clean, up‑to‑date collection of World Bank indicators in time-series format that is ready for analysis—no manual downloads or data wrangling required.

Modifications

The original World Bank data has been aggregated and transformed for ease of use. Missing values have been preserved as provided by the World Bank, and no significant transformations have been applied beyond formatting and aggregation into a single file.

Source & Attribution

The World Bank: World Development Indicators

This dataset is publicly available and sourced from the World Bank Open Data Platform and is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. When using this data, please attribute the World Bank as follows: "Data sourced from the World Bank, licensed under CC BY 4.0." For more details on the World Bank’s terms of use, visit: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Feel free to use this data in Kaggle notebooks, academic research, or policy analysis. If you create a derived dataset or analysis, I encourage you to share it with the Kaggle community.
Data from: World Data Bank II: North America, South America, Europe, Africa,...
icpsr.umich.edu
ascii
Updated Jan 18, 2006
Cite
United States. Central Intelligence Agency (2006). World Data Bank II: North America, South America, Europe, Africa, Asia [Dataset]. http://doi.org/10.3886/ICPSR08376.v1
Available download formats: ascii
Unique identifier
https://doi.org/10.3886/ICPSR08376.v1
Dataset updated
Jan 18, 2006
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
Authors
United States. Central Intelligence Agency
License
https://www.icpsr.umich.edu/web/ICPSR/studies/8376/terms
Area covered
Africa, Asia, Americas, South America, North America, Europe, United States, Canada
Description
The boundaries of five different geographic areas -- North America, South America, Europe, Africa, and Asia -- are digitally represented in this collection of data files that can be used in the production of computer maps. Each of the five areas is encoded in three distinct files: (1) coastline, islands, and lakes, (2) rivers, and (3) international boundaries. There is an additional file for North America (Part 4: North America: Internal Boundaries) delineating state lines in the United States and provincial boundaries in Canada. The data in each of the files is hierarchically structured into subordinate geographic features and ranks, which may be used for output plotting symbol definition. The mapping scale used to encode the data ranged from 1:1 million to 1:4 million.
The World Bank Listening to LAC (L2L) Pilot 2011 - Peru
microdata.worldbank.org
datacatalog.ihsn.org
Updated Jul 8, 2014
Cite
World Bank (2014). The World Bank Listening to LAC (L2L) Pilot 2011 - Peru [Dataset]. https://microdata.worldbank.org/index.php/catalog/2022
Dataset updated
Jul 8, 2014
Dataset provided by
World Bank Group (http://www.worldbank.org/)
Authors
World Bank
Time period covered
2011 - 2012
Area covered
Peru
Description
Abstract

The rapid and massive dissemination of mobile phones in the developing world is creating new opportunities for the discipline of survey research. The World Bank is interested in leveraging mobile phone technology as a means of direct communication with poor households in the developing world in order to gather rapid feedback on the impact of economic crises and other events on the economy of such households.

The World Bank commissioned Gallup to conduct the Listening to LAC (L2L) pilot program, a research project aimed at testing the feasibility of mobile phone technology as a way of data collection for conducting quick turnaround, self-administered, longitudinal surveys among households in Peru and Honduras.

The project used face-to-face interviews as its benchmark, and included Short Message Service (SMS), Interactive Voice Response (IVR) and Computer Assisted Telephone Interviews (CATI) as test methods of data collection.

The pilot was designed in a way that allowed testing the response rates and the quality of data, while also providing information on the cost of collecting data using mobile phones. Researchers also evaluated if providing incentives affected panel attrition rates.

A random stratified multistage sampling technique was used to select a nationally representative sample of 1,500 households. During the initial face-to-face interviews, researchers gathered information on the socio-economic characteristics of households and recruited participants for follow-up research. Question wording was the same in all modes of data collection.

In Peru, households were randomly assigned to a communication mode (SMS, IVR, CATI), which stayed constant for all rounds (waves) of the survey.

Geographic coverage

Includes the entire national territory, with the exception of neighborhoods where access of interviewers is extremely difficult, due to lack of transportation infrastructure or to situations that threaten the physical integrity of the interviewers and supervisors (e.g. extremely high crime rate or warfare).

Analysis unit

Households

Kind of data

Sample survey data [ssd]

Sampling procedure

The Peru panel was built on a nationally representative sample of 1,500 households. The sample was based on the sampling frame for the National Household Survey (ENAHO) conducted by the Peruvian National Statistics Office (INEI) every three months.

In Peru, the sample selection was guided by the following criteria: (i) the sample should be representative nationally, and in urban and rural areas, and (ii) households close to the poverty line should be oversampled because policy decisions in times of crisis need to be especially mindful of the poor and vulnerable. For the purposes of this project, "close to the poverty line" was defined as the 40 percent of the consumption distribution that forms a symmetric band around the national poverty line: 20 percent above and 20 percent below. In 27 percent of Peruvian households, monthly per capita consumption was below the moderate poverty line in 2010 (ENAHO). Households whose monthly per capita consumption fell between the 7th and 47th percentiles of the national distribution were therefore oversampled.
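The band arithmetic can be checked with a short sketch. The function names are hypothetical, and treating the band limits as inclusive is an assumption; the 27 percent poverty rate and the 20-point half-width come from the text.

```python
def oversampling_band(poverty_rate_pct, half_width_pct=20):
    """Return the (lower, upper) percentiles of the band around the poverty line."""
    lower = max(0, poverty_rate_pct - half_width_pct)
    upper = min(100, poverty_rate_pct + half_width_pct)
    return lower, upper


def is_oversampled(consumption_percentile, band):
    """True if a household's consumption percentile falls inside the band."""
    lower, upper = band
    return lower <= consumption_percentile <= upper


# Peru: 27 percent of households below the moderate poverty line (ENAHO 2010).
band = oversampling_band(27)
```

With a poverty rate of 27 percent, the band runs from 27 - 20 = 7 to 27 + 20 = 47, matching the percentiles quoted above.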

The L2L sample frame comprises all the panel conglomerados from the fourth trimester of ENAHO 2010, or 281 conglomerados.

Detailed information about the sampling procedure is available in "Listening to LAC: Using Mobile Phones for High Frequency Data Collection, Final Report" (p. 65-69) and "The World Bank Listening to LAC (L2L) Pilot Project Sample Design for Peru."

Sampling deviation

A number of restive communities in Peru did not allow Gallup's interviewers to enter the area. Where possible, these were replaced following INEI's standard methodology. When confronted with a problem in a particular location, INEI moves to the next "Centro Poblado" in the same "Conglomerado."

Mode of data collection

Other [oth]

Research instrument

The following survey instruments were used in the project:

1) Initial face-to-face questionnaire

In Peru, the starting point was the ENAHO (National Household Survey) questionnaire. Step-wise regressions were done to select the set of questions that best predicted consumption. For the purposes of robustness, the regressions were also done with questions that best predicted income, which yielded the same results.

The survey gathered information on households' demographics, household infrastructure, employment, remittances, income, accidents, food security, self-perceptions on poverty, Internet access and cellphones use.

2) Monthly questionnaires (SMS, IVR, CATI)

The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters. A maximum of 10 questions had to be chosen for the monthly questionnaire. In addition, two questions sought to ensure the validity of the responses by testing if the respondent was a member of the household. Most questions were time-variant and each questionnaire was repeated to observe if answers changed over time. All questions related to variables that strongly affect household welfare and are likely to change in times of crisis.

To verify household membership, the first two questions in each monthly questionnaire asked the respondent for their gender and year of birth, and the answers were compared to the household roster obtained during the face-to-face interview.
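The roster comparison could look something like the sketch below. The roster layout and field names are hypothetical, and requiring an exact match on both answers is an assumption; the source does not say how mismatches or near-matches were treated.

```python
def is_household_member(answer_gender, answer_birth_year, roster):
    """Check whether the reported gender and birth year match anyone on the roster."""
    return any(
        member["gender"] == answer_gender and member["birth_year"] == answer_birth_year
        for member in roster
    )


# Hypothetical roster captured during the initial face-to-face interview.
roster = [
    {"gender": "F", "birth_year": 1975},
    {"gender": "M", "birth_year": 1998},
]
```

A reply of ("F", 1975) matches the first roster entry, while ("F", 1998) matches no single member and would be flagged.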

3) Final face-to-face questionnaire

Gallup conducted face-to-face closing surveys among 700 panelists. The researchers asked about issues the respondents had with mobile phones and coverage during the test. Panelists were also asked what would motivate them to keep on participating in a project like this in the future.

Response rate

In Peru, 67 percent of recruited households failed to answer the first round of follow-up surveys. Attrition slightly increased with each wave of the survey (between 1 and 3 percentage points per wave), reaching 75 percent in wave 6.

As part of the survey administration process, Gallup implemented a number of mechanisms to maximize the response rate and panelist retention. The following strategies were applied to respondents who did not reply the first time:

- The surveys were left open for responses for up to 2 weeks after the original transmission of the survey (from the original call in the case of IVR and CATI).
- A first reminder was sent within 72 hours of the first attempt (SMS and IVR).
- A second reminder was sent within 144 hours of the first attempt (SMS and IVR).
- Call backs were made within 72 and 144 hours of the first attempt (CATI); or
- Up to 2 call backs were made per appointment with the respondent (CATI).

Also, in order to minimize non-response, three types of incentives were given. First, households that did not own a mobile phone were provided one for free. Approximately 200 phones were donated in Peru. Second, all communications between the interviewers and the households were free to the respondents. Finally, households were randomly assigned to one of three incentive levels: one-third of households received US$1 in free airtime for each questionnaire they answered, one-third received US$5 in free airtime, and one-third received no financial incentive (the control group).
Multi Country Study Survey 2000-2001 - Sweden
apps.who.int
catalog.ihsn.org
Updated Jan 23, 2014
Cite
World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Sweden [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/159
Dataset updated
Jan 23, 2014
Dataset provided by
World Health Organization (https://who.int/)
Authors
World Health Organization (WHO)
Time period covered
2000 - 2001
Area covered
Sweden
Description
Abstract

In order to develop various methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study used a common survey instrument with a modular structure in nationally representative populations to assess the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional modules in other areas such as adult mortality and health state valuations.

The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

The study was carried out in 61 countries completing 71 surveys because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: in-person 90-minute household interviews in 14 countries, brief face-to-face interviews in 27 countries, computerized telephone interviews in 2 countries, and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with a known probability so as to make estimates based on general population parameters.

The survey study tested novel techniques to control the reporting bias between different groups of people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, the self-reports of individuals of their own health were calibrated against well-known performance tests (e.g. self-reported vision was measured against the standard Snellen visual acuity test) or against short descriptions in vignettes that marked known anchor points of difficulty (e.g. people with different levels of mobility such as a paraplegic person or an athlete who runs 4 km each day) so as to adjust the responses for comparability. The same method was also used for self-reports of individuals assessing the responsiveness of their health systems, where vignettes on different responsiveness domains describing different levels of responsiveness were used to calibrate the individual responses.

These data are useful in their own right to standardize indicators for different domains of health (such as cognition, mobility, self care, affect, usual activities, pain, social participation, etc.) but also provide a better measurement basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful to improve the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

Kind of data

Sample survey data [ssd]

Sampling procedure

The metropolitan, urban and rural population and all "administrative regional units" as defined in official European Union statistics (NUTS 2) were covered in proportion to the respective population aged 18 and above. The country was divided into an appropriate number of areas, grouping NUTS regions at whatever level was appropriate. The NUTS areas covered in Sweden were the following: Stockholm/Södertälje A-Region, Gothenburgs A-Region, Malmö/Lund/Trelleborgs A-region, Semi urban area, Rural area.

The basic sample design was a multi-stage, random probability sample. 100 sampling points were drawn with probability proportional to population size, for a total coverage of the country. The sampling points were drawn after stratification by NUTS 2 region and by degree of urbanisation. They represented the whole territory of the country surveyed and were selected proportionally to the distribution of the population in terms of metropolitan, urban and rural areas. In each of the selected sampling points, one address was drawn at random. This starting address formed the first address of a cluster of a maximum of 20 addresses. The remainder of the cluster was selected as every Nth address by standard random route procedure from the initial address. In theory, there is no maximum number of addresses issued per country. Procedures for random household selection and random respondent selection are independent of the interviewer's decision and controlled by the institute responsible. They should be as identical as possible from country to country, full functional equivalence being a must.
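One common way to draw points with probability proportional to size is systematic PPS sampling, sketched below. This is an illustrative assumption: the source does not say which PPS method was used, stratification is left out, and the area names and populations are placeholders.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic_sample(populations, n_points, seed=None):
    """Draw n_points sampling points with probability proportional to size.

    Uses systematic PPS: a random start, then equal steps along the
    cumulative population totals. Large areas can be drawn more than once.
    """
    rng = random.Random(seed)
    names = list(populations)
    cumulative = list(accumulate(populations[name] for name in names))
    step = cumulative[-1] / n_points
    start = rng.random() * step
    points = []
    for i in range(n_points):
        # Locate the area whose cumulative range contains this position.
        index = bisect_right(cumulative, start + i * step)
        points.append(names[min(index, len(names) - 1)])
    return points


# Hypothetical areas and populations (placeholders only).
areas = {"area_a": 900, "area_b": 100}
points = pps_systematic_sample(areas, 100, seed=1)
```

In this toy case the area holding 90 percent of the population receives 90 of the 100 sampling points, which is the proportionality the design aims for.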

At every address up to 4 recalls were made to attempt to achieve an interview with the selected respondent. There was only one interview per household. The final sample size is 1,000 completed interviews.

Mode of data collection

Face-to-face [f2f]

Cleaning operations

Data Coding

At each site the data was coded by investigators to indicate the respondent status and the selection of the modules for each respondent within the survey design. After the interview was edited by the supervisor and considered adequate it was entered locally.

Data Entry Program A data entry program was developed in WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.
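
A field-level check of this kind can be sketched as below. The I-Shell program itself is not shown in the source, so this is not its implementation: the rule table, the field names and the sex coding are hypothetical, and only the upper age limit of 120 comes from the text.

```python
# Hypothetical rule table; only the age maximum of 120 is from the text.
RULES = {
    "age": {"min": 0, "max": 120},
    "sex": {"allowed": {1, 2}},  # assumed coding, for illustration
}


def validate_record(record, rules=RULES):
    """Return a list of human-readable problems found in one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value} is not a valid category")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: {value} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{field}: {value} above maximum {rule['max']}")
    return errors


problems = validate_record({"age": 121, "sex": 1})
```

Keeping the rules in a table rather than in code mirrors the description that almost every variable had its own range or list of possible values.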

In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.

Data Transfer The data entry program was capable of exporting the data that was entered into one compressed database file which could be easily sent to WHO using email attachments or a file transfer program onto a secure server no matter how many cases were in the file. The sites were allowed the use of as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file and they were merged once they were delivered to WHO with the help of other programs that were built for automating the process. The sites sent the data periodically as they collected it enabling the checking procedures and preliminary analyses in the early stages of the data collection.

Data quality checks Once the data was received it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

Data Cleaning and Feedback After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, missing age, sex, education and some other important variables.
Component parts of the World Heat Flow Data Collection
doi.pangaea.de
dataone.org
html, tsv
Updated Apr 11, 2013
Cite
Global Heat Flow Compilation Group (2013). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.810104
Available download formats: html, tsv
Unique identifier
https://doi.org/10.1594/PANGAEA.810104
Dataset updated
Apr 11, 2013
Dataset provided by
PANGAEA
Authors
Global Heat Flow Compilation Group
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered
Variables measured
Number, Comment, LATITUDE, ELEVATION, Heat flow, LONGITUDE, Area/locality, Depth, top/min, Method comment, Reference/source, and 8 more
Description
This data set is a compilation of heat flow data of uncertain origin. References as cited in the Global Heat Flow Database were incomplete and thus could not be verified. This compilation contains data of unknown origin, unpublished data, data without full reference information, and data extracted from other databases. The remaining short citation and its related problem are listed in columns 18 and 19.
Multi Country Study Survey 2000-2001 - Iceland
apps.who.int
datacatalog.ihsn.org
Updated Jan 17, 2014
Cite
World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Iceland [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/174
Dataset updated
Jan 17, 2014
Dataset provided by
World Health Organization https://who.int/
Authors
World Health Organization (WHO)
Time period covered
2000 - 2001
Area covered
Iceland
Description
Abstract

In order to develop various methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study has used a common survey instrument in nationally representative populations, with a modular structure for assessing the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional modules in other areas such as adult mortality and health state valuations.

The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

The study was carried out in 61 countries, completing 71 surveys because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: in-person household 90-minute interviews in 14 countries; brief face-to-face interviews in 27 countries; computerized telephone interviews in 2 countries; and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with a known probability so as to make estimates based on general population parameters.

The survey study tested novel techniques to control the reporting bias between different groups of people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, individuals' self-reports of their own health were calibrated against well-known performance tests (e.g. self-reported vision was measured against the standard Snellen visual acuity test) or against short descriptions in vignettes that marked known anchor points of difficulty (e.g. people with different levels of mobility such as a paraplegic person or an athlete who runs 4 km each day) so as to adjust the responses for comparability. The same method was also used for self-reports of individuals assessing the responsiveness of their health systems, where vignettes on different responsiveness domains describing different levels of responsiveness were used to calibrate the individual responses.

These data are useful in their own right to standardize indicators for different domains of health (such as cognition, mobility, self care, affect, usual activities, pain, social participation, etc.) but also provide a better measurement basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful to improve the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample covered the metropolitan, urban and rural population and all "administrative regional units" as defined in official European Union statistics (NUTS 2), in proportion to the respective population aged 18 and above. The country was divided into an appropriate number of areas, grouping NUTS regions at whatever level was appropriate. The NUTS regions covered in Iceland were the following: Reykjavik, Near Reykjavik and Sudurnes, West-Iceland, North-Iceland, East-Iceland, South-Iceland.

The basic sample design was a multi-stage, random probability sample. 50 sampling points were drawn with probability proportional to population size, for total coverage of the country. The sampling points were drawn after stratification by NUTS 2 region and by degree of urbanisation. They represented the whole territory of the country surveyed and were selected proportionally to the distribution of the population in terms of metropolitan, urban and rural areas. In each of the selected sampling points, one address was drawn at random. This starting address formed the first address of a cluster of a maximum of 20 addresses. The remainder of the cluster was selected as every Nth address by standard random route procedure from the initial address. In theory, there is no maximum number of addresses issued per country. Procedures for random household selection and random respondent selection are independent of the interviewer's decision and controlled by the institute responsible. They should be as identical as possible from country to country, full functional equivalence being a must.

At every address up to 4 recalls were made to attempt to achieve an interview with the selected respondent. There was only one interview per household. The final sample size is 489 completed interviews.

Mode of data collection

Face-to-face [f2f]

Cleaning operations

Data Coding At each site the data was coded by investigators to indicate the respondent status and the selection of the modules for each respondent within the survey design. After the interview was edited by the supervisor and considered adequate it was entered locally.

Data Entry Program A data entry program was developed in WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.

In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.
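
The double-entry step can be illustrated with a small comparison routine. This is a sketch under stated assumptions rather than the I-Shell logic: the function names, the record layout and the `double_entry_done` marker field are hypothetical.

```python
def compare_entries(first, second):
    """Return {field: (first_value, second_value)} for every mismatch."""
    fields = sorted(set(first) | set(second))
    return {
        f: (first.get(f), second.get(f))
        for f in fields
        if first.get(f) != second.get(f)
    }


def resolve(first, second, choices):
    """Merge two entries of one case; choices maps each conflict to 1 or 2.

    Refuses to continue until every conflict has been resolved, then
    marks the case as having completed double entry.
    """
    conflicts = compare_entries(first, second)
    unresolved = sorted(set(conflicts) - set(choices))
    if unresolved:
        raise ValueError(f"unresolved conflicts: {unresolved}")
    merged = dict(first)
    for field in conflicts:
        if choices[field] not in (1, 2):
            raise ValueError(f"{field}: choice must be 1 or 2")
        source = first if choices[field] == 1 else second
        merged[field] = source.get(field)
    merged["double_entry_done"] = True  # hypothetical completion marker
    return merged


entry_1 = {"id": "001", "age": 34, "sex": 1}
entry_2 = {"id": "001", "age": 43, "sex": 1}
final = resolve(entry_1, entry_2, choices={"age": 2})
```

The completion marker is what later allows a check that every case went through both entries.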

Data Transfer The data entry program was capable of exporting the data that was entered into one compressed database file which could be easily sent to WHO using email attachments or a file transfer program onto a secure server no matter how many cases were in the file. The sites were allowed the use of as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file and they were merged once they were delivered to WHO with the help of other programs that were built for automating the process. The sites sent the data periodically as they collected it enabling the checking procedures and preliminary analyses in the early stages of the data collection.

Data quality checks Once the data was received it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

Data Cleaning and Feedback After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, missing age, sex, education and some other important variables.
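
The two kinds of duplication mentioned above (the same ID used more than once, and identical data under different IDs) can be detected as sketched below. The record layout and field names are hypothetical; the source does not describe how the WHO program was written.

```python
from collections import defaultdict


def find_duplicates(records, id_field="id"):
    """Return (duplicate_ids, duplicate_cases).

    duplicate_ids maps an ID to the positions of all records using it.
    duplicate_cases lists groups of positions whose data are identical
    although their IDs differ.
    """
    by_id = defaultdict(list)
    by_content = defaultdict(list)
    for pos, rec in enumerate(records):
        by_id[rec[id_field]].append(pos)
        # Everything except the ID, in a stable order, as a hashable key.
        content = tuple(sorted((k, v) for k, v in rec.items() if k != id_field))
        by_content[content].append(pos)

    duplicate_ids = {k: v for k, v in by_id.items() if len(v) > 1}
    duplicate_cases = [
        group
        for group in by_content.values()
        if len({records[p][id_field] for p in group}) > 1
    ]
    return duplicate_ids, duplicate_cases


cases = [
    {"id": "A1", "age": 30, "sex": 1},
    {"id": "A1", "age": 52, "sex": 2},
    {"id": "B7", "age": 30, "sex": 1},
    {"id": "C3", "age": 41, "sex": 2},
]
dup_ids, dup_cases = find_duplicates(cases)
```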
Data from: EC-Earth-Consortium EC-Earth3 model output prepared for CMIP6...
wdc-climate.de
Updated 2019
Cite
EC-Earth Consortium (EC-Earth) (2019). EC-Earth-Consortium EC-Earth3 model output prepared for CMIP6 CMIP [Dataset]. http://doi.org/10.22033/ESGF/CMIP6.181
Unique identifier
https://doi.org/10.22033/ESGF/CMIP6.181
Dataset updated
2019
Dataset provided by
Earth System Grid
World Data Center for Climate (WDCC) at DKRZ
Authors
EC-Earth Consortium (EC-Earth)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coupled Model Intercomparison Project Phase 6 (CMIP6) datasets. These data include all datasets published for 'CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3' with the full Data Reference Syntax following the template 'mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version'.
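
As a rough illustration of how the Data Reference Syntax template maps onto a dataset identifier, the sketch below splits a dotted identifier into named facets. The `parse_drs` helper is hypothetical, and the full identifier in the example (experiment, member, table, variable, grid and version values) is illustrative rather than taken from this record.

```python
DRS_TEMPLATE = (
    "mip_era.activity_id.institution_id.source_id.experiment_id."
    "member_id.table_id.variable_id.grid_label.version"
)
FACETS = DRS_TEMPLATE.split(".")


def parse_drs(dataset_id):
    """Map a dotted DRS identifier (or a leading prefix of one) to facets."""
    parts = dataset_id.split(".")
    if len(parts) > len(FACETS):
        raise ValueError(f"expected at most {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


# The four-facet prefix quoted in the description.
prefix = parse_drs("CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3")

# An illustrative full identifier; the last six values are assumptions.
full = parse_drs(
    "CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3.historical.r1i1p1f1.Amon.tas.gr.v20200310"
)
```

Splitting on dots works here because none of the facet values shown contain a dot themselves.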

The EC Earth 3.3 climate model, released in 2019, includes the following components: atmos: IFS cy36r4 (TL255, linearly reduced Gaussian grid equivalent to 512 x 256 longitude/latitude; 91 levels; top level 0.01 hPa), land: HTESSEL (land surface scheme built in IFS), ocean: NEMO3.6 (ORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; 362 x 292 longitude/latitude; 75 levels; top grid cell 0-1 m), seaIce: LIM3. The model was run by the AEMET, Spain; BSC, Spain; CNR-ISAC, Italy; DMI, Denmark; ENEA, Italy; FMI, Finland; Geomar, Germany; ICHEC, Ireland; ICTP, Italy; IDL, Portugal; IMAU, The Netherlands; IPMA, Portugal; KIT, Karlsruhe, Germany; KNMI, The Netherlands; Lund University, Sweden; Met Eireann, Ireland; NLeSC, The Netherlands; NTNU, Norway; Oxford University, UK; surfSARA, The Netherlands; SMHI, Sweden; Stockholm University, Sweden; Unite ASTR, Belgium; University College Dublin, Ireland; University of Bergen, Norway; University of Copenhagen, Denmark; University of Helsinki, Finland; University of Santiago de Compostela, Spain; Uppsala University, Sweden; Utrecht University, The Netherlands; Vrije Universiteit Amsterdam, the Netherlands; Wageningen University, The Netherlands. Mailing address: EC-Earth consortium, Rossby Center, Swedish Meteorological and Hydrological Institute/SMHI, SE-601 76 Norrkoping, Sweden (EC-Earth-Consortium) in native nominal resolutions: atmos: 100 km, land: 100 km, ocean: 100 km, seaIce: 100 km.

Project: These data have been generated as part of the internationally-coordinated Coupled Model Intercomparison Project Phase 6 (CMIP6; see also GMD Special Issue: http://www.geosci-model-dev.net/special_issue590.html). The simulation data provides a basis for climate research designed to answer fundamental science questions and serves as resource for authors of the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6).

CMIP6 is a project coordinated by the Working Group on Coupled Modelling (WGCM) as part of the World Climate Research Programme (WCRP). Phase 6 builds on previous phases executed under the leadership of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and relies on the Earth System Grid Federation (ESGF) and the Centre for Environmental Data Analysis (CEDA) along with numerous related activities for implementation. The original data is hosted and partially replicated on a federated collection of data nodes, and most of the data relied on by the IPCC is being archived for long-term preservation at the IPCC Data Distribution Centre (IPCC DDC) hosted by the German Climate Computing Center (DKRZ).

The project includes simulations from about 120 global climate models and around 45 institutions and organizations worldwide. - Project website: https://pcmdi.llnl.gov/CMIP6.
Employment Of India CLeaned and Messy Data
kaggle.com
zip
Updated Apr 7, 2025
Cite
MANSI SHINDE (2025). Employment Of India CLeaned and Messy Data [Dataset]. https://www.kaggle.com/datasets/soniaaaaaaaa/employment-of-india-cleaned-and-messy-data/code
Available download formats: zip (29791 bytes)
Dataset updated
Apr 7, 2025
Authors
MANSI SHINDE
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
India
Description
This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.

🔹 Dataset Composition:

It includes two parallel datasets:
1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries.
2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.

Each record captures multiple attributes related to individuals in the Indian job market, including:
- Age Group
- Employment Status (Employed/Unemployed)
- Monthly Salary (INR)
- Education Level
- Industry Sector
- Years of Experience
- Location
- Perceived AI Risk
- Date of Data Recording

Transformations & Cleaning Applied:

The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
- Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
- Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
- Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
- Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
- Outliers: Detected and handled based on domain logic and distribution analysis.
- Categorization: Converted numeric ages into grouped age categories for comparative analysis.
- Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.
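
A few of these steps can be sketched with the Python standard library alone. This is not the author's pipeline: only the 'monthly_salary_(inr)' to 'Monthly Salary (INR)' rename comes from the description, while the other raw column names, the age bins and the sample rows are assumptions made for illustration. Outlier handling and imputation are left out.

```python
# Only the salary rename is from the description; the rest is assumed.
COLUMN_NAMES = {
    "monthly_salary_(inr)": "Monthly Salary (INR)",
    "employment_status": "Employment Status",
    "age": "Age",
}


def age_group(age):
    """Hypothetical age bins used for the categorization step."""
    if age < 25:
        return "18-24"
    if age < 35:
        return "25-34"
    if age < 50:
        return "35-49"
    return "50+"


def clean(rows):
    cleaned, seen = [], set()
    for raw in rows:
        # Inconsistent formatting: unify column names.
        row = {COLUMN_NAMES.get(k.strip().lower(), k): v for k, v in raw.items()}
        salary_text = str(row.get("Monthly Salary (INR)") or "").replace(",", "").strip()
        age_text = str(row.get("Age") or "").strip()
        # Missing values: drop rows lacking critical fields.
        if not salary_text or not age_text:
            continue
        # Incorrect data types: salary to float, age to int.
        try:
            salary, age = float(salary_text), int(age_text)
        except ValueError:
            continue
        record = {
            "Age Group": age_group(age),
            "Employment Status": str(row.get("Employment Status") or "").strip().title(),
            "Monthly Salary (INR)": salary,
        }
        # Duplicate records: compared here after normalization.
        key = tuple(record.items())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


messy = [
    {"age": "29", "employment_status": " employed ", "monthly_salary_(inr)": "45,000"},
    {"age": "29", "employment_status": "Employed", "monthly_salary_(inr)": "45000"},
    {"age": "41", "employment_status": "UNEMPLOYED", "monthly_salary_(inr)": ""},
    {"age": "52", "employment_status": "employed", "monthly_salary_(inr)": "80000"},
]
tidy = clean(messy)
```

Comparing rows after normalization, as done here, catches duplicates that differ only in spacing or capitalization; comparing raw rows would miss them.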

Purpose & Utility:

This dataset is ideal for learners and professionals who want to understand:
- The impact of messy data on visualization and insights
- How transformation steps can dramatically improve data interpretation
- Practical examples of preprocessing techniques before feeding into ML models or BI tools

It's also useful for:
- Training ML models with clean inputs
- Data storytelling with visual clarity
- Demonstrating reproducibility in data cleaning pipelines

By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.
International Data Base, World Population: 1983 Extract
icpsr.umich.edu
search.datacite.org
ascii
Updated Feb 16, 1992
Cite
United States. Bureau of the Census (1992). International Data Base, World Population: 1983 Extract [Dataset]. http://doi.org/10.3886/ICPSR08320.v1
Available download formats: ascii
Unique identifier
https://doi.org/10.3886/ICPSR08320.v1
Dataset updated
Feb 16, 1992
Dataset provided by
Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
Authors
United States. Bureau of the Census
License
https://www.icpsr.umich.edu/web/ICPSR/studies/8320/terms
Time period covered
1950 - 1985
Area covered
World
Description
This aggregate data collection is an extract of the International Data Base (IDB), a computerized central repository of demographic, economic, and social data for all countries of the world. Data available in this collection include total midyear population estimates and projections (1950-1985), percent urban population, estimates and projections of crude birth rate, crude death rate, net migration rate, rate of natural increase, and annual growth rate, infant mortality rate and life expectancy at birth by sex, percent literate by sex, and percent of the labor force in agriculture.
World Religions Across Regions
kaggle.com
zip
Updated Dec 6, 2022
Cite
The Devastator (2022). World Religions Across Regions [Dataset]. https://www.kaggle.com/datasets/thedevastator/a-global-perspective-on-world-religions-1945-201
Available download formats: zip (213216 bytes)
Dataset updated
Dec 6, 2022
Authors
The Devastator
Area covered
World
Description
World Religions Across Regions

Analyzing Adherence Across Regions, States and the Global System

By Correlates of War Project [source]

About this dataset

The World Religion Project (WRP) is an ambitious endeavor to conduct a comprehensive analysis of religious adherence throughout the world from 1945 to 2010. This cutting-edge project offers unparalleled insight into the religious behavior of people in different countries, regions, and continents during this time period. Its datasets provide important information about the numbers and percentages of adherents across a multitude of different religions, religion families, and non-religious affiliations.

The WRP consists of three distinct datasets: the national religion dataset, the regional religion dataset, and the global religion dataset. Each targets a different level of analysis, from individual states to the global system. The national dataset provides the number of adherents by state, as well as the percentage of the population practicing a given religion, in five-year increments, showing how these figures evolve from nation to nation over time. The regional dataset is likewise provided at five-year intervals and uses the Correlates of War regional designations with one modification: Pacific Ocean states (country code number 900 or above) have been reclassified into their own Oceania category. Finally, the global dataset aggregates all states, giving a snapshot at any five-year interval between 1945 and 2010 of the relationships between religions or religious families, within one location or transnationally.

This project was developed in three stages: first, forming a religions tree (a systematic classification); second, collecting data according to that classification structure; and last, cleaning the data so that discrepancies could be reconciled, with gaps left where values were unknown during the collection process. Anyone wishing to read in more detail about the possible uses of these datasets is encouraged to contact Zeev Maoz (University of California, Davis) and Errol A. Henderson (Pennsylvania State University).

How to use the dataset

The World Religion Project (WRP) dataset offers a comprehensive look at religious adherence around the world within a single dataset. With it, you can track global religious trends over a period of 65 years and explore how they have changed during that time, gaining insight into cross-regional and cross-time patterns in religious affiliation.

Research Ideas

Analyzing historical patterns of religious growth and decline across different regions

Creating visualizations to compare religious adherence in various states, countries, or globally

Studying the impact of governmental policies on religious participation over time

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: Dataset copyright by authors

You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.

You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
- Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: WRP regional data.csv

| Column name | Description |
|:---|:---|
| Year | Reference year for data collection. (Integer) |
| Region | World region according to Correlates Of War (COW) Regional Systemizations with one modification (Oceania category for COW country code ... |
Internet of Things - number of connected devices worldwide 2015-2025
statista.com
Updated Nov 27, 2016
Cite
Statista (2016). Internet of Things - number of connected devices worldwide 2015-2025 [Dataset]. https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
Dataset updated
Nov 27, 2016
Dataset authored and provided by
Statista http://statista.com/
Area covered
Worldwide
Description
By 2025, forecasts suggest that there will be more than ** billion Internet of Things (IoT) connected devices in use. This would be a nearly threefold increase from the IoT installed base in 2019.

What is the Internet of Things?

The IoT refers to a network of devices that are connected to the internet and can "communicate" with each other. Such devices include everyday tech gadgets such as smartphones and wearables, smart home devices such as smart meters, and industrial devices like smart machines. These smart connected devices are able to gather, share, and analyze information and create actions accordingly. By 2023, global spending on IoT will reach *** trillion U.S. dollars.

How does the Internet of Things work?

IoT devices use sensors and processors to collect and analyze data acquired from their environments. The data collected by the sensors is shared by being sent to a gateway or to other IoT devices. It is then either sent to and analyzed in the cloud or analyzed locally. By 2025, the data volume created by IoT connections is projected to reach a massive total of **** zettabytes.

Privacy and security concerns

Given the amount of data generated by IoT devices, it is no wonder that data privacy and security are among the major concerns with regard to IoT adoption. Once devices are connected to the Internet, they become vulnerable to possible security breaches in the form of hacking, phishing, etc. Frequent data leaks from social media raise serious concerns about information security standards in today's world; were the IoT to become the next new reality, serious efforts to create strict security standards would need to be prioritized.
Panel Data on International Migration 1975-2000 - Australia, Canada,...
microdata.worldbank.org
catalog.ihsn.org
Updated Apr 27, 2021
Cite
Maurice Schiff and Mirja Channa Sjoblom (2021). Panel Data on International Migration 1975-2000 - Australia, Canada, Germany, France, United Kingdom, United States [Dataset]. https://microdata.worldbank.org/index.php/catalog/390
Dataset updated
Apr 27, 2021
Dataset authored and provided by
Maurice Schiff and Mirja Channa Sjoblom
Time period covered
1975 - 2000
Area covered
France, Australia, United Kingdom, Germany, Canada, United States
Description
Abstract

This dataset, a product of the Trade Team - Development Research Group, is part of a larger effort in the group to measure the extent of the brain drain as part of the International Migration and Development Program. It measures international skilled migration for the years 1975-2000.

The methodology is explained in: "Tendance de long terme des migrations internationales. Analyse à partir des 6 principaux pays receveurs", Cécily Defoort.

This data set uses the same methodology as used in the Docquier-Marfouk data set on international migration by educational attainment. The authors use data from 6 key receiving countries in the OECD: Australia, Canada, France, Germany, the UK and the US.

It is estimated that the data represent approximately 77 percent of the world’s migrant population.

Bilateral brain drain rates are estimated based on observations for every five years during the period 1975-2000.

Geographic coverage

Australia, Canada, France, Germany, UK and US

Kind of data

Aggregate data [agg]

Mode of data collection

Other [oth]
Multi Country Study Survey 2000-2001 - Chile
apps.who.int
catalog.ihsn.org
Updated Jan 17, 2014
Cite
World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Chile [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/149
Dataset updated
Jan 17, 2014
Dataset provided by
World Health Organization https://who.int/
Authors
World Health Organization (WHO)
Time period covered
2000 - 2001
Area covered
Chile
Description
Abstract

In order to develop various methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study has used a common survey instrument in nationally representative populations, with a modular structure for assessing the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional modules in other areas such as adult mortality and health state valuations.

The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

The study was carried out in 61 countries, completing 71 surveys because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: in-person household 90-minute interviews in 14 countries; brief face-to-face interviews in 27 countries; computerized telephone interviews in 2 countries; and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with a known probability so as to make estimates based on general population parameters.

The survey study tested novel techniques to control reporting bias between groups of people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, individuals' self-reports of their own health were calibrated against well-known performance tests (e.g. self-reported vision was measured against the standard Snellen visual acuity test) or against short descriptions in vignettes that marked known anchor points of difficulty (e.g. people with different levels of mobility, such as a paraplegic person or an athlete who runs 4 km each day) so as to adjust the responses for comparability. The same method was also used for individuals' self-reports assessing the responsiveness of their health systems, where vignettes describing different levels of responsiveness in different domains were used to calibrate the individual responses.

These data are useful in their own right to standardize indicators for different domains of health (such as cognition, mobility, self care, affect, usual activities, pain, social participation, etc.) but also provide a better measurement basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful to improve the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

Kind of data

Sample survey data [ssd]

Sampling procedure

The telephone directory was used as the sampling frame since it is considered the most reliable registry available.

Each region was divided into provinces. The provinces are composed of "comunas" or municipalities from within which individuals were randomly selected. However, with this design, there may be a bias against the population without a telephone, since people not listed in the directory could not be selected.

Final Sample Size=2,078

Mode of data collection

Mail Questionnaire [mail]

Cleaning operations

Data Coding: At each site the data was coded by investigators to indicate the respondent status and the selection of the modules for each respondent within the survey design. After the interview was edited by the supervisor and considered adequate it was entered locally.

Data Entry Program: A data entry program was developed in WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.
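The field checks described above can be illustrated with a minimal Python sketch. It is not the I-Shell program: the rule table, the field names, the category codes and the lower age bound of 0 are all assumptions made for illustration, and only the age ceiling of 120 comes from the text.

```python
# Hypothetical field rules; only the age ceiling of 120 comes from the description.
RULES = {
    "age": {"min": 0, "max": 120},           # lower bound of 0 is an assumption
    "sex": {"allowed": {"male", "female"}},  # category codes are an assumption
}


def validate_record(record):
    """Return a list of (field, problem) pairs for one data-entry record."""
    problems = []
    for field, rule in RULES.items():
        value = record.get(field)
        if value is None:
            problems.append((field, "missing"))
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append((field, "invalid category"))
        elif "max" in rule and not (rule["min"] <= value <= rule["max"]):
            problems.append((field, "out of range"))
    return problems
```

A record such as `{"age": 130, "sex": "female"}` would be flagged for the age field, while a record with valid values produces an empty list.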

In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.
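As a rough illustration of double data entry, the following Python sketch compares two keyings of the same case and builds a final record from the operator's choices. The function names, the `choices` mapping and the `double_entry_done` flag are hypothetical stand-ins for the behaviour described above, not the actual program.

```python
def compare_entries(first, second):
    """Return the fields whose values differ between two keyings of one case."""
    fields = sorted(set(first) | set(second))
    return [f for f in fields if first.get(f) != second.get(f)]


def resolve(first, second, choices):
    """Build the final record; choices maps each conflicting field to 1 or 2."""
    final = dict(first)
    for field in compare_entries(first, second):
        pick = choices[field]  # raises KeyError if a conflict is left unresolved
        final[field] = first.get(field) if pick == 1 else second.get(field)
    final["double_entry_done"] = True  # hypothetical completion mark
    return final
```

Requiring a choice for every conflicting field mirrors the rule that the user could not continue until each mismatch was resolved.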

Data Transfer: The data entry program was capable of exporting the data that was entered into one compressed database file which could be easily sent to WHO using email attachments or a file transfer program onto a secure server no matter how many cases were in the file. The sites were allowed the use of as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file and they were merged once they were delivered to WHO with the help of other programs that were built for automating the process. The sites sent the data periodically as they collected it enabling the checking procedures and preliminary analyses in the early stages of the data collection.

Data Quality Checks: Once the data was received it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

Data Cleaning and Feedback: After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, missing age, sex, education and some other important variables.
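The two kinds of duplication mentioned above (the same ID used more than once, and identical data filed under different IDs) can be sketched in Python as follows. The record layout and the `id` field name are assumptions, and this is an illustration rather than the checking program used by WHO.

```python
from collections import defaultdict


def find_duplicates(cases, id_field="id"):
    """Return (duplicate_ids, duplicate_cases) for a list of case dicts.

    duplicate_ids maps an ID to the positions of all cases that share it.
    duplicate_cases lists groups of positions whose data are identical
    although their IDs differ.
    """
    by_id = defaultdict(list)
    by_content = defaultdict(list)
    for index, case in enumerate(cases):
        by_id[case[id_field]].append(index)
        # Everything except the ID, in a hashable and order-independent form.
        content = tuple(sorted((k, v) for k, v in case.items() if k != id_field))
        by_content[content].append(index)

    duplicate_ids = {k: v for k, v in by_id.items() if len(v) > 1}
    duplicate_cases = [
        group for group in by_content.values()
        if len({cases[i][id_field] for i in group}) > 1
    ]
    return duplicate_ids, duplicate_cases
```

Both results refer to positions in the input list, so a site receiving the feedback could look up the affected cases directly.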
Component parts of the World Heat Flow Data Collection
search.dataone.org
Updated Jan 6, 2018
Cite
Clement, M; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G (2018). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.809582
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.809582
Dataset updated
Jan 6, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Clement, M; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G
Area covered

Description
No description is available. Visit https://dataone.org/datasets/5d041e4bfaf4ea361dd3135126134720 for complete metadata about this dataset.
Wonders of the World Image Dataset
kaggle.com
zip
Updated May 3, 2022
Cite
Bala Baskar (2022). Wonders of the World Image Dataset [Dataset]. https://www.kaggle.com/datasets/balabaskar/wonders-of-the-world-image-classification
Explore at:
zip (453078359 bytes)
Dataset updated
May 3, 2022
Authors
Bala Baskar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction

The New 7 Wonders of the World was a campaign started in 2000 to choose Wonders of the World from a selection of 200 existing monuments. The popularity poll via free Web-based voting and small amounts of telephone voting was led by Canadian-Swiss Bernard Weber and organized by the New 7 Wonders Foundation (N7W) based in Zurich, Switzerland, with winners announced on 7 July 2007 in Lisbon, at Estádio da Luz. The poll was considered unscientific partly because it was possible for people to cast multiple votes.

Context

If we ever plan a world tour, there will be a bucket list of wonders or places around the world that we wish to visit. Here, we have one set of "Wonders of the World" images scraped from Google Images. Let us use our deep learning skills to build a multiclass classifier that identifies the place shown in each image.

Data Preparation

This dataset contains a total of 3846 images placed in folders, with each folder representing one of the top new wonders of the world. Below is the list of wonders with images extracted from Google Images.

Venezuela Angel Falls

Taj Mahal

Stonehenge

Statue of Liberty

Chichen Itza

Christ the Redeemer

Pyramids of Giza

Eiffel Tower

Great Wall of China

Burj Khalifa

Roman Colosseum

Machu Picchu
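Since each folder represents one class, a folder-per-class layout like the one described above can be indexed with a few lines of Python before training a classifier. This is a minimal sketch: the function name, the accepted file suffixes and the assumption that images sit directly inside each class folder are illustrative and not taken from the dataset itself.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def index_images(root):
    """Map each class folder under root to a label id and list (path, label) pairs."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    samples = [
        (path, label_of[name])
        for name in class_names
        for path in sorted((root / name).iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return label_of, samples
```

Sorting the folder names keeps the label ids stable between runs, which matters when a trained model is later reloaded.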
Component parts of the World Heat Flow Data Collection
search.dataone.org
Updated Jan 5, 2018
Cite
Evans, T R; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G (2018). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.806998
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.806998
Dataset updated
Jan 5, 2018
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Evans, T R; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G
Area covered

Description
No description is available. Visit https://dataone.org/datasets/c9d6507f203308063a16ce22ba032540 for complete metadata about this dataset.
Component parts of the World Heat Flow Data Collection
dataone.org
doi.pangaea.de
+1 more
Updated Nov 21, 2025
Cite
Roger H Morin; Richard P von Herzen (2025). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.805302
Explore at:
Unique identifier
https://doi.org/10.1594/PANGAEA.805302
Dataset updated
Nov 21, 2025
Dataset provided by
PANGAEA Data Publisher for Earth and Environmental Science
Authors
Roger H Morin; Richard P von Herzen
Area covered

Description
This dataset is about: Component parts of the World Heat Flow Data Collection.
