100+ datasets found
  1. Our World in Data - COVID-19

    • kaggle.com
    zip
    Updated Oct 25, 2023
    Cite
    Mario Caesar (2023). Our World in Data - COVID-19 [Dataset]. https://www.kaggle.com/datasets/caesarmario/our-world-in-data-covid19-dataset/code
    Available download formats: zip (14,235,238 bytes)
    Dataset updated
    Oct 25, 2023
    Authors
    Mario Caesar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Our World in Data - COVID-19

    ▶ Context 📝

    The complete COVID-19 dataset is a collection of the COVID-19 data maintained and provided by Our World in Data. The Our World in Data team updates it daily throughout the COVID-19 pandemic.

    ▶ Content 📃

    The dataset includes the following information:

    | Metrics | Source | Updated | Countries |
    | --- | --- | --- | --- |
    | Vaccinations | Official data collated by the Our World in Data team | Daily | 218 |
    | Tests & positivity | Official data collated by the Our World in Data team | Weekly | 139 |
    | Hospital & ICU | Official data collated by the Our World in Data team | Weekly | 39 |
    | Confirmed cases | JHU CSSE COVID-19 Data | Daily | 196 |
    | Confirmed deaths | JHU CSSE COVID-19 Data | Daily | 196 |
    | Reproduction rate | Arroyo-Marioli F, Bullano F, Kucinskas S, Rondón-Moreno C | Daily | 185 |
    | Policy responses | Oxford COVID-19 Government Response Tracker | Daily | 186 |
    | Other variables of interest | International organizations (UN, World Bank, OECD, IHME…) | Fixed | |

    Data dictionary is available below ⤵

    ▶ Acknowledgements 🙏

    I'd like to clarify that I'm only making the vaccination data collected by Our World in Data available to the Kaggle community. A new version of this dataset is gathered, integrated, and posted daily, as maintained by Our World in Data in their GitHub repository.

    ▶ Inspiration 💭

    • Forecasting daily new confirmed cases of COVID-19 in a specific country.
    • Performing data analysis and data visualization of COVID-19 cases, deaths, etc.
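    A common starting point for the forecasting idea above is a flat baseline built from a 7-day trailing mean, which smooths day-of-week reporting effects. This is a generic technique swapped in for illustration, not something the dataset author prescribes. The sketch assumes you have already extracted one country's daily new confirmed cases as a plain list; the function names and the sample numbers are hypothetical.

```python
from collections import deque


def trailing_mean(values, window=7):
    """Trailing `window`-day mean at each position (None until enough days exist)."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


def naive_forecast(values, window=7, horizon=7):
    """Flat baseline: repeat the latest trailing mean for `horizon` future days."""
    last = trailing_mean(values, window)[-1] if values else None
    if last is None:
        raise ValueError("need at least `window` daily observations")
    return [last] * horizon


# Made-up daily counts, for illustration only.
daily_new_cases = [1, 2, 3, 4, 5, 6, 7, 8]
print(naive_forecast(daily_new_cases, horizon=3))  # [5.0, 5.0, 5.0]
```

    Any real forecasting model should at least beat this baseline on held-out days.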

    📷 Images by Fusion Medical Animation.

  2. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
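    The zettabyte figures are masked in this listing, but the stated ratio is enough to work out the implied growth rate. Tripling over the four annual steps from 2025 to 2029 corresponds to a compound annual growth rate of 3^(1/4) - 1. A minimal sketch, assuming "between 2025 and 2029" means four compounding years; the function name is illustrative.

```python
def implied_annual_growth(ratio, years):
    """Compound annual growth rate implied by a `ratio`-fold increase over `years` years."""
    return ratio ** (1 / years) - 1


rate = implied_annual_growth(3, 2029 - 2025)
print(f"{rate:.1%}")  # 31.6%
```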

  3. The World Bank Listening to LAC (L2L) Pilot 2012 - Honduras

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jul 8, 2014
    + more versions
    Cite
    World Bank (2014). The World Bank Listening to LAC (L2L) Pilot 2012 - Honduras [Dataset]. https://microdata.worldbank.org/index.php/catalog/2021
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    Time period covered
    2012
    Area covered
    Honduras
    Description

    Abstract

    The rapid and massive dissemination of mobile phones in the developing world is creating new opportunities for the discipline of survey research. The World Bank is interested in leveraging mobile phone technology as a means of direct communication with poor households in the developing world in order to gather rapid feedback on the impact of economic crises and other events on the economy of such households.

    The World Bank commissioned Gallup to conduct the Listening to LAC (L2L) pilot program, a research project aimed at testing the feasibility of mobile phone technology as a way of data collection for conducting quick turnaround, self-administered, longitudinal surveys among households in Peru and Honduras.

    The project used face-to-face interviews as its benchmark, and included Short Message Service (SMS), Interactive Voice Response (IVR) and Computer Assisted Telephone Interviews (CATI) as test methods of data collection.

    The pilot was designed in a way that allowed testing the response rates and the quality of data, while also providing information on the cost of collecting data using mobile phones. Researchers also evaluated if providing incentives affected panel attrition rates. The Honduras design was a test-retest design, which is closely related to the difference-in-difference methodology of experimental evaluation.

    A stratified, multistage random sampling technique was used to select a nationally representative sample of 1,500 households. During the initial face-to-face interviews, researchers gathered information on the socio-economic characteristics of households and recruited participants for follow-up research. Question wording was the same in all modes of data collection.

    In Honduras, after the initial face-to-face interviews, respondents were exposed to the remaining three methodologies according to a randomized scheme (three rotations, one methodology per week). Panelists in Honduras were surveyed for four and a half months, starting in February 2012.
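    The description does not spell out what the three rotations were. A minimal sketch, assuming they are the three cyclic orderings of SMS, IVR and CATI (a Latin-square design, so each mode is used by one rotation in each week) and that households are split evenly among them at random; the function names, the seed, and the use of 1,500 IDs are hypothetical.

```python
import random
from collections import Counter

MODES = ("SMS", "IVR", "CATI")


def rotations(modes=MODES):
    """Cyclic shifts of the mode order, e.g. SMS-IVR-CATI, IVR-CATI-SMS, CATI-SMS-IVR.

    Across the three rotations, every mode is used exactly once in each week slot.
    """
    return [modes[i:] + modes[:i] for i in range(len(modes))]


def assign_rotations(household_ids, seed=0):
    """Randomly assign households to rotations in groups as equal as possible."""
    ids = list(household_ids)
    random.Random(seed).shuffle(ids)
    rots = rotations()
    return {hh: rots[i % len(rots)] for i, hh in enumerate(ids)}


schedule = assign_rotations(range(1500))
print(Counter(schedule.values()))  # 500 households per rotation
```

    Balancing the week in which each mode is used keeps mode effects from being confounded with time effects, which matters for a test-retest comparison.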

    Geographic coverage

    Includes the entire national territory, except neighborhoods where access for interviewers is extremely difficult due to a lack of transportation infrastructure or to situations that threaten the physical integrity of interviewers and supervisors (e.g. an extremely high crime rate, warfare).

    Analysis unit

    • Households

    Universe

    All the households that exist in the neighborhoods of Honduras, as reported by the 2001 Census. Institutions such as military, religious or educational living quarters are not included in the universe.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Honduras did not have an income oversample: because the poverty rate is 60 percent, oversampling up to 20 percent above the poverty line would include a large portion of the middle class, who are not the most vulnerable in times of crisis.

    The Honduras panel was built on a nationally representative sample of 1,500 households. The sample was drawn by means of a random, stratified, multistage design. The pilot used Gallup World Poll sampling frame.

    Census-defined municipalities were classified into five strata according to population size:

    I. Municipalities with 500,000 to 999,000 inhabitants
    II. Municipalities with 100,000 to 499,000 inhabitants
    III. Municipalities with 50,000 to 99,000 inhabitants
    IV. Municipalities with 10,000 to 49,000 inhabitants
    V. Municipalities with fewer than 10,000 inhabitants

    Interviews were then proportionally allocated to these five strata according to their share among the country's population.
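    Proportional allocation rarely yields whole numbers, so some rounding rule is needed to keep the total at exactly 1,500. A minimal sketch using the largest-remainder method, which is a common choice but not one the source confirms. The stratum populations below are invented for illustration, since the description does not report them.

```python
def allocate(total, populations):
    """Largest-remainder proportional allocation of `total` interviews across strata."""
    pop_sum = sum(populations.values())
    quotas = {k: total * p / pop_sum for k, p in populations.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # Give the interviews lost to rounding down to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical stratum populations (not from the study).
strata_population = {"I": 1_200_000, "II": 1_800_000, "III": 900_000,
                     "IV": 2_400_000, "V": 1_700_000}
print(allocate(1500, strata_population))
# {'I': 225, 'II': 337, 'III': 169, 'IV': 450, 'V': 319}
```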

    • The first stage of the design consisted of a random selection of Primary Sampling Units (PSU's) within each of the five strata previously defined.

    • In the second stage, in each PSU, one or more Secondary Sampling Units (SSU's) were then selected.

    • Once SSU's were selected, interviewers were sent to the field to proceed with the third stage of the sample design, which consisted of selecting households using a systematic "random route" procedure. Interviewers started from the previously selected "random origin" and walked around the block in a clockwise direction, selecting every third household on their right-hand side. They were also trained to handle vacant, nonresponsive, and non-cooperative households, as well as other failed attempts, in a systematic manner.
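    The third-stage selection rule amounts to systematic sampling with a step of 3 along the walking route. A minimal sketch, assuming the dwellings on the interviewer's right-hand side are already listed in the order they are passed; the function name, the `target` parameter, and the dwelling labels are hypothetical, and the sketch does not model the replacement of vacant or non-cooperative households.

```python
def random_route(households, step=3, target=None):
    """Select every `step`-th household (1-based positions step, 2*step, ...).

    `households` lists the dwellings on the interviewer's right-hand side in the
    order they are passed when walking clockwise from the random origin.
    """
    picked = households[step - 1::step]
    return picked if target is None else picked[:target]


block = ["h01", "h02", "h03", "h04", "h05", "h06", "h07", "h08", "h09", "h10"]
print(random_route(block))  # ['h03', 'h06', 'h09']
```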

    Mode of data collection

    Other [oth]

    Research instrument

    The following survey instruments were used in the project:

    1) Initial face-to-face questionnaire

    In Peru, the starting point was the ENAHO (National Household Survey) questionnaire. Step-wise regressions were done to select the set of questions that best predicted consumption. For the purposes of robustness, the regressions were also done with questions that best predicted income, which yielded the same results. A similar procedure was done in Honduras, using the latest household survey deployed by the Honduran Statistics Institute, except that only best predictors of income were chosen, because Honduras did not have a recent consumption aggregate.

    The survey gathered information on households' demographics, household infrastructure, employment, remittances, income, accidents, food security, self-perceptions of poverty, Internet access, and cellphone use.

    2) Monthly questionnaires (SMS, IVR, CATI)

    The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters. A maximum of 10 questions had to be chosen for the monthly questionnaire. In addition, two questions sought to ensure the validity of the responses by testing if the respondent was a member of the household. Most questions were time-variant and each questionnaire was repeated to observe if answers changed over time. All questions related to variables that strongly affect household welfare and are likely to change in times of crisis.
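    The two hard limits stated here (at most 10 questions, each fitting in a 160-character SMS) are easy to check mechanically when drafting a questionnaire. A minimal sketch; the function name and the draft question are hypothetical. It counts characters only: in practice, characters outside the GSM 7-bit default alphabet usually force a different SMS encoding with a lower per-message limit, which this check ignores.

```python
SMS_LIMIT = 160     # characters per SMS, as stated in the description
MAX_QUESTIONS = 10  # maximum questions in a monthly questionnaire


def check_questionnaire(questions):
    """Return human-readable problems; an empty list means the draft fits the constraints."""
    problems = []
    if len(questions) > MAX_QUESTIONS:
        problems.append(f"{len(questions)} questions exceeds the limit of {MAX_QUESTIONS}")
    for i, q in enumerate(questions, start=1):
        if len(q) > SMS_LIMIT:
            problems.append(f"question {i} has {len(q)} characters (limit {SMS_LIMIT})")
    return problems


draft = ["In the past month, did any household member lose their job? 1=Yes 2=No"]
print(check_questionnaire(draft))  # []
```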

    3) Final face-to-face questionnaire

    Gallup conducted face-to-face closing surveys among 700 panelists. The researchers asked about issues the respondents had with mobile phones and coverage during the test. Panelists were also asked what would motivate them to keep participating in a project like this in the future.


    Response rate

    In Honduras, 41% of recruited households failed to answer the first round of follow-up surveys. The attrition rate from the initial face-to-face interview to the end of panel study was 50%.

    As part of the survey administration process, Gallup implemented a number of mechanisms to maximize the response rate and panelist retention. The following strategies were applied to respondents who did not reply the first time:

    • The surveys were left open for responses for up to 2 weeks after the original transmission of the survey (from original call in the case of IVR and CATI).
    • First reminder was sent within 72 hours of first attempt (SMS and IVR).
    • Second reminder was sent within 144 hours of first attempt (SMS and IVR).
    • Call backs were made within 72 and 144 hours of first attempt (CATI); or
    • Up to 2 call backs were made per appointment with respondent (CATI).
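    The SMS/IVR follow-up rules above translate into three deadlines counted from the first attempt. A minimal sketch, assuming all three deadlines are measured from the same original transmission time; the function name and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def contact_schedule(first_attempt):
    """Deadlines for following up a non-respondent, counted from the first attempt."""
    return {
        "first_reminder_by": first_attempt + timedelta(hours=72),
        "second_reminder_by": first_attempt + timedelta(hours=144),
        "survey_closes": first_attempt + timedelta(weeks=2),
    }


# Hypothetical first transmission time.
for step, when in contact_schedule(datetime(2012, 2, 6, 9, 0)).items():
    print(step, when.isoformat(sep=" "))
```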

    Also, in order to minimize non-response, three types of incentives were given. First, households that did not own a mobile phone were provided one for free. Approximately 127 phones were donated in Honduras. Second, all communications between the interviewers and the households were free to the respondents. Finally, households were randomly assigned to one of three incentive levels: one-third of households received US$1 in free airtime for each questionnaire they answered, one-third received US$5 in free airtime, and one-third received no financial incentive (the control group).
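    The three-arm incentive design can be reproduced by shuffling households and dealing them round-robin into the arms, which yields exact thirds when the count divides by 3. This is one way to implement "randomly assigned to one of three incentive levels", not necessarily the procedure Gallup used; the function names, seed, and 1,500 IDs are hypothetical.

```python
import random
from collections import Counter

INCENTIVE_LEVELS_USD = (0, 1, 5)  # 0 = control group (no financial incentive)


def assign_incentives(household_ids, seed=0):
    """Shuffle households, then deal them round-robin into the three incentive arms."""
    ids = list(household_ids)
    random.Random(seed).shuffle(ids)
    return {hh: INCENTIVE_LEVELS_USD[i % 3] for i, hh in enumerate(ids)}


def airtime_earned(level_usd, questionnaires_answered):
    """Free airtime (US$) credited: the arm's amount per questionnaire answered."""
    return level_usd * questionnaires_answered


arms = assign_incentives(range(1500))
print(Counter(arms.values()))  # 500 households in each arm
```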

  4. World Bank Indicators (1960‑Present)

    • kaggle.com
    zip
    Updated May 29, 2025
    Cite
    George DiNicola (2025). World Bank Indicators (1960‑Present) [Dataset]. https://www.kaggle.com/datasets/georgejdinicola/world-bank-indicators
    Available download formats: zip (52,559,856 bytes)
    Dataset updated
    May 29, 2025
    Authors
    George DiNicola
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset provides a comprehensive collection of time series data sourced from the World Bank Open Data Platform, covering a wide range of global indicators from 1960 to the most recently published year. It includes economic, social, environmental, and demographic metrics, making it an ideal resource for researchers, data scientists, and policymakers interested in global development trends, economic forecasting, or socio-economic analysis.

    A tutorial on how to combine the topic datasets into one large dataset can be found here

    Why this Dataset?

    My motivation for this project was to curate a high-quality collection of datasets for World Bank indicators, organized by topic and structured as time series, making them more accessible for data science projects. Since the World Bank's Kaggle datasets (https://www.kaggle.com/organizations/theworldbank) have not been updated since 2019, I saw an opportunity to provide more current data for the data analysis community.

    Dataset Collection Contents

    This collection brings together more than 800 World Bank indicators organized into 18 topic‑specific CSV files. Each file is structured as a country‑year panel: every row represents a unique combination of year (1960‑present) and ISO‑3 country code, while the columns hold the topic’s indicators.

    The collection includes datasets with a variety of indicators, such as:

    - Economic Metrics: GDP growth (%), GDP per capita, consumer price inflation, merchandise trade, gross capital formation, and more.
    - Social Metrics: School enrollment (primary, secondary, tertiary), infant mortality rate, maternal mortality rate, poverty headcount, and more.
    - Environmental Metrics: Forest area, renewable energy consumption, food production indices, and more.
    - Demographic Metrics: Urban population, life expectancy, net migration, and more.

    Usage

    This dataset is ideal for a variety of applications, including:

    - Economic forecasting and trend analysis (e.g., GDP growth, inflation).
    - Socio-economic studies (e.g., education, health, poverty).
    - Environmental impact analysis (e.g., renewable energy adoption).
    - Demographic research (e.g., population trends, migration).

    Topic datasets can be merged with each other using year and country code. This tutorial with notebook code can help you get started quickly.
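    Because each topic file is a country-year panel, merging two topics is a join on the (country code, year) pair. A minimal standard-library sketch; the column names `country_code` and `year`, the indicator columns, and the rows are hypothetical placeholders, so check the actual CSV headers first. pandas users would typically do the same with `DataFrame.merge` on the two key columns.

```python
import csv
import io

KEY = ("country_code", "year")  # hypothetical column names for ISO-3 code and year


def read_topic(text):
    """Index one topic CSV by (country code, year)."""
    return {tuple(row[k] for k in KEY): row for row in csv.DictReader(io.StringIO(text))}


def merge_topics(left, right):
    """Inner join: keep country-years present in both topics, combining their columns."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


# Dummy placeholder data; in practice, pass the contents of the topic CSV files.
ECONOMY = "country_code,year,indicator_x\nAAA,2020,1.0\nAAA,2021,2.0\n"
HEALTH = "country_code,year,indicator_y\nAAA,2020,3.0\nBBB,2020,4.0\n"

merged = merge_topics(read_topic(ECONOMY), read_topic(HEALTH))
print(merged)
```

    `csv` returns every value as a string, so convert indicator columns to numbers before analysis. An outer join would instead keep country-years that appear in only one topic, with missing values for the other topic's columns.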

    Collection Methodology

    The data is collected via a custom software application that discovers and groups high-quality indicators with rules-based logic & artificial intelligence, generates metadata, and performs ETL for the data from the World Bank API. The result is a clean, up‑to‑date collection of World Bank indicators in time-series format that is ready for analysis—no manual downloads or data wrangling required.

    Modifications

    The original World Bank data has been aggregated and transformed for ease of use. Missing values have been preserved as provided by the World Bank, and no significant transformations have been applied beyond formatting and aggregation into a single file.

    Source & Attribution

    The World Bank: World Development Indicators

    This dataset is publicly available and sourced from the World Bank Open Data Platform and is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. When using this data, please attribute the World Bank as follows: "Data sourced from the World Bank, licensed under CC BY 4.0." For more details on the World Bank’s terms of use, visit: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets.

    License

    This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Feel free to use this data in Kaggle notebooks, academic research, or policy analysis. If you create a derived dataset or analysis, I encourage you to share it with the Kaggle community.

  5. Data from: World Data Bank II: North America, South America, Europe, Africa,...

    • icpsr.umich.edu
    ascii
    Updated Jan 18, 2006
    + more versions
    Cite
    United States. Central Intelligence Agency (2006). World Data Bank II: North America, South America, Europe, Africa, Asia [Dataset]. http://doi.org/10.3886/ICPSR08376.v1
    Available download formats: ascii
    Dataset updated
    Jan 18, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States. Central Intelligence Agency
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/8376/terms

    Area covered
    Africa, Asia, Americas, South America, North America, Europe, United States, Canada
    Description

    The boundaries of five different geographic areas -- North America, South America, Europe, Africa, and Asia -- are digitally represented in this collection of data files that can be used in the production of computer maps. Each of the five areas is encoded in three distinct files: (1) coastline, islands, and lakes, (2) rivers, and (3) international boundaries. There is an additional file for North America (Part 4: North America: Internal Boundaries) delineating state lines in the United States and provincial boundaries in Canada. The data in each of the files is hierarchically structured into subordinate geographic features and ranks, which may be used for output plotting symbol definition. The mapping scale used to encode the data ranged from 1:1 million to 1:4 million.

  6. The World Bank Listening to LAC (L2L) Pilot 2011 - Peru

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Jul 8, 2014
    + more versions
    Cite
    World Bank (2014). The World Bank Listening to LAC (L2L) Pilot 2011 - Peru [Dataset]. https://microdata.worldbank.org/index.php/catalog/2022
    Dataset updated
    Jul 8, 2014
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    Time period covered
    2011 - 2012
    Area covered
    Peru
    Description

    Abstract

    The rapid and massive dissemination of mobile phones in the developing world is creating new opportunities for the discipline of survey research. The World Bank is interested in leveraging mobile phone technology as a means of direct communication with poor households in the developing world in order to gather rapid feedback on the impact of economic crises and other events on the economy of such households.

    The World Bank commissioned Gallup to conduct the Listening to LAC (L2L) pilot program, a research project aimed at testing the feasibility of mobile phone technology as a way of data collection for conducting quick turnaround, self-administered, longitudinal surveys among households in Peru and Honduras.

    The project used face-to-face interviews as its benchmark, and included Short Message Service (SMS), Interactive Voice Response (IVR) and Computer Assisted Telephone Interviews (CATI) as test methods of data collection.

    The pilot was designed in a way that allowed testing the response rates and the quality of data, while also providing information on the cost of collecting data using mobile phones. Researchers also evaluated if providing incentives affected panel attrition rates.

    A stratified, multistage random sampling technique was used to select a nationally representative sample of 1,500 households. During the initial face-to-face interviews, researchers gathered information on the socio-economic characteristics of households and recruited participants for follow-up research. Question wording was the same in all modes of data collection.

    In Peru, households were randomly assigned to a communication mode (SMS, IVR, CATI), which stayed constant for all rounds (waves) of the survey.

    Geographic coverage

    Includes the entire national territory, except neighborhoods where access for interviewers is extremely difficult due to a lack of transportation infrastructure or to situations that threaten the physical integrity of interviewers and supervisors (e.g. an extremely high crime rate, warfare).

    Analysis unit

    • Households

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Peru panel was built on a nationally representative sample of 1,500 households. The sample was based on the sampling frame for the National Household Survey (ENAHO) conducted by the Peruvian National Statistics Office (INEI) every three months.

    In Peru, the sample selection was guided by the following criteria: (i) the sample should be representative nationally and in urban and rural areas, and (ii) households close to the poverty line should be oversampled, because policy decisions in times of crisis need to be especially mindful of the poor and vulnerable. For the purposes of this project, "close to the poverty line" was defined as the 40 percent of the consumption distribution that symmetrically brackets the national poverty line: 20 percentile points above it and 20 below it. In 2010, monthly per capita consumption was below the moderate poverty line in 27 percent of Peruvian households (ENAHO). Households whose monthly per capita consumption fell between the 7th and 47th percentiles of the national distribution were therefore oversampled.
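    The 7th-to-47th percentile band follows directly from the 27 percent poverty share and the 20-point half-width on each side. A minimal sketch of that arithmetic and of the resulting eligibility test; the function names are hypothetical, and treating the band edges as inclusive is an assumption.

```python
def oversample_band(poverty_share_pct, half_width_pct=20):
    """Percentile band of the consumption distribution centred on the poverty line."""
    return max(0, poverty_share_pct - half_width_pct), min(100, poverty_share_pct + half_width_pct)


def in_band(consumption_percentile, band):
    """True if a household's consumption percentile falls inside the band (inclusive)."""
    low, high = band
    return low <= consumption_percentile <= high


band = oversample_band(27)  # 27% of households below the moderate poverty line in 2010
print(band)  # (7, 47)
```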

    The L2L sample frame comprises all the panel conglomerados from the fourth trimester of ENAHO 2010, or 281 conglomerados.

    Detailed information about the sampling procedure is available in "Listening to LAC: Using Mobile Phones for High Frequency Data Collection, Final Report" (p. 65-69) and "The World Bank Listening to LAC (L2L) Pilot Project Sample Design for Peru."

    Sampling deviation

    A number of restive communities in Peru did not allow Gallup's interviewers to enter the area. Where possible, these were replaced following INEI's standard methodology. When confronted with a problem in a particular location, INEI moves to the next "Centro Poblado" in the same "Conglomerado."

    Mode of data collection

    Other [oth]

    Research instrument

    The following survey instruments were used in the project:

    1) Initial face-to-face questionnaire

    In Peru, the starting point was the ENAHO (National Household Survey) questionnaire. Step-wise regressions were done to select the set of questions that best predicted consumption. For the purposes of robustness, the regressions were also done with questions that best predicted income, which yielded the same results.

    The survey gathered information on households' demographics, household infrastructure, employment, remittances, income, accidents, food security, self-perceptions of poverty, Internet access, and cellphone use.

    2) Monthly questionnaires (SMS, IVR, CATI)

    The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters. A maximum of 10 questions had to be chosen for the monthly questionnaire. In addition, two questions sought to ensure the validity of the responses by testing if the respondent was a member of the household. Most questions were time-variant and each questionnaire was repeated to observe if answers changed over time. All questions related to variables that strongly affect household welfare and are likely to change in times of crisis.

    To verify membership, the first two questions in each monthly questionnaire asked the respondent for their gender and year of birth, and the answers were compared to the household roster obtained during the face-to-face interview.
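    The membership check compares two answers against the roster collected face-to-face. A minimal sketch, assuming an exact match on both gender and year of birth is required (the description does not say whether any tolerance was allowed); the class, function, and roster entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RosterEntry:
    gender: str
    birth_year: int


def is_household_member(gender, birth_year, roster):
    """True if some roster entry matches both answers exactly."""
    return RosterEntry(gender, birth_year) in roster


# Hypothetical roster recorded at the face-to-face interview.
roster = {RosterEntry("F", 1975), RosterEntry("M", 1972), RosterEntry("F", 2001)}
print(is_household_member("F", 1975, roster))  # True
print(is_household_member("M", 1975, roster))  # False
```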

    3) Final face-to-face questionnaire

    Gallup conducted face-to-face closing surveys among 700 panelists. The researchers asked about issues the respondents had with mobile phones and coverage during the test. Panelists were also asked what would motivate them to keep participating in a project like this in the future.

    Response rate

    In Peru, 67 percent of recruited households failed to answer the first round of follow-up surveys. Attrition slightly increased with each wave of the survey (between 1 and 3 percentage points per wave), reaching 75 percent in wave 6.

    As part of the survey administration process, Gallup implemented a number of mechanisms to maximize the response rate and panelist retention. The following strategies were applied to respondents who did not reply the first time:

    • The surveys were left open for responses for up to 2 weeks after the original transmission of the survey (from original call in the case of IVR and CATI).
    • First reminder was sent within 72 hours of first attempt (SMS and IVR).
    • Second reminder was sent within 144 hours of first attempt (SMS and IVR).
    • Call backs were made within 72 and 144 hours of first attempt (CATI); or
    • Up to 2 call backs were made per appointment with respondent (CATI).

    Also, in order to minimize non-response, three types of incentives were given. First, households that did not own a mobile phone were provided one for free. Approximately 200 phones were donated in Peru. Second, all communications between the interviewers and the households were free to the respondents. Finally, households were randomly assigned to one of three incentive levels: one-third of households received US$1 in free airtime for each questionnaire they answered, one-third received US$5 in free airtime, and one-third received no financial incentive (the control group).

  7. Multi Country Study Survey 2000-2001 - Sweden

    • apps.who.int
    • catalog.ihsn.org
    • +1 more
    Updated Jan 23, 2014
    + more versions
    Cite
    World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Sweden [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/159
    Dataset updated
    Jan 23, 2014
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2000 - 2001
    Area covered
    Sweden
    Description

    Abstract

    In order to develop methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study used a common survey instrument with a modular structure in nationally representative populations, assessing the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional areas such as adult mortality and health state valuations.

    The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

    Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

    The study was carried out in 61 countries, completing 71 surveys, because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: 90-minute in-person household interviews in 14 countries, brief face-to-face interviews in 27 countries, computerized telephone interviews in 2 countries, and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with known selection probabilities, so that estimates could be made for general population parameters.
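    The survey counts reconcile in two ways: each of the 10 dual-mode countries contributes one extra survey (61 + 10 = 71), and the per-mode counts, which count country-mode pairs, sum to the same total. A short check of that arithmetic, using only figures from the text; the variable names are illustrative.

```python
countries = 61
countries_with_two_modes = 10
surveys_by_mode = {  # number of country surveys per mode, as listed above
    "in-person household interview (90 min)": 14,
    "brief face-to-face interview": 27,
    "computerized telephone interview": 2,
    "postal survey": 28,
}

total_surveys = countries + countries_with_two_modes
print(total_surveys, sum(surveys_by_mode.values()))  # 71 71
```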

    The survey study tested novel techniques to control reporting bias between people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, individuals' self-reports of their own health were calibrated against well-known performance tests (e.g. self-reported vision was measured against the standard Snellen visual acuity test) or against short vignette descriptions that marked known anchor points of difficulty (e.g. people with different levels of mobility, such as a paraplegic person or an athlete who runs 4 km each day), so as to adjust the responses for comparability. The same method was also used for individuals' self-reports assessing the responsiveness of their health systems, where vignettes describing different levels of responsiveness in each domain were used to calibrate the individual responses.

    These data are useful in their own right for standardizing indicators for different domains of health (such as cognition, mobility, self-care, affect, usual activities, pain, and social participation), and they also provide a better measurement basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful for improving the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The metropolitan, urban and rural population and all "administrative regional units" as defined in Official European Union Statistics (NUTS 2) were covered in proportion to the respective population aged 18 and above. The country was divided into an appropriate number of areas, grouping NUTS regions at whatever level was appropriate. The NUTS areas covered in Sweden were the following: Stockholm/Södertälje A-Region, Gothenburgs A-Region, Malmö/Lund/Trelleborgs A-region, semi-urban area, and rural area.

    The basic sample design was a multi-stage, random probability sample. 100 sampling points were drawn with probability proportional to population size, for a total coverage of the country. The sampling points were drawn after stratification by NUTS 2 region and by degree of urbanisation. They represented the whole territory of the country surveyed and were selected proportionally to the distribution of the population across metropolitan, urban and rural areas. In each of the selected sampling points, one address was drawn at random. This starting address formed the first address of a cluster of at most 20 addresses. The remainder of the cluster was selected as every Nth address by a standard random route procedure from the initial address. In theory, there is no maximum number of addresses issued per country. Procedures for random household selection and random respondent selection are independent of the interviewer's decision and controlled by the institute responsible. They should be as identical as possible from country to country, full functional equivalence being a must.
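    Drawing sampling points "with probability proportional to population size" is often done with systematic PPS sampling: lay the units end to end by cumulative population, pick a random start, and step through at a fixed interval. The source does not say which PPS algorithm was used, so this is a standard technique swapped in as an assumption, applied within a single stratum; the function name and the areas below are hypothetical.

```python
import random


def pps_systematic(units, n, seed=0):
    """Systematic PPS sampling: draw `n` points with probability proportional to size.

    `units` is a list of (name, population) pairs. A unit larger than the sampling
    interval can receive more than one point.
    """
    total = sum(size for _, size in units)
    interval = total / n
    start = random.Random(seed).random() * interval  # random start in [0, interval)
    targets = [start + k * interval for k in range(n)]
    picks, cumulative, i = [], 0, 0
    for name, size in units:
        cumulative += size
        # Every target falling in this unit's cumulative range selects the unit.
        while i < n and targets[i] < cumulative:
            picks.append(name)
            i += 1
    return picks


# Hypothetical areas within one stratum (names and sizes invented).
areas = [("area_A", 500), ("area_B", 300), ("area_C", 200)]
print(pps_systematic(areas, 10))
```

    With sizes that are exact multiples of the interval, as here, each area receives a number of points equal to its size divided by the interval, whatever the random start.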

    At every address up to 4 recalls were made to attempt to achieve an interview with the selected respondent. There was only one interview per household. The final sample size is 1,000 completed interviews.

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    Data Coding: At each site, the data was coded by investigators to indicate the respondent status and the selection of modules for each respondent within the survey design. After the interview had been edited by the supervisor and judged adequate, it was entered locally.

    Data Entry Program: A data entry program was developed by WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

    The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.

    In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.
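    The double-entry verification described in the preceding paragraph can be sketched as follows. It is an illustrative assumption rather than the I-Shell logic: records are plain dictionaries, `choices` stands in for the operator's decision between the 1st and 2nd value, and the `double_entry_complete` flag is a hypothetical stand-in for the completion mark placed in the database.

```python
def compare_entries(first, second):
    """Fields whose values differ between the 1st and 2nd entry of one case."""
    fields = sorted(set(first) | set(second))
    return [f for f in fields if first.get(f) != second.get(f)]


def reconcile(first, second, choices):
    """Build the verified record for one case.

    choices maps each conflicting field to 1 (keep the 1st entry) or 2 (keep
    the 2nd). Processing cannot continue while any conflict is unresolved.
    """
    conflicts = compare_entries(first, second)
    unresolved = [f for f in conflicts if choices.get(f) not in (1, 2)]
    if unresolved:
        raise ValueError(f"unresolved conflicts: {unresolved}")
    record = dict(first)
    for f in conflicts:
        record[f] = first.get(f) if choices[f] == 1 else second.get(f)
    record["double_entry_complete"] = True   # hypothetical completion mark
    return record


first = {"id": 7, "age": 34, "sex": 2}
second = {"id": 7, "age": 43, "sex": 2}      # age keyed in differently
verified = reconcile(first, second, {"age": 1})
```

    Only `age` differs here, so the operator is asked about that field alone; calling `reconcile` without a choice for it raises an error instead of silently keeping either value.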

    Data Transfer: The data entry program could export the entered data into one compressed database file, regardless of how many cases it contained, which could easily be sent to WHO as an email attachment or uploaded with a file transfer program to a secure server. The sites were allowed to use as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file, and the files were merged after delivery to WHO using other programs built to automate the process. The sites sent the data periodically as they collected it, enabling checking procedures and preliminary analyses in the early stages of data collection.

    Data quality checks: Once the data was received, it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

    Data Cleaning and Feedback: After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, and missing age, sex, education and some other important variables.

  8. Component parts of the World Heat Flow Data Collection

    • doi.pangaea.de
    • dataone.org
    html, tsv
    Updated Apr 11, 2013
    Global Heat Flow Compilation Group (2013). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.810104
    Available download formats: html, tsv
    Dataset updated
    Apr 11, 2013
    Dataset provided by
    PANGAEA
    Authors
    Global Heat Flow Compilation Group
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    Variables measured
    Number, Comment, LATITUDE, ELEVATION, Heat flow, LONGITUDE, Area/locality, Depth, top/min, Method comment, Reference/source, and 8 more
    Description

    This data set is a compilation of heat flow data of uncertain origin. References as cited in the Global Heat Flow Database were incomplete and thus could not be verified. This compilation contains data of unknown origin, unpublished data, data without full reference information, and data extracted from other databases. The remaining short citation and its associated problem are listed in columns 18 and 19.

  9. Multi Country Study Survey 2000-2001 - Iceland

    • apps.who.int
    • datacatalog.ihsn.org
    Updated Jan 17, 2014
    World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Iceland [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/174
    Dataset updated
    Jan 17, 2014
    Dataset provided by
    World Health Organization https://who.int/
    Authors
    World Health Organization (WHO)
    Time period covered
    2000 - 2001
    Area covered
    Iceland
    Description

    Abstract

    In order to develop various methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study used a common survey instrument in nationally representative populations, with a modular structure for assessing the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional modules in other areas such as adult mortality and health state valuations.

    The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

    Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

    The study was carried out in 61 countries, completing 71 surveys, because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: 90-minute in-person household interviews in 14 countries; brief face-to-face interviews in 27 countries; computerized telephone interviews in 2 countries; and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with a known probability so as to make estimates based on general population parameters.

    The survey study tested novel techniques to control reporting bias between different groups of people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, individuals' self-reports of their own health were calibrated against well-known performance tests (e.g. self-reported vision was measured against the standard Snellen visual acuity test) or against short descriptions in vignettes that marked known anchor points of difficulty (e.g. people with different levels of mobility, such as a paraplegic person or an athlete who runs 4 km each day), so as to adjust the responses for comparability. The same method was also used for individuals' self-reports assessing the responsiveness of their health systems, where vignettes on different responsiveness domains describing different levels of responsiveness were used to calibrate the individual responses.

    These data are useful in their own right for standardizing indicators for different domains of health (such as cognition, mobility, self-care, affect, usual activities, pain, social participation, etc.), and they also provide a better basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful for improving the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The metropolitan, urban and rural population and all administrative regional units, as defined in Official European Union Statistics (NUTS 2), were covered in proportion to the respective population aged 18 and above. The country was divided into an appropriate number of areas, grouping NUTS regions at whatever level was appropriate. The NUTS regions covered in Iceland were the following: Reykjavik, Near Reykjavik and Sudurnes, West-Iceland, North-Iceland, East-Iceland, South-Iceland.

    The basic sample design was a multi-stage, random probability sample. Fifty sampling points were drawn with probability proportional to population size, for a total coverage of the country. The sampling points were drawn after stratification by NUTS 2 region and by degree of urbanisation. They represented the whole territory of the country surveyed and were selected in proportion to the distribution of the population across metropolitan, urban and rural areas. In each of the selected sampling points, one address was drawn at random. This starting address formed the first address of a cluster of at most 20 addresses. The remainder of the cluster was selected as every Nth address by a standard random route procedure from the initial address. In theory, there is no maximum number of addresses issued per country. Procedures for random household selection and random respondent selection are independent of the interviewer's decision and controlled by the institute responsible. They should be as identical as possible from country to country, full functional equivalence being a must.

    At every address up to 4 recalls were made to attempt to achieve an interview with the selected respondent. There was only one interview per household. The final sample size is 489 completed interviews.

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    Data Coding: At each site, the data was coded by investigators to indicate the respondent status and the selection of modules for each respondent within the survey design. After the interview had been edited by the supervisor and judged adequate, it was entered locally.

    Data Entry Program: A data entry program was developed by WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

    The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.
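    A minimal sketch of the field-level range and code-list checks described in the preceding paragraph, assuming each variable has either a numeric range or a set of allowed codes. The rule table and field names are hypothetical; only the 120-year age ceiling and the 18+ target population come from the text.

```python
# Hypothetical rules: each field maps to a (min, max) range or a set of allowed codes.
RULES = {
    "age": (18, 120),            # target population is 18+; ages above 120 are rejected
    "sex": {1, 2},
    "education_years": (0, 30),
}


def validate(record, rules=RULES):
    """Return a list of human-readable problems found in one record."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif isinstance(rule, tuple):
            low, high = rule
            # Non-numeric values fail the range check instead of raising TypeError.
            if not isinstance(value, (int, float)) or not low <= value <= high:
                errors.append(f"{field}: {value!r} outside {low}-{high}")
        elif value not in rule:
            errors.append(f"{field}: {value!r} not an allowed code")
    return errors
```

    A data entry screen would refuse to move on while `validate` returns a non-empty list; keeping the rules in a table rather than in code makes it easy to cover "almost all of the variables" with one loop.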

    In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.

    Data Transfer: The data entry program could export the entered data into one compressed database file, regardless of how many cases it contained, which could easily be sent to WHO as an email attachment or uploaded with a file transfer program to a secure server. The sites were allowed to use as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file, and the files were merged after delivery to WHO using other programs built to automate the process. The sites sent the data periodically as they collected it, enabling checking procedures and preliminary analyses in the early stages of data collection.

    Data quality checks: Once the data was received, it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

    Data Cleaning and Feedback: After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, and missing age, sex, education and some other important variables.
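    The feedback checks listed in the preceding paragraph (duplicate IDs, duplicate cases under different IDs, wrong country codes, missing key variables) could be computed along these lines. This is a hypothetical sketch: the field names `id`, `country`, `age`, `sex` and `education` and the report structure are assumptions, not WHO's actual program.

```python
from collections import Counter


def feedback_report(cases, expected_country, required=("age", "sex", "education")):
    """Summarise problems to send back to a site.

    cases: list of dicts with an "id", a "country" code and response fields
    (all field names are hypothetical).
    """
    id_counts = Counter(c["id"] for c in cases)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

    # Duplicate cases: identical data (everything except the ID) under different IDs.
    first_id_for = {}
    duplicate_cases = []
    for c in cases:
        data = tuple(sorted((k, v) for k, v in c.items() if k != "id"))
        first_id = first_id_for.setdefault(data, c["id"])
        if first_id != c["id"]:
            duplicate_cases.append((first_id, c["id"]))

    wrong_country = [c["id"] for c in cases if c.get("country") != expected_country]
    missing = [(c["id"], f) for c in cases for f in required if c.get(f) is None]
    return {"duplicate_ids": duplicate_ids, "duplicate_cases": duplicate_cases,
            "wrong_country": wrong_country, "missing": missing}


cases = [
    {"id": 1, "country": "ISL", "age": 30, "sex": 1, "education": 3},
    {"id": 2, "country": "ISL", "age": 30, "sex": 1, "education": 3},    # same data as ID 1
    {"id": 2, "country": "ISL", "age": 55, "sex": 2, "education": None},
    {"id": 4, "country": "SWE", "age": 41, "sex": 2, "education": 2},
]
report = feedback_report(cases, expected_country="ISL")
```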

  10. Data from: EC-Earth-Consortium EC-Earth3 model output prepared for CMIP6...

    • wdc-climate.de
    Updated 2019
    EC-Earth Consortium (EC-Earth) (2019). EC-Earth-Consortium EC-Earth3 model output prepared for CMIP6 CMIP [Dataset]. http://doi.org/10.22033/ESGF/CMIP6.181
    Dataset updated
    2019
    Dataset provided by
    Earth System Grid
    World Data Center for Climate (WDCC) at DKRZ
    Authors
    EC-Earth Consortium (EC-Earth)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coupled Model Intercomparison Project Phase 6 (CMIP6) datasets. These data include all datasets published for 'CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3' with the full Data Reference Syntax following the template 'mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version'.
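    Because every dataset identifier follows the dot-separated Data Reference Syntax template quoted above, it can be split into named fields. A minimal Python sketch, assuming no individual field contains a dot; the example identifier (its experiment, member, table, variable, grid and version values) is illustrative and not taken from this record.

```python
# Field order taken from the DRS template quoted in the description.
DRS_FIELDS = ("mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
              "member_id", "table_id", "variable_id", "grid_label", "version")


def parse_drs(dataset_id):
    """Split a DRS dataset identifier into a dict keyed by template field."""
    parts = dataset_id.split(".")
    if len(parts) != len(DRS_FIELDS):
        raise ValueError(f"expected {len(DRS_FIELDS)} dot-separated fields, got {len(parts)}")
    return dict(zip(DRS_FIELDS, parts))


def format_drs(fields):
    """Inverse of parse_drs: join the fields back in template order."""
    return ".".join(fields[name] for name in DRS_FIELDS)


# Hypothetical identifier following the template.
example_id = "CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3.historical.r1i1p1f1.Amon.tas.gr.v20200310"
fields = parse_drs(example_id)
```

    Parsing into a dictionary makes it straightforward to filter a list of identifiers, for example by `variable_id` or `experiment_id`, before downloading anything.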

    The EC-Earth 3.3 climate model, released in 2019, includes the following components: atmos: IFS cy36r4 (TL255, linearly reduced Gaussian grid equivalent to 512 x 256 longitude/latitude; 91 levels; top level 0.01 hPa), land: HTESSEL (land surface scheme built in IFS), ocean: NEMO3.6 (ORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; 362 x 292 longitude/latitude; 75 levels; top grid cell 0-1 m), seaIce: LIM3. The model was run by the AEMET, Spain; BSC, Spain; CNR-ISAC, Italy; DMI, Denmark; ENEA, Italy; FMI, Finland; Geomar, Germany; ICHEC, Ireland; ICTP, Italy; IDL, Portugal; IMAU, The Netherlands; IPMA, Portugal; KIT, Karlsruhe, Germany; KNMI, The Netherlands; Lund University, Sweden; Met Eireann, Ireland; NLeSC, The Netherlands; NTNU, Norway; Oxford University, UK; surfSARA, The Netherlands; SMHI, Sweden; Stockholm University, Sweden; Unite ASTR, Belgium; University College Dublin, Ireland; University of Bergen, Norway; University of Copenhagen, Denmark; University of Helsinki, Finland; University of Santiago de Compostela, Spain; Uppsala University, Sweden; Utrecht University, The Netherlands; Vrije Universiteit Amsterdam, The Netherlands; Wageningen University, The Netherlands (EC-Earth-Consortium), in native nominal resolutions: atmos: 100 km, land: 100 km, ocean: 100 km, seaIce: 100 km. Mailing address: EC-Earth consortium, Rossby Center, Swedish Meteorological and Hydrological Institute/SMHI, SE-601 76 Norrköping, Sweden.

    Project: These data have been generated as part of the internationally-coordinated Coupled Model Intercomparison Project Phase 6 (CMIP6; see also GMD Special Issue: http://www.geosci-model-dev.net/special_issue590.html). The simulation data provides a basis for climate research designed to answer fundamental science questions and serves as resource for authors of the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC-AR6).

    CMIP6 is a project coordinated by the Working Group on Coupled Modelling (WGCM) as part of the World Climate Research Programme (WCRP). Phase 6 builds on previous phases executed under the leadership of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and relies on the Earth System Grid Federation (ESGF) and the Centre for Environmental Data Analysis (CEDA) along with numerous related activities for implementation. The original data is hosted and partially replicated on a federated collection of data nodes, and most of the data relied on by the IPCC is being archived for long-term preservation at the IPCC Data Distribution Centre (IPCC DDC) hosted by the German Climate Computing Center (DKRZ).

    The project includes simulations from about 120 global climate models and around 45 institutions and organizations worldwide. - Project website: https://pcmdi.llnl.gov/CMIP6.

  11. Employment Of India CLeaned and Messy Data

    • kaggle.com
    zip
    Updated Apr 7, 2025
    MANSI SHINDE (2025). Employment Of India CLeaned and Messy Data [Dataset]. https://www.kaggle.com/datasets/soniaaaaaaaa/employment-of-india-cleaned-and-messy-data/code
    Available download formats: zip (29791 bytes)
    Dataset updated
    Apr 7, 2025
    Authors
    MANSI SHINDE
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    India
    Description

    This dataset presents a dual-version representation of employment-related data from India, crafted to highlight the importance of data cleaning and transformation in any real-world data science or analytics project.

    🔹 Dataset Composition:

    It includes two parallel datasets:
    1. Messy Dataset (Raw) – Represents a typical unprocessed dataset often encountered in data collection from surveys, databases, or manual entries.
    2. Cleaned Dataset – This version demonstrates how proper data preprocessing can significantly enhance the quality and usability of data for analytical and visualization purposes.

    Each record captures multiple attributes related to individuals in the Indian job market, including:
    - Age Group
    - Employment Status (Employed/Unemployed)
    - Monthly Salary (INR)
    - Education Level
    - Industry Sector
    - Years of Experience
    - Location
    - Perceived AI Risk
    - Date of Data Recording

    Transformations & Cleaning Applied:

    The raw dataset underwent comprehensive transformations to convert it into its clean, analysis-ready form:
    - Missing Values: Identified and handled using either row elimination (where critical data was missing) or imputation techniques.
    - Duplicate Records: Identified using row comparison and removed to prevent analytical skew.
    - Inconsistent Formatting: Unified inconsistent naming in columns (like 'monthly_salary_(inr)' → 'Monthly Salary (INR)'), capitalization, and string spacing.
    - Incorrect Data Types: Converted columns like salary from string/object to float for numerical analysis.
    - Outliers: Detected and handled based on domain logic and distribution analysis.
    - Categorization: Converted numeric ages into grouped age categories for comparative analysis.
    - Standardization: Uniform labels for employment status, industry names, education, and AI risk levels were applied for visualization clarity.
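    A compact, standard-library-only sketch of several of these steps: column-name unification, whitespace stripping, salary conversion to float, dropping rows with missing critical fields, age binning, label standardization and duplicate removal. The raw column variants, the age bins and the `Unknown` fallback label are assumptions for illustration, not the dataset author's actual pipeline; imputation and outlier handling are omitted.

```python
import re

# Hypothetical mappings: raw column names -> cleaned headers, raw status -> label.
RENAME = {"monthly_salary_(inr)": "Monthly Salary (INR)",
          "age": "Age",
          "employment_status": "Employment Status"}
STATUS = {"employed": "Employed", "unemployed": "Unemployed"}


def to_float(value):
    """Parse strings such as ' 45,000 ' into 45000.0; return None if unparseable."""
    digits = re.sub(r"[^0-9.]", "", str(value))
    try:
        return float(digits)
    except ValueError:
        return None


def age_group(age):
    """Bin a numeric age into a (hypothetical) age category."""
    if age < 25:
        return "Under 25"
    if age < 35:
        return "25-34"
    if age < 45:
        return "35-44"
    return "45+"


def clean(rows):
    seen, cleaned = set(), []
    for raw in rows:
        # Unify column names and strip stray whitespace from string values.
        row = {RENAME.get(k.strip().lower(), k.strip()): (v.strip() if isinstance(v, str) else v)
               for k, v in raw.items()}
        age = to_float(row.pop("Age", ""))
        salary = to_float(row.get("Monthly Salary (INR)", ""))
        if age is None or salary is None:
            continue                                  # drop rows missing critical fields
        row["Monthly Salary (INR)"] = salary          # string/object -> float
        row["Age Group"] = age_group(age)             # numeric age -> category
        status = str(row.get("Employment Status", "")).lower()
        row["Employment Status"] = STATUS.get(status, "Unknown")
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                                  # duplicate once normalized
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = [
    {"Monthly_Salary_(INR)": " 45,000 ", "age": "29", "employment_status": "employed "},
    {"monthly_salary_(inr)": "45000", "Age": 29, "Employment_Status": "EMPLOYED"},
    {"monthly_salary_(inr)": "n/a", "age": "40", "employment_status": "unemployed"},
]
result = clean(rows)
```

    Duplicate detection runs after normalization on purpose: the first two raw rows look different but describe the same record once names, spacing, types and labels are unified.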

    Purpose & Utility:

    This dataset is ideal for learners and professionals who want to understand:
    - The impact of messy data on visualization and insights
    - How transformation steps can dramatically improve data interpretation
    - Practical examples of preprocessing techniques before feeding data into ML models or BI tools

    It's also useful for:
    - Training ML models with clean inputs
    - Data storytelling with visual clarity
    - Demonstrating reproducibility in data cleaning pipelines

    By examining both the messy and clean datasets, users gain a deeper appreciation for why “garbage in, garbage out” rings true in the world of data science.

  12. International Data Base, World Population: 1983 Extract

    • icpsr.umich.edu
    • search.datacite.org
    ascii
    Updated Feb 16, 1992
    United States. Bureau of the Census (1992). International Data Base, World Population: 1983 Extract [Dataset]. http://doi.org/10.3886/ICPSR08320.v1
    Available download formats: ascii
    Dataset updated
    Feb 16, 1992
    Dataset provided by
    Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
    Authors
    United States. Bureau of the Census
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/8320/terms

    Time period covered
    1950 - 1985
    Area covered
    World
    Description

    This aggregate data collection is an extract of the International Data Base (IDB), a computerized central repository of demographic, economic, and social data for all countries of the world. Data available in this collection include total midyear population estimates and projections (1950-1985), percent urban population, estimates and projections of crude birth rate, crude death rate, net migration rate, rate of natural increase, and annual growth rate, infant mortality rate and life expectancy at birth by sex, percent literate by sex, and percent of the labor force in agriculture.
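    Several of the listed rates are linked by standard demographic identities: the rate of natural increase is the crude birth rate minus the crude death rate (both per 1,000 population), and adding the net migration rate and dividing by 10 gives an approximate annual growth rate in percent. The sketch below assumes all inputs are per-1,000 rates; the IDB's published growth rates are derived from its population estimates and may not match this approximation exactly.

```python
def natural_increase(cbr, cdr):
    """Rate of natural increase per 1,000 population: crude births minus crude deaths."""
    return cbr - cdr


def growth_rate_percent(cbr, cdr, net_migration_rate):
    """Approximate annual growth rate in percent from rates per 1,000 population."""
    return (natural_increase(cbr, cdr) + net_migration_rate) / 10
```

    For a crude birth rate of 30, a crude death rate of 10 and a net migration rate of -2 per 1,000, the natural increase is 20 per 1,000 and the approximate growth rate is 1.8 percent per year.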

  13. World Religions Across Regions

    • kaggle.com
    zip
    Updated Dec 6, 2022
    The Devastator (2022). World Religions Across Regions [Dataset]. https://www.kaggle.com/datasets/thedevastator/a-global-perspective-on-world-religions-1945-201
    Available download formats: zip (213216 bytes)
    Dataset updated
    Dec 6, 2022
    Authors
    The Devastator
    Area covered
    World
    Description

    World Religions Across Regions

    Analyzing Adherence Across Regions, States and the Global System

    By Correlates of War Project [source]

    About this dataset

    The World Religion Project (WRP) is an ambitious endeavor to conduct a comprehensive analysis of religious adherence throughout the world from 1945 to 2010. This cutting-edge project offers unparalleled insight into the religious behavior of people in different countries, regions, and continents during this time period. Its datasets provide important information about the numbers and percentages of adherents across a multitude of different religions, religion families, and non-religious affiliations.

    The WRP consists of three distinct datasets: the national religion dataset, the regional religion dataset, and the global religion dataset, each suited to analysis at a different level, from individual states to the global system. The national dataset provides the number of adherents by state, as well as the percentage of the population practicing a given faith group, in five-year increments, showing how these figures evolve from nation to nation over time. Regional data are likewise provided at five-year intervals using the COW region designations with one modification: Pacific Ocean states (COW country code 900 or above) have been reclassified into their own Oceania category. Finally, at the global level, all states are aggregated to give a snapshot, at any five-year interval between 1945 and 2010, of the relationships between religions or religion families, within one location or transnationally.
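    Two rules in this paragraph translate directly into code: states with a COW country code of 900 or above are moved into Oceania, and regional or global figures come from aggregating states. The sketch below assumes that aggregation means summing adherents and population across states (rather than averaging national percentages) and uses a simple per-state record with `population` and an `adherents` mapping; these field names are hypothetical and do not reflect the WRP file layout.

```python
def wrp_region(cow_code, cow_region):
    """Apply the WRP modification: COW country codes of 900 or above go to Oceania."""
    return "Oceania" if cow_code >= 900 else cow_region


def aggregate_share(states, religion):
    """Adherent share for one religion across several states (population-weighted).

    states: list of dicts with "population" and an "adherents" dict keyed by religion.
    """
    total_pop = sum(s["population"] for s in states)
    total_adherents = sum(s["adherents"].get(religion, 0) for s in states)
    return total_adherents / total_pop if total_pop else 0.0


states = [
    {"population": 100, "adherents": {"christianity": 50}},   # hypothetical state A
    {"population": 300, "adherents": {"christianity": 30}},   # hypothetical state B
]
share = aggregate_share(states, "christianity")
```

    With one state of 100 people (50 adherents) and one of 300 (30 adherents), the aggregated share is 80/400 = 0.2, whereas averaging the two national shares would give 0.3.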

    This project was developed in three stages: first, forming a religions tree (a systematic classification); second, collecting data according to that classification structure; and lastly, cleaning the data so that discrepancies could be reconciled, with missing values imputed where needed and gaps marked where values were unknown during the collection process. Anyone wishing to learn more about these datasets or their applications is encouraged to contact Zeev Maoz (University of California, Davis) and Errol A. Henderson (Pennsylvania State University).

    How to use the dataset

    The World Religion Project (WRP) dataset offers a comprehensive look at religious adherence around the world. With it, you can track global religious trends over a period of 65 years (1945-2010), explore how they have changed during that time, and gain insight into cross-regional and cross-time patterns in religious affiliation around the world.

    Research Ideas

    • Analyzing historical patterns of religious growth and decline across different regions
    • Creating visualizations to compare religious adherence in various states, countries, or globally
    • Studying the impact of governmental policies on religious participation over time

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: Dataset copyright by authors. You are free to:
    - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    - Adapt: remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    - Give appropriate credit: provide a link to the license, and indicate if changes were made.
    - ShareAlike: distribute your contributions under the same license as the original.
    - Keep intact: all notices that refer to this license, including copyright notices.

    Columns

    File: WRP regional data.csv

    | Column name | Description |
    |:------------|:------------|
    | Year | Reference year for data collection. (Integer) |
    | Region | World region according to Correlates Of War (COW) Regional Systemizations with one modification (Oceania category for COW country code ...

  14. Internet of Things - number of connected devices worldwide 2015-2025

    • statista.com
    Updated Nov 27, 2016
    Statista (2016). Internet of Things - number of connected devices worldwide 2015-2025 [Dataset]. https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
    Dataset updated
    Nov 27, 2016
    Dataset authored and provided by
    Statista http://statista.com/
    Area covered
    Worldwide
    Description

    By 2025, forecasts suggest that there will be more than ** billion Internet of Things (IoT) connected devices in use. This would be a nearly threefold increase from the IoT installed base in 2019.

    What is the Internet of Things?

    The IoT refers to a network of devices that are connected to the internet and can “communicate” with each other. Such devices include everyday tech gadgets such as smartphones and wearables, smart home devices such as smart meters, and industrial devices such as smart machines. These connected devices can gather, share, and analyze information and act on it accordingly. By 2023, global spending on IoT will reach *** trillion U.S. dollars.

    How does the Internet of Things work?

    IoT devices use sensors and processors to collect and analyze data acquired from their environments. The sensor data is sent to a gateway or to other IoT devices, and is then either sent to the cloud for analysis or analyzed locally. By 2025, the data volume created by IoT connections is projected to reach a massive total of **** zettabytes.

    Privacy and security concerns

    Given the amount of data generated by IoT devices, it is no wonder that data privacy and security are among the major concerns with regard to IoT adoption. Once devices are connected to the Internet, they become vulnerable to possible security breaches in the form of hacking, phishing, etc. Frequent data leaks from social media raise serious concerns about information security standards in today’s world; were the IoT to become the next new reality, serious efforts to create strict security standards would need to be prioritized.

  15. Panel Data on International Migration 1975-2000 - Australia, Canada,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 27, 2021
    Maurice Schiff and Mirja Channa Sjoblom (2021). Panel Data on International Migration 1975-2000 - Australia, Canada, Germany, France, United Kingdom, United States [Dataset]. https://microdata.worldbank.org/index.php/catalog/390
    Dataset updated
    Apr 27, 2021
    Dataset authored and provided by
    Maurice Schiff and Mirja Channa Sjoblom
    Time period covered
    1975 - 2000
    Area covered
    France, Australia, United Kingdom, Germany, Canada, United States
    Description

    Abstract

    This dataset, a product of the Trade Team - Development Research Group, is part of a larger effort in the group to measure the extent of the brain drain as part of the International Migration and Development Program. It measures international skilled migration for the years 1975-2000.

    The methodology is explained in: "Tendance de long terme des migrations internationales. Analyse à partir des 6 principaux pays receveurs" (Long-term trends in international migration: an analysis based on the 6 main receiving countries), Cécily Defoort.

    This data set uses the same methodology as used in the Docquier-Marfouk data set on international migration by educational attainment. The authors use data from 6 key receiving countries in the OECD: Australia, Canada, France, Germany, the UK and the US.

    It is estimated that the data represent approximately 77 percent of the world’s migrant population.

    Bilateral brain drain rates are estimated from observations at five-year intervals over the period 1975-2000.
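    As an illustration of what a bilateral rate observed at five-year intervals might look like, the sketch below uses the emigration-rate definition commonly associated with the Docquier-Marfouk data: skilled emigrants divided by skilled emigrants plus skilled residents of the origin country. This definition is an assumption here (the description does not spell it out), and the data layout keyed by (origin, year) is hypothetical.

```python
YEARS = range(1975, 2001, 5)   # observation years: 1975, 1980, ..., 2000


def emigration_rate(skilled_emigrants, skilled_residents):
    """Share of a skilled native-born group living abroad: M / (N + M)."""
    total = skilled_emigrants + skilled_residents
    return skilled_emigrants / total if total else 0.0


def panel_rates(emigrants, residents):
    """Rates for each (origin, year) key, keeping only the five-yearly observation years.

    emigrants, residents: dicts keyed by (origin, year) -> counts of tertiary-educated people.
    """
    return {key: emigration_rate(m, residents.get(key, 0))
            for key, m in emigrants.items() if key[1] in YEARS}


rates = panel_rates({("X", 1990): 25, ("X", 1992): 10}, {("X", 1990): 75})
```

    Here origin "X" in 1990 has 25 skilled emigrants and 75 skilled residents, giving a rate of 0.25; the 1992 entry is dropped because it is not one of the five-yearly observation years.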

    Geographic coverage

    Australia, Canada, France, Germany, UK and US

    Kind of data

    Aggregate data [agg]

    Mode of data collection

    Other [oth]

  16. Multi Country Study Survey 2000-2001 - Chile

    • apps.who.int
    • catalog.ihsn.org
    Updated Jan 17, 2014
    World Health Organization (WHO) (2014). Multi Country Study Survey 2000-2001 - Chile [Dataset]. https://apps.who.int/healthinfo/systems/surveydata/index.php/catalog/149
    Dataset updated
    Jan 17, 2014
    Dataset provided by
    World Health Organization (https://who.int/)
    Authors
    World Health Organization (WHO)
    Time period covered
    2000 - 2001
    Area covered
    Chile
    Description

    Abstract

    In order to develop various methods of comparable data collection on health and health system responsiveness, WHO started a scientific survey study in 2000-2001. This study used a common survey instrument in nationally representative populations, with a modular structure for assessing the health of individuals in various domains, health system responsiveness, household health care expenditures, and additional modules in other areas such as adult mortality and health state valuations.

    The health module of the survey instrument was based on selected domains of the International Classification of Functioning, Disability and Health (ICF) and was developed after a rigorous scientific review of various existing assessment instruments. The responsiveness module has been the result of ongoing work over the last 2 years that has involved international consultations with experts and key informants and has been informed by the scientific literature and pilot studies.

    Questions on household expenditure and proportionate expenditure on health have been borrowed from existing surveys. The survey instrument has been developed in multiple languages using cognitive interviews and cultural applicability tests, stringent psychometric tests for reliability (i.e. test-retest reliability to demonstrate the stability of application) and most importantly, utilizing novel psychometric techniques for cross-population comparability.

    The study was carried out in 61 countries, completing 71 surveys, because two different modes were intentionally used for comparison purposes in 10 countries. Surveys were conducted in different modes: 90-minute in-person household interviews in 14 countries; brief face-to-face interviews in 27 countries; computerized telephone interviews in 2 countries; and postal surveys in 28 countries. All samples were selected from nationally representative sampling frames with a known probability so as to make estimates based on general population parameters.
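
    The survey counts in this paragraph are internally consistent: 61 countries plus the 10 countries surveyed in two modes give 71 surveys, which equals the sum of the per-mode counts. A quick check in Python (the dictionary keys are descriptive labels only):

```python
# Survey counts as reported in the study description.
countries = 61
dual_mode_countries = 10  # countries where two modes were used on purpose

surveys_by_mode = {
    "in-person household interview (90 min)": 14,
    "brief face-to-face interview": 27,
    "computerized telephone interview": 2,
    "postal survey": 28,
}

# Each dual-mode country contributes one extra survey.
expected_surveys = countries + dual_mode_countries
total_surveys = sum(surveys_by_mode.values())

print(expected_surveys, total_surveys)  # 71 71
assert expected_surveys == total_surveys == 71
```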

    The survey study tested novel techniques to control the reporting bias between different groups of people in different cultures or demographic groups (i.e. differential item functioning) so as to produce comparable estimates across cultures and groups. To achieve comparability, the self-reports of individuals about their own health were calibrated against well-known performance tests (i.e. self-reported vision was measured against the standard Snellen visual acuity test) or against short descriptions in vignettes that marked known anchor points of difficulty (e.g. people with different levels of mobility, such as a paraplegic person or an athlete who runs 4 km each day) so as to adjust the responses for comparability. The same method was also used for self-reports of individuals assessing the responsiveness of their health systems, where vignettes on different responsiveness domains describing different levels of responsiveness were used to calibrate the individual responses.
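
    The text does not give the exact adjustment model. One well-known way to use vignettes for comparability is the nonparametric anchoring-vignette recode described by King and colleagues: a respondent's self-rating is re-expressed by its position relative to that same respondent's ratings of the vignettes. The sketch below illustrates that idea only, not necessarily the method WHO applied, and it assumes ordinal ratings with each respondent rating the vignettes in a strictly increasing order.

```python
from typing import Sequence


def vignette_relative_score(self_rating: int, vignette_ratings: Sequence[int]) -> int:
    """Nonparametric anchoring-vignette recode.

    `vignette_ratings` are one respondent's ratings of J vignettes, listed in
    order of their known severity and assumed strictly increasing. Returns a
    value in 1..2J+1: odd values mean the self-rating lies below, between, or
    above the vignette ratings; even values mean it ties one of them.
    """
    ratings = list(vignette_ratings)
    if any(a >= b for a, b in zip(ratings, ratings[1:])):
        raise ValueError("vignette ratings must be strictly increasing")
    score = 1
    for z in ratings:
        if self_rating < z:
            return score        # strictly below this vignette
        if self_rating == z:
            return score + 1    # tied with this vignette
        score += 2              # above this vignette; move past it
    return score                # above every vignette: 2J + 1


# Two hypothetical respondents who use the response scale differently end up
# in the same position relative to the anchors.
print(vignette_relative_score(3, [2, 4]))  # 3
print(vignette_relative_score(4, [3, 5]))  # 3
```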

    These data are useful in their own right to standardize indicators for different domains of health (such as cognition, mobility, self-care, affect, usual activities, pain, social participation, etc.), and they also provide a better measurement basis for assessing the health of populations in a comparable manner. The data from the surveys can be fed into composite measures such as "Healthy Life Expectancy" and improve the empirical data input for health information systems in different regions of the world. Data from the surveys were also useful for improving the measurement of the responsiveness of different health systems to the legitimate expectations of the population.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The telephone directory was used as the sampling frame, since it is considered the most reliable registry available.

    Each region was divided into provinces. The provinces are composed of "comunas" (municipalities), within which individuals were randomly selected. However, because the frame is the telephone directory, this design may under-represent the population without a telephone.

    Final sample size: 2,078
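
    The description gives the frame hierarchy but not the selection probabilities or per-area sample sizes. As an illustration only, the sketch below draws a fixed number of telephone-directory listings at random (without replacement) within every comuna of a hierarchical frame; the frame contents, names, and `per_comuna` value are hypothetical. People without a listed telephone never appear in the frame, which is the coverage bias noted above.

```python
import random

# Hypothetical frame: region -> province -> comuna -> telephone-directory listings.
Frame = dict[str, dict[str, dict[str, list[str]]]]


def sample_from_directory(frame: Frame, per_comuna: int,
                          seed: int = 0) -> list[tuple[str, str, str, str]]:
    """Draw up to `per_comuna` listings at random, without replacement, in every comuna."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = []
    for region, provinces in frame.items():
        for province, comunas in provinces.items():
            for comuna, listings in comunas.items():
                k = min(per_comuna, len(listings))
                for person in rng.sample(listings, k):
                    sample.append((region, province, comuna, person))
    return sample


frame = {"Region A": {"Province 1": {"Comuna X": ["p1", "p2", "p3"],
                                     "Comuna Y": ["p4"]}}}
for row in sample_from_directory(frame, per_comuna=2):
    print(row)
```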

    Mode of data collection

    Mail Questionnaire [mail]

    Cleaning operations

    Data coding: At each site the data was coded by investigators to indicate the respondent status and the selection of the modules for each respondent within the survey design. After the interview was edited by the supervisor and considered adequate, it was entered locally.

    Data entry program: A data entry program was developed at WHO specifically for the survey study and provided to the sites. It was developed using a database program called the I-Shell (short for Interview Shell), a tool designed for easy development of computerized questionnaires and data entry (34). This program allows for easy data cleaning and processing.

    The data entry program checked for inconsistencies and validated the entries in each field by checking for valid response categories and range checks. For example, the program didn’t accept an age greater than 120. For almost all of the variables there existed a range or a list of possible values that the program checked for.
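
    A minimal sketch of the field-level checks described above. Only the rule that ages above 120 are rejected comes from the text; the lower age bound of 0, the `sex` coding, and the rule tables themselves are hypothetical.

```python
# Hypothetical field rules modelled on the checks described above.
RANGE_RULES = {"age": (0, 120)}   # upper bound from the text; lower bound assumed
CATEGORY_RULES = {"sex": {1, 2}}  # hypothetical coding: 1 = male, 2 = female


def validate_entry(field: str, value: int) -> bool:
    """Return True if `value` is acceptable for `field` under the rules above."""
    if field in RANGE_RULES:
        low, high = RANGE_RULES[field]
        return low <= value <= high
    if field in CATEGORY_RULES:
        return value in CATEGORY_RULES[field]
    return True  # no rule defined for this field


print(validate_entry("age", 45))   # True
print(validate_entry("age", 121))  # False
print(validate_entry("sex", 3))    # False
```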

    In addition, the data was entered twice to capture other data entry errors. The data entry program was able to warn the user whenever a value that did not match the first entry was entered at the second data entry. In this case the program asked the user to resolve the conflict by choosing either the 1st or the 2nd data entry value to be able to continue. After the second data entry was completed successfully, the data entry program placed a mark in the database in order to enable the checking of whether this process had been completed for each and every case.
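
    The double-entry procedure can be sketched as a field-by-field comparison of the two entries, where each disagreement must be resolved by choosing either the 1st or the 2nd value. In this sketch the `resolve` callback stands in for the operator's choice, and the `double_entry_done` flag name is hypothetical; the actual I-Shell program is not documented here.

```python
from typing import Callable, Mapping

Record = Mapping[str, object]


def reconcile_double_entry(first: Record, second: Record,
                           resolve: Callable[[str, object, object], object]) -> dict:
    """Compare two independent entries of one questionnaire field by field.

    Where they disagree, `resolve(field, first_value, second_value)` must return
    one of the two values. The result carries a hypothetical
    `double_entry_done` flag marking the case as completed.
    """
    if first.keys() != second.keys():
        raise ValueError("both entries must cover the same fields")
    merged = {}
    for field in first:
        a, b = first[field], second[field]
        if a == b:
            merged[field] = a
        else:
            choice = resolve(field, a, b)
            if choice not in (a, b):
                raise ValueError(f"{field}: must choose the 1st or 2nd value")
            merged[field] = choice
    merged["double_entry_done"] = True
    return merged


first = {"id": 17, "age": 34, "sex": 2}
second = {"id": 17, "age": 43, "sex": 2}  # age mistyped at second entry
print(reconcile_double_entry(first, second, lambda field, a, b: a))
# {'id': 17, 'age': 34, 'sex': 2, 'double_entry_done': True}
```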

    Data transfer: The data entry program could export the entered data into one compressed database file, which could easily be sent to WHO as an email attachment or uploaded to a secure server with a file transfer program, no matter how many cases were in the file. The sites were allowed to use as many computers and as many data entry personnel as they wanted. Each computer used for this purpose produced one file, and the files were merged after delivery to WHO with the help of other programs built to automate the process. The sites sent the data periodically as they collected it, enabling checking procedures and preliminary analyses in the early stages of data collection.
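
    A minimal sketch of the merge step, assuming each computer's export has been converted to CSV text with a shared header (the original files were compressed database files, so CSV is a stand-in here). File names and columns are hypothetical; each merged row records which file it came from, which helps trace problems back to a site or machine.

```python
import csv
import io


def merge_site_exports(exports: dict[str, str]) -> list[dict[str, str]]:
    """Merge CSV exports from several data-entry computers into one list of rows.

    `exports` maps a (hypothetical) file name to its CSV text; all files are
    assumed to share the same header. Each row is tagged with its source file.
    """
    merged: list[dict[str, str]] = []
    header = None
    for name, text in exports.items():
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"{name}: header differs from the first file")
        for row in reader:
            row["source_file"] = name
            merged.append(row)
    return merged


exports = {"pc1.csv": "id,age\n1,34\n2,51\n", "pc2.csv": "id,age\n3,27\n"}
rows = merge_site_exports(exports)
print(len(rows))  # 3
print(rows[2])    # {'id': '3', 'age': '27', 'source_file': 'pc2.csv'}
```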

    Data quality checks: Once the data was received, it was analyzed for missing information, invalid responses and representativeness. Inconsistencies were also noted and reported back to sites.

    Data cleaning and feedback: After receipt of cleaned data from sites, another program was run to check for missing information, incorrect information (e.g. wrong use of center codes), duplicated data, etc. The output of this program was fed back to sites regularly. Mainly, this consisted of cases with duplicate IDs, duplicate cases (where the data for two respondents with different IDs were identical), wrong country codes, and missing age, sex, education and some other important variables.
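
    The feedback checks listed above can be sketched as a small report over the merged records: duplicate IDs, duplicate cases (identical data under different IDs), invalid country codes, and missing key variables. The field names, the set of valid country codes, and the choice of `age`, `sex` and `education` as the key fields are assumptions for illustration.

```python
from collections import Counter, defaultdict

KEY_FIELDS = ("age", "sex", "education")  # some of the key variables named in the text


def _content(row: dict) -> tuple:
    """Everything except the respondent ID, in a hashable, order-independent form."""
    return tuple(sorted((k, v) for k, v in row.items() if k != "id"))


def feedback_report(rows: list[dict], valid_country_codes: set) -> dict:
    """Flag duplicate IDs, duplicate cases, wrong country codes and missing key fields."""
    id_counts = Counter(r["id"] for r in rows)
    ids_by_content = defaultdict(set)
    for r in rows:
        ids_by_content[_content(r)].add(r["id"])
    return {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        # Duplicate case: identical data recorded under two or more different IDs.
        "duplicate_cases": sorted(i for ids in ids_by_content.values()
                                  if len(ids) > 1 for i in ids),
        "bad_country_codes": [r["id"] for r in rows
                              if r.get("country") not in valid_country_codes],
        "missing_key_fields": [r["id"] for r in rows
                               if any(r.get(f) in (None, "") for f in KEY_FIELDS)],
    }


rows = [  # hypothetical records
    {"id": 1, "country": "CHL", "age": 34, "sex": 2, "education": 3},
    {"id": 2, "country": "CHL", "age": 34, "sex": 2, "education": 3},
    {"id": 2, "country": "XXX", "age": None, "sex": 1, "education": 2},
]
print(feedback_report(rows, {"CHL"}))
# {'duplicate_ids': [2], 'duplicate_cases': [1, 2], 'bad_country_codes': [2], 'missing_key_fields': [2]}
```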

  17. Component parts of the World Heat Flow Data Collection

    • search.dataone.org
    Updated Jan 6, 2018
    Clement, M; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G (2018). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.809582
    Dataset updated
    Jan 6, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Clement, M; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G
    Description

    No description is available. Visit https://dataone.org/datasets/5d041e4bfaf4ea361dd3135126134720 for complete metadata about this dataset.

  18. Wonders of the World Image Dataset

    • kaggle.com
    zip
    Updated May 3, 2022
    Bala Baskar (2022). Wonders of the World Image Dataset [Dataset]. https://www.kaggle.com/datasets/balabaskar/wonders-of-the-world-image-classification
    Available download formats: zip (453,078,359 bytes)
    Dataset updated
    May 3, 2022
    Authors
    Bala Baskar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    The New 7 Wonders of the World was a campaign started in 2000 to choose Wonders of the World from a selection of 200 existing monuments. The popularity poll via free Web-based voting and small amounts of telephone voting was led by Canadian-Swiss Bernard Weber and organized by the New 7 Wonders Foundation (N7W) based in Zurich, Switzerland, with winners announced on 7 July 2007 in Lisbon, at Estádio da Luz. The poll was considered unscientific partly because it was possible for people to cast multiple votes.

    Context

    If we someday plan a world tour, there will obviously be a bucket list of wonders and places around the world that we wish to visit. Here we have one set of "Wonders of the World" images scraped from Google Images. Let us use our deep learning skills to build a multiclass classifier that identifies the place shown in each image.

    Data Preparation

    This dataset contains a total of 3,846 images organized in folders, with each folder representing one of the top new wonders of the world. The wonders whose images were extracted from Google Images are listed below.

    • Venezuela Angel Falls
    • Taj Mahal
    • Stonehenge
    • Statue of Liberty
    • Chichen Itza
    • Christ the Redeemer
    • Pyramids of Giza
    • Eiffel Tower
    • Great Wall of China
    • Burj Khalifa
    • Roman Colosseum
    • Machu Picchu
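
    Because each folder holds the images of one wonder, class labels can be read directly from the folder names. A minimal sketch, assuming the zip has been extracted so that every class is a subfolder of one root directory and that the images are JPEG or PNG files (the accepted extensions are an assumption; check the actual archive):

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed; check the actual files


def index_image_folders(root: str) -> tuple[list[tuple[Path, int]], list[str]]:
    """Treat each subfolder of `root` as one class and list its image files.

    Returns (samples, class_names), where each sample is (image_path, class_index)
    and class indices follow the alphabetically sorted folder names.
    """
    root_path = Path(root)
    class_names = sorted(p.name for p in root_path.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for image in sorted((root_path / name).iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image, index))
    return samples, class_names
```

    For example, `samples, classes = index_image_folders("wonders_of_the_world")` (a hypothetical extraction path) yields (path, label) pairs that can then be split into training and validation sets for the multiclass classifier. torchvision's `ImageFolder` follows the same folder-per-class convention if a PyTorch pipeline is preferred.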

  19. Component parts of the World Heat Flow Data Collection

    • search.dataone.org
    Updated Jan 5, 2018
    Evans, T R; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G (2018). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.806998
    Dataset updated
    Jan 5, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Evans, T R; International Heat Flow Commission, I H F C; Jessop, Alan M; Hobart, Michael A; Sclater, John G
    Description

    No description is available. Visit https://dataone.org/datasets/c9d6507f203308063a16ce22ba032540 for complete metadata about this dataset.

  20. Component parts of the World Heat Flow Data Collection

    • dataone.org
    • doi.pangaea.de
    Updated Nov 21, 2025
    Roger H Morin; Richard P von Herzen (2025). Component parts of the World Heat Flow Data Collection [Dataset]. http://doi.org/10.1594/PANGAEA.805302
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Roger H Morin; Richard P von Herzen
    Description

    This dataset is about: Component parts of the World Heat Flow Data Collection.

Mario Caesar (2023). Our World in Data - COVID-19 [Dataset]. https://www.kaggle.com/datasets/caesarmario/our-world-in-data-covid19-dataset/code

Our World in Data - COVID-19

COVID-19 Dataset by Our World in Data

Available download formats: zip (14,235,238 bytes)
Dataset updated
Oct 25, 2023
Authors
Mario Caesar
License

http://opendatacommons.org/licenses/dbcl/1.0/

Description

Our World in Data - COVID-19

▶ About Our World in Data 🏢

▶ Similar Datasets 📄

▶ Context 📝

The complete COVID-19 dataset is a collection of the COVID-19 data maintained and provided by Our World in Data. The Our World in Data team updates it daily for the duration of the COVID-19 pandemic.

▶ Content 📃

The dataset includes the following information:

| Metrics | Source | Updated | Countries |
| --- | --- | --- | --- |
| Vaccinations | Official data collated by the Our World in Data team | Daily | 218 |
| Tests & positivity | Official data collated by the Our World in Data team | Weekly | 139 |
| Hospital & ICU | Official data collated by the Our World in Data team | Weekly | 39 |
| Confirmed cases | JHU CSSE COVID-19 Data | Daily | 196 |
| Confirmed deaths | JHU CSSE COVID-19 Data | Daily | 196 |
| Reproduction rate | Arroyo-Marioli F, Bullano F, Kucinskas S, Rondón-Moreno C | Daily | 185 |
| Policy responses | Oxford COVID-19 Government Response Tracker | Daily | 186 |
| Other variables of interest | International organizations (UN, World Bank, OECD, IHME…) | Fixed | |

Data dictionary is available below ⤵

▶ Acknowledgements 🙏

I'd like to clarify that I am only making the vaccine data collected by Our World in Data available to the Kaggle community. The data are gathered and integrated, and a new version is posted daily, as maintained by Our World in Data in their GitHub repository.

▶ Inspiration 💭

  • Forecasting daily new confirmed cases of COVID-19 in a specific country.
  • Performing data analysis and data visualization of COVID-19 cases, deaths, etc.
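
A common first step for both ideas is to smooth the noisy daily counts for one country. The sketch below computes a trailing moving average of daily new confirmed cases using only the standard library. It assumes columns named `location`, `date` (ISO `YYYY-MM-DD`) and `new_cases`, as in the Our World in Data CSV; check the data dictionary for the file you download. The sample rows are made up.

```python
import csv
import io


def rolling_mean_new_cases(csv_text: str, location: str,
                           window: int = 7) -> list[tuple[str, float]]:
    """Trailing `window`-row mean of daily new confirmed cases for one location.

    Rows with an empty `new_cases` value are skipped, so the window counts
    reported rows rather than calendar days.
    """
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r["location"] == location and r["new_cases"] != ""]
    rows.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
    values = [float(r["new_cases"]) for r in rows]
    return [(rows[i]["date"], sum(values[i - window + 1:i + 1]) / window)
            for i in range(window - 1, len(values))]


sample = ("location,date,new_cases\n"
          "Chile,2021-01-01,10\nChile,2021-01-02,20\nChile,2021-01-03,30\n"
          "Peru,2021-01-01,5\n")
print(rolling_mean_new_cases(sample, "Chile", window=2))
# [('2021-01-02', 15.0), ('2021-01-03', 25.0)]
```

With the default 7-row window, the smoothed series can be plotted directly or used as the target series for a forecasting model.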

📷 Images by Fusion Medical Animation.
