In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). In addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, the LSMS has three basic objectives:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.
The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey so there are three years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey:
- Wave 1: LSMS conducted in 2001, forms the baseline survey for the panel
- Wave 2: Second interview of 50% of LSMS respondents in Autumn/Winter 2002
- Wave 3: Third interview with sub-sample respondents in Autumn/Winter 2003
The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and make it possible to observe the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three-year period. Most importantly, the co-variates associated with moves into and out of poverty and the relative risks of poverty for different people can be assessed. As such, the panel aims to provide data that will inform the policy debates within FBiH and RS at a time of social reform and rapid change.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Households
Sample survey data [ssd]
The Wave 3 sample consisted of 2878 households that had been interviewed at Wave 2 and a further 73 households that were interviewed at Wave 1 but were non-contacts at Wave 2. A total of 2951 households (1301 in the RS and 1650 in FBiH) were issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.
Panel design
Eligibility for inclusion
The household and household membership definitions are the same standard definitions as at Wave 2. The sample membership status and eligibility for interview are as follows:
i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview.
ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).
iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).
iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.
Following rules
The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.
Definition of 'out-of-scope'
It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:
i. Movers out of the country altogether i.e. outside FBiH and RS. This category of mover is clear. Sample members moving to another country outside FBiH and RS will be out-of-scope for that year of the survey and not eligible for interview.
ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer is assigned in that entity.
iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, Wave 3 individuals who have subsequently moved into some institutions are followed. The definitions for which institutions are included are found in the Supervisor Instructions.
iv. Movers into the district of Brcko. These movers are followed for interview. When coding entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
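The eligibility and out-of-scope rules can be summarised in a short Python sketch. This is only an illustration of the rules as described here, not code used by the survey: the `SampleMember` class, the location labels and the `institution_covered` flag (standing in for the list of institutions in the Supervisor Instructions) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SampleMember:
    age: int
    # Hypothetical location labels: "FBiH", "RS", "Brcko", "abroad", "institution".
    location: str
    # Hypothetical flag: True if the institution type is one that is followed
    # according to the Supervisor Instructions.
    institution_covered: bool = False


def in_scope(member: SampleMember) -> bool:
    """Apply the out-of-scope rules i to iv described in the text."""
    if member.location == "abroad":
        # Rule i: movers outside FBiH and RS are out-of-scope for that year.
        return False
    if member.location == "institution":
        # Rule iii: only movers into some institutions are followed.
        return member.institution_covered
    # Rules ii and iv: movers between entities or into Brcko are followed.
    return True


def eligible_for_interview(member: SampleMember) -> bool:
    """In-scope members aged 15 or over are eligible for interview."""
    return in_scope(member) and member.age >= 15
```

Note that a child in an in-scope household remains a sample member (`in_scope` is true) even though they are too young to be interviewed.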
Face-to-face [f2f]
Data entry
As at Wave 2, CSPro was the chosen data entry software. The CSPro program has two main features to reduce the number of keying errors and the editing required following data entry:
- Data entry screens that included all skip patterns.
- Range checks for each question (allowing three exceptions for inappropriate, don't know and missing codes).
The Wave 3 data entry program had more checks than at Wave 2, and DE staff were instructed to get all anomalies cleared by SIG fieldwork. The program was extensively tested prior to DE. Ten computer staff were employed in each Field Office and, as all had worked on Wave 2, training was not undertaken.
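A range check with three exception codes could look like the following Python sketch. It is not the CSPro program itself; the numeric values chosen for the inappropriate, don't know and missing codes are assumptions made for illustration, since the documentation does not state them.

```python
# Hypothetical special codes; the actual codes used in the survey are not
# given in this documentation.
SPECIAL_CODES = {
    -7: "inappropriate",
    -8: "don't know",
    -9: "missing",
}


def passes_range_check(value: int, low: int, high: int) -> bool:
    """Accept a keyed value if it is a special code or lies within [low, high]."""
    return value in SPECIAL_CODES or low <= value <= high
```

For example, with an age question limited to 0 to 110, `passes_range_check(250, 0, 110)` is false and the keyed value would be flagged, while `passes_range_check(-8, 0, 110)` is true because it is one of the three permitted exceptions.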
Editing
Editing Instructions were compiled (Annex G) and sent to Supervisors. For Wave 3, Supervisors were asked to take more time to edit every questionnaire returned by their interviewers. The FBTSA examined the work of twelve of the twenty-two Supervisors. All Supervisors made occasional errors with the Control Form, so a further 100% check of Control Forms and Module 1 was undertaken by the FBTSA and SIG members.
The panel survey has enjoyed high response rates throughout the three years of data collection with the wave 3 response rates being slightly higher than those achieved at wave 2. At wave 3, 1650 households in the FBiH and 1300 households in the RS were issued for interview. Since there may be new households created from split-off movers it is possible for the number of households to increase during fieldwork. A similar number of new households were formed in each entity: 62 in the FBiH and 63 in the RS. This means that 3073 households were identified during fieldwork. Of these, 3003 were eligible for interview; the remaining 70 households had moved out of BiH, been institutionalised or died (34 in the RS and 36 in the FBiH).
Interviews were achieved in 96% of eligible households, an extremely high response rate by international standards for a survey of this type.
In total, 8712 individuals (including children) were enumerated within the sample households (4796 in the FBiH and 3916 in the RS). Within the 3003 eligible households, 7781 individuals aged 15 or over were eligible for interview with 7346 (94.4%) being successfully interviewed. Within cooperating households (where there was at least one interview) the interview rate was higher (98.8%).
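The household and individual figures quoted above can be checked with a few lines of Python. The sketch uses only numbers stated in the text; the variable and function names are illustrative.

```python
# Household counts reported for wave 3 fieldwork.
households_identified = 3073
households_out_of_scope = 70  # moved out of BiH, institutionalised or deceased
eligible_households = households_identified - households_out_of_scope

# Individuals enumerated in each entity (including children).
enumerated_total = 4796 + 3916

# Individuals aged 15 or over in eligible households.
individuals_eligible = 7781
individuals_interviewed = 7346


def rate_percent(achieved: int, eligible: int) -> float:
    """Response rate as a percentage, rounded to one decimal place."""
    return round(100 * achieved / eligible, 1)


individual_response_rate = rate_percent(individuals_interviewed, individuals_eligible)
```

Running this reproduces the 3003 eligible households, the 8712 enumerated individuals and the 94.4% individual response rate.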
A very important measure in longitudinal surveys is the annual individual re-interview rate. This is because a high attrition rate, where large numbers of respondents drop out of the survey over time, can call into question the quality of the data collected. In BiH the individual re-interview rates have been high for the survey. The individual re-interview rate is the proportion of people who gave an interview at time t-1 who also give an interview at t. Of those who gave a full interview at wave 2, 6653 also gave a full interview at wave 3. This represents a re-interview rate of 97.9% - which is extremely high by international standards. When we look at those respondents who have been interviewed at all three years of the survey there are 6409 cases which are available for longitudinal analysis, 2881 in the RS and 3528 in the FBiH. This represents 82.8% of the responding wave 1 sample, a
By US Open Data Portal, data.gov
This dataset provides a list of all Home Health Agencies registered with Medicare. Contained within this dataset is information on each agency's address, phone number, type of ownership, quality measure ratings and other associated data points. With this valuable insight into the operations of each Home Health Care Agency, you can make informed decisions about your care needs. Learn more about the services offered at each agency and how they are rated according to their quality measure ratings. From dedicated nursing care services to speech pathology to medical social services - get all the information you need with this comprehensive look at U.S.-based Home Health Care Agencies!
Are you looking to learn more about Home Health Care Agencies registered with Medicare? This dataset can provide quality measure ratings, addresses, phone numbers, types of services offered and other information that may be helpful when researching Home Health Care Agencies.
This guide will explain how to use the data in this dataset to gain a better understanding of Home Health Care Agencies registered with Medicare.
First, you will need to become familiar with the columns in the dataset. A list of all columns and their associated descriptions is provided above for your reference. Once you understand each column’s purpose, it will be easier for you to decide what metrics or variables are most important for your own research.
Next, use this data to compare different Home Health Care Agencies on facets such as type of ownership, services offered and quality measure ratings like the star rating (from 0-5 stars), using the CMS certification number to identify each agency. Collecting information from other sources, such as public reviews or customer feedback, can supplement these numerical metrics and paint a more accurate picture of each agency's performance and customer satisfaction level.
Finally, once you have collected enough data points on one agency, or on a comparison between multiple agencies, conduct further analysis using statistical methods such as correlation matrices to identify patterns in the data that may reveal insights into your research topic.
- Using the data to compare quality of care ratings between agencies, so people can make better informed decisions about which agency to hire for home health services.
- Analyzing the costs associated with different types of home health care services, such as nursing care and physical therapy, in order to determine where money could be saved in health care budgets.
- Evaluating the performance of certain agencies by comparing the number of episodes billed to Medicare with national averages, so that agencies with lower numbers of billing episodes can be identified and monitored more closely if necessary.
If you use this dataset in your research, please credit the original authors.
Unknown License - Please check the dataset description for more information.
File: csv-1.csv | Column name | Description | |:----------------------------------------...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. 
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
https://www.ons.gov.uk/methodology/geography/licences
This document is the Standard Area Measurements - 2015 User Guide. It provides information regarding the Standard Area Measurements (SAM) products including: types of measurement; data tolerance, accuracy and currency; guidance on the use of measurements for statistical purposes; and conditions of use. (File Size - 561 KB)
This document is the Standard Area Measurements (2025) User Guide. It provides information regarding the Standard Area Measurements (SAM) products including: types of measurement; data tolerance, accuracy and currency; guidance on the use of measurements for statistical purposes; and conditions of use. (File Size - 408 KB)
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This article looks at how benefits in kind households receive from the state are measured within the Effects of Taxes and Benefits on Household Income publication and how the methodology behind these measurements has changed between the years 2005/06 and 2010/11. We also retrospectively apply the current methodology back to the year 2005/06 in order to produce a consistent series showing how the size and distribution of benefits in kind received by households have changed over time.
Source agency: Office for National Statistics
Designation: Supporting material
Language: English
Alternative title: Methodological Changes in the Measurement of Benefits in Kind
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The body shape dataset consists of a compilation of people's photos along with their corresponding body measurements. It is designed to provide information and insights into the physical appearances and body characteristics of individuals. The body diversification dataset includes a diverse range of subjects representing different age groups, genders, and ethnicities.
The photos are captured in a standardized manner, depicting individuals in front and side positions. The images aim to capture the subjects' physical appearance using appropriate lighting and angles that showcase their body proportions accurately.
The dataset serves various purposes, including:
- research projects
- body measurement analysis
- fashion or apparel industry applications
- fitness and wellness studies
- anthropometric studies for ergonomic design in various fields
files folder and includes additional photos of people taking measurements
proofs folder
includes the following information for each media file:
keywords: body girth measurements, body composition, waist circumference, percent body fat, human body size, body mass index, bmi, visual human body, e-commerce, human body shape classification, height, weight, measurement points, body build weight, body dataset, human part dataset, human body data, deep learning, computer vision, people images dataset, biometric data dataset, biometric dataset, images database, image-to-image, machine learning, human detection dataset, human body dataset, body recognition dataset, body shape dataset, body size dataset, body types dataset, body measurements dataset, anthropometric data set
We designed two new samplers for monitoring airborne particulates, including fungal and fern spores and plant pollen, that rely on natural wind currents (Passive Environmental Sampler) or a battery-operated fan (Active Environmental Sampler). Both samplers are modeled after commercial devices such as the Rotorod® and the Burkard® samplers, but are more economical and require less maintenance than commercial devices. We compared our two new samplers to Rotorod® samplers using Xyleborus spp. boring dust (frass) known to contain fungi responsible for Rapid Ohia Death. The comparison was done in a large outdoor field cage to determine the relative effectiveness of the three samplers for capturing windblown boring dust. The dataset contains measurements of boring dust particles that were captured by the three types of samplers over the course of twelve trials.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A common feature of preclinical animal experiments is repeated measurement of the outcome, e.g., body weight measured in mice pups weekly for 20 weeks. Separate time point analysis or repeated measures analysis approaches can be used to analyze such data. Each approach requires assumptions about the underlying data and violations of these assumptions have implications for estimation of precision, and type I and type II error rates. Given the ethical responsibilities to maximize valid results obtained from animals used in research, our objective was to evaluate approaches to reporting repeated measures design used by investigators and to assess how assumptions about variation in the outcome over time impact type I and II error rates and precision of estimates. We assessed the reporting of repeated measures designs of 58 studies in preclinical animal experiments. We used simulation modelling to evaluate three approaches to statistical analysis of repeated measurement data. In particular, we assessed the impact of (a) repeated measures analysis assuming that the outcome had non-constant variation at all time points (heterogeneous variance), (b) repeated measures analysis assuming constant variation in the outcome (homogeneous variance), and (c) separate ANOVA at individual time points in repeated measures designs. The three model-fitting approaches were evaluated by comparing the p-value distributions, the type I and type II error rates and, by implication, the shrinkage or inflation of standard error estimates from 1000 simulated datasets. Of 58 studies with repeated measures design, three provided a rationale for repeated measurement and 23 studies reported using a repeated-measures analysis approach. Of the 35 studies that did not use repeated-measures analysis, fourteen studies used only two time points to calculate weight change, which potentially means collected data was not fully utilized.
Other studies reported only select time points (n = 12), raising the issue of selective reporting. Simulation studies showed that an incorrect assumption about the variance structure resulted in modified error rates and precision estimates. The reporting of the validity of assumptions for repeated measurement data is very poor. The homogeneous variation assumption, which is often invalid for body weight measurements, should be confirmed prior to conducting the repeated-measures analysis using a homogeneous covariance structure, and the analysis should be adjusted using corrections or model specifications if the assumption is not met.
The World Bank's Living Standard Measurement Study (LSMS) was adapted for use in Guyana and administered in early 1993 as part of the Guyana Bureau of Statistics' year-long Household Income and Expenditure Survey (HIES). Because the LSMS survey was to take place at about the same time as the Household Income and Expenditure Survey (HIES), it was decided to link the two surveys. The HIES questionnaire substituted for the normal LSMS modules on income, expenditures, labor activities, household businesses, housing, durable goods, and savings. The LSMS questionnaire focused only on health, education, migration, fertility, and anthropometrics.
National
Sample survey data [ssd]
The LSMS survey was administered in conjunction with the third subround of the HIES. This required an additional visit by interviewers to each of the approximately 1,800 households selected in this round. Another subsequent visit was made to households with children younger than 5 years of age to collect anthropometric data. Fieldwork was carried out between January and July 1993. Data entry began in March 1993 and concluded in August 1993.
Face-to-face [f2f]
The LSMS questionnaire contained the following modules:
Household Roster: The LSMS was administered approximately two weeks following the HIES. Much of the cover section of the LSMS household information was copied directly from the HIES prior to the interview. This was to minimize the possibility of a household's LSMS not being matched with its HIES. Following the cover section, the LSMS contained five modules. The "Interviewer Manual" provides detailed descriptions of all items in the survey.
Health Module: This schedule contained items concerning the health status of all members of the household, specifically about any illness experienced during the previous 30 days: its length; the use of medical facilities for the illness; the amount spent on its treatment; the kinds and costs of medications; and the satisfaction with medical interventions. Also collected were data concerning the use of the National Insurance Scheme, preventive care, and specific items regarding pregnancy, breastfeeding and the presence of infants.
Education Module: Information concerning the schooling of each household member over the age of three years was collected, including the level completed, type of school, years repeated, access to meals and textbooks, distance traveled, and several categories of associated expenditures. Attention should be paid to the expenditure data. Some items request monthly expenditures. Others collect annual expenditures.
Migration Module: This section collected information about individuals who at the time of the interview had lived away from the household for more than six months. These individuals were not counted on the household roster and did not appear in HIES data. Information was collected on the location of the individual, her/his reason for leaving, and whether or not contributions to the family were made. Because data on international remittances were collected in the HIES, the amount of the remittances was not recorded here. It was felt that respondents would be unwilling to answer too many questions about transfers from abroad.
Fertility Module: This section collected information about the pregnancy, birthing, and contraceptive practices of women in the household between the ages of 13 and 49. Only one member of the household was interviewed for this section. The interviewer consulted a previously prepared list of randomly ordered ID numbers and selected the first one corresponding to an eligible member of the household being interviewed. ID numbers were established when the household roster was completed using information from the HIES. The ID code was a number between 01 and 12. Selecting a number randomly avoided selection bias on the part of the interviewer.
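The respondent selection procedure for the fertility module can be sketched in Python as follows. This is an illustration under stated assumptions, not the survey's own procedure: the roster layout (ID code mapped to sex and age), the function names and the inclusive reading of the 13 to 49 age range are all assumptions.

```python
import random


def make_random_order(seed: int) -> list[int]:
    """Prepare a randomly ordered list of the ID codes 1 to 12 in advance."""
    return random.Random(seed).sample(range(1, 13), 12)


def select_fertility_respondent(random_order, roster):
    """Return the first ID in the prepared order that belongs to an eligible woman.

    roster maps ID code -> (sex, age); this layout is hypothetical.
    Returns None if the household has no eligible member.
    """
    for id_code in random_order:
        member = roster.get(id_code)
        if member is None:
            continue  # no household member holds this ID code
        sex, age = member
        if sex == "F" and 13 <= age <= 49:
            return id_code
    return None
```

Because the order is fixed before the interviewer sees the household, the interviewer has no discretion over which eligible woman is chosen.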
Anthropometry Module: During the initial LSMS interview, the interviewer indicated whether any children under the age of five years were present. For households with such children, a return visit was made by a team of nursing students to collect data on the children.
Anthropometric measurements gathered were standing height (taken for children between two and five years of age), recumbent length (taken lying down on children less than two years of age) and weight of pre-school age children (less than five years of age). Immunizations received, breastfeeding, other dietary and vitamin supplement data were also collected.
A data entry program was developed for the LSMS data by Sistemas Integrales in Santiago, Chile. The program consisted of a series of computer screens designed to resemble the questionnaire and was programmed to perform a number of data verification tasks automatically. These included flagging omitted, out-of-range and inconsistent responses. Those questionnaires which were identified as having inconsistent responses were returned to the field for additional data or clarification.
This zip file contains the Standard Area Measurements (SAM) for the 2021 Statistical Areas in England and Wales as at Census Day (21 March 2021). This includes the Output Areas (OA), Lower layer Super Output Areas (LSOA), Middle layer Super Output Areas (MSOA), the Lower-Tier Local Authorities (LTLA) including the Unitary Authorities (E06 and W06), Non-metropolitan Districts (E07), Metropolitan Districts (E08) and London Boroughs (E09), the Upper-Tier Local Authorities (UTLA) including the Unitary Authorities (E06 and W06), Counties (E10), Metropolitan Districts (E08) and London Boroughs (E09), the Regions including the country of Wales, Countries and National. All measurements provided are ‘flat’ as they do not take into account variations in relief, e.g. mountains and valleys. Measurements are given in hectares (10,000 square metres) to 2 decimal places and square kilometres to 4 decimal places. Four types of measurements are included: total extent (AREAEHECT), area to mean high water (coastline) (AREACHECT), area of inland water (AREAIHECT) and area to mean high water excluding area of inland water (land area) (AREALHECT). The Eurostat-recommended approach is to use the ‘land area’ measurement to compile population density figures. This V2 was issued because the user guide name was too long.
PLEASE NOTE: There is an extremely small OA with the code E00187556 that measures 400 centimetres squared. This is because all the population and household points are centred around a very small space, and the OA was manually changed to bring it within threshold.
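The relationship between the four measurements and the two units can be illustrated with a short Python sketch. The function names and the example figures are hypothetical; only the column names, the unit definitions and the rounding precision come from the description above.

```python
def land_area_hectares(areachect: float, areaihect: float) -> float:
    """Land area (AREALHECT) = area to mean high water minus inland water."""
    return round(areachect - areaihect, 2)


def hectares_to_km2(hectares: float) -> float:
    """1 hectare is 10,000 square metres, so 100 hectares make 1 square kilometre."""
    return round(hectares / 100, 4)


def population_density_per_km2(population: int, arealhect: float) -> float:
    """Population density on the land-area measurement, in persons per km2."""
    return population / (arealhect / 100)


# Made-up example values, not taken from the SAM files.
example_land_ha = land_area_hectares(1250.50, 50.25)
example_land_km2 = hectares_to_km2(example_land_ha)
example_density = population_density_per_km2(3000, example_land_ha)
```

As a sanity check on scale, 400 square centimetres is 0.04 square metres, or 0.000004 hectares, so an area that small would round to 0.00 at two decimal places.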
According to a survey conducted among employers in the U.S. in 2023, ** percent reported that they have increased employee deductibles to control rising health care costs. A further *** percent reported that they have increased the employee share of premiums to control costs. This statistic illustrates the measures taken by employers to control rising healthcare costs in the U.S. in 2023.
This is a public release of the dataset described in Kennedy et al. (2020), consisting of 39,565 comments annotated by 7,912 annotators, for 135,556 combined rows. The primary outcome variable is the "hate speech score" but the 10 constituent labels can also be treated as outcomes.
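Since each row pairs one comment with one annotator (135,556 rows over 39,565 comments is roughly 3.4 annotations per comment on average), a per-comment summary requires grouping rows by comment. The Python sketch below shows one way to do this on toy data; the column names `comment_id`, `annotator_id` and `label` and the values are hypothetical and should be checked against the actual dataset schema.

```python
from collections import defaultdict
from statistics import mean

# Toy rows in long format: one row per (comment, annotator) pair.
rows = [
    {"comment_id": 1, "annotator_id": 10, "label": 2},
    {"comment_id": 1, "annotator_id": 11, "label": 3},
    {"comment_id": 2, "annotator_id": 10, "label": 0},
]


def mean_label_per_comment(rows, label="label"):
    """Average one constituent label over all annotators of each comment."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["comment_id"]].append(row[label])
    return {comment_id: mean(values) for comment_id, values in grouped.items()}


per_comment = mean_label_per_comment(rows)
```

A simple mean is only one possible aggregation; the original paper derives its score with a measurement model rather than by averaging.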
The original paper can be found here: Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application
Original dataset link at HuggingFace: https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech
Acknowledgement of the original work:
@article{kennedy2020constructing,
title={Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application},
author={Kennedy, Chris J and Bacon, Geoff and Sahn, Alexander and von Vacano, Claudia},
journal={arXiv preprint arXiv:2009.10277},
year={2020}
}
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, an open (under specific permission) longitudinal dataset that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses recorded for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
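As a minimal sketch of how the four variables above might be read, the following uses only the Python standard library to parse rows and merge Measurement_date and Measurement_time into a single timestamp. The in-memory sample rows, the assumption that the file has a header row with these column names, and the helper name parse_glucose_rows are illustrative; the patient identifiers and values are made up, not real records.

```python
import csv
import io
from datetime import datetime

# Illustrative rows in the documented column layout (values are made up).
SAMPLE_CSV = """Patient_ID,Measurement_date,Measurement_time,Measurement
LIB190001,2020-01-01,00:05:00,112
LIB190001,2020-01-01,00:20:00,118
"""


def parse_glucose_rows(handle):
    """Yield (patient_id, timestamp, mg_dl) tuples from a CSV file object."""
    for row in csv.DictReader(handle):
        # Combine the YYYY-MM-DD date and HH:MM:SS time into one datetime.
        timestamp = datetime.strptime(
            f"{row['Measurement_date']} {row['Measurement_time']}",
            "%Y-%m-%d %H:%M:%S",
        )
        yield row["Patient_ID"], timestamp, float(row["Measurement"])


rows = list(parse_glucose_rows(io.StringIO(SAMPLE_CSV)))
```

For the real file, the same function could be given an open file handle instead of the in-memory sample; reading lazily row by row avoids holding all 22.6 million records in memory at once.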
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. The manufacturer, Abbott Diabetes Care, Inc., Alameda, CA, USA, has conducted validation studies of these devices, concluding that the measurements made by their sensors compare well with those of YSI analyzer devices (Xylem Inc.), the gold standard, with 99.9% of results falling within zones A and B of the consensus error grid. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.
It was also verified that, in most cases, the blood glucose level measurements per patient in the Glucose_measurements.csv file were continuous (i.e., a sample at least every 15 minutes), as they should be.
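A continuity check of this kind can be sketched as follows. This is not the authors' validation code; it is a hypothetical helper (find_gaps) that flags consecutive readings more than 15 minutes apart, using made-up timestamps.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)  # continuity threshold stated in the text


def find_gaps(timestamps, max_gap=MAX_GAP):
    """Return (previous, current) pairs whose spacing exceeds max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Made-up readings for one patient.
readings = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 0, 15),
    datetime(2020, 1, 1, 0, 50),  # 35 minutes after the previous reading
]
gaps = find_gaps(readings)
```

Here gaps holds the single pair around the 35-minute interruption; an empty list would indicate a continuous series under the 15-minute criterion.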
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are comma-delimited CSV files and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics and statistics that may help users understand the dataset better is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.
Code Availability
The dataset was generated using custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variable extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook codes the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.
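To illustrate the kind of feature extraction described, the sketch below derives four of the Patient_info.csv variables from glucose records. It is not the notebook's code; the function name summarize_glucose and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def summarize_glucose(records):
    """Summarize (patient_id, measurement_date) pairs per patient."""
    dates_by_patient = defaultdict(list)
    for patient_id, day in records:
        dates_by_patient[patient_id].append(day)

    summary = {}
    for patient_id, days in dates_by_patient.items():
        summary[patient_id] = {
            "Initial_measurement_date": min(days),
            "Final_measurement_date": max(days),
            # Distinct calendar days that have at least one measurement.
            "Number_of_days_with_measures": len(set(days)),
            "Number_of_measurements": len(days),
        }
    return summary


# Made-up records: one entry per measurement.
records = [
    ("LIB190001", date(2020, 1, 1)),
    ("LIB190001", date(2020, 1, 1)),
    ("LIB190001", date(2020, 1, 3)),
    ("LIB190002", date(2020, 2, 1)),
]
summary = summarize_glucose(records)
```

The same grouping pattern would apply to Biochemical_parameters.csv and Diagnostics.csv, counting records per patient instead of days.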
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
In 1992, Bosnia-Herzegovina, one of the six republics in former Yugoslavia, became an independent nation. A civil war started soon thereafter, lasting until 1995 and causing widespread destruction and loss of life. Following the Dayton accord, Bosnia-Herzegovina (BiH) emerged as an independent state comprised of two entities, namely, the Federation of Bosnia-Herzegovina (FBiH) and the Republika Srpska (RS), and the district of Brcko. In addition to the destruction caused to the physical infrastructure, there was considerable social disruption and decline in living standards for a large section of the population. Alongside these events, a period of economic transition to a market economy was occurring. The distributive impacts of this transition, both positive and negative, are unknown. In short, while it is clear that welfare levels have changed, there is very little information on poverty and social indicators on which to base policies and programs.
In the post-war process of rebuilding the economic and social base of the country, the government has faced the problems created by having little relevant data at the household level. The three statistical organizations in the country (the State Agency for Statistics for BiH, BHAS; the RS Institute of Statistics, RSIS; and the FBiH Institute of Statistics, FIS) have been active in working to improve the data available to policy makers, both at the macro and the household level. One facet of their activities is to design and implement a series of household surveys. The first of these surveys is the Living Standards Measurement Study survey (LSMS). Later surveys will include the Household Budget Survey (an Income and Expenditure Survey) and a Labor Force Survey. A subset of the LSMS households will be re-interviewed in the two years following the LSMS to create a panel data set.
The three statistical organizations began work on the design of the Living Standards Measurement Study Survey (LSMS) in 1999. The purpose of the survey was to collect data needed for assessing the living standards of the population and for providing the key indicators needed for social and economic policy formulation. The survey was to provide data at the country and the entity level and to allow valid comparisons between entities to be made.
The LSMS survey was carried out in the Fall of 2001 by the three statistical organizations with financial and technical support from the Department for International Development of the British Government (DfID), United Nations Development Program (UNDP), the Japanese Government, and the World Bank (WB). The creation of a Master Sample for the survey was supported by the Swedish Government through SIDA, the European Commission, the Department for International Development of the British Government and the World Bank.
The overall management of the project was carried out by the Steering Board, comprised of the Directors of the RS and FBiH Statistical Institutes, the Management Board of the State Agency for Statistics and representatives from DfID, UNDP and the WB. The day-to-day project activities were carried out by the Survey Management Team, made up of two professionals from each of the three statistical organizations.
The Living Standards Measurement Survey (LSMS), in addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, has three basic objectives, as follows:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population’s living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population’s living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions for development of government’s Poverty Reduction Strategy Paper, based on analyzed data.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Sample survey data [ssd]
A total sample of 5,400 households was determined to be adequate for the needs of the survey, with 2,400 in the Republika Srpska and 3,000 in the Federation of BiH. The difficulty was in selecting a probability sample that would be representative of the country's population. The sample design for any survey depends upon the availability of information on the universe of households and individuals in the country. Usually this comes from a census or administrative records. In the case of BiH the most recent census was done in 1991. The data from this census were rendered obsolete by the simple passage of time and, more importantly, by the massive population displacements that occurred during the war.
At the initial stages of this project it was decided that a master sample should be constructed. Experts from Statistics Sweden developed the plan for the master sample and provided the procedures for its construction. From this master sample, the households for the LSMS were selected.
Master Sample [This section is based on Peter Lynn's note "LSMS Sample Design and Weighting - Summary". April, 2002. Essex University, commissioned by DfID.]
The master sample is based on a selection of municipalities and a full enumeration of the selected municipalities. Optimally, one would prefer smaller units (geographic or administrative) than municipalities. However, while it was considered that the population estimates of municipalities were reasonably accurate, this was not the case for smaller geographic or administrative areas. To avoid the error involved in sampling smaller areas with very uncertain population estimates, municipalities were used as the base unit for the master sample.
The Statistics Sweden team proposed two options based on this same method, with the only difference being in the number of municipalities included and enumerated. For reasons of funding, the smaller of the two options, Option B, was used.
Stratification of Municipalities
The first step in creating the Master Sample was to group the 146 municipalities in the country into three strata- Urban, Rural and Mixed - within each of the two entities. Urban municipalities are those where 65 percent or more of the households are considered to be urban, and rural municipalities are those where the proportion of urban households is below 35 percent. The remaining municipalities were classified as Mixed (Urban and Rural) Municipalities. Brcko was excluded from the sampling frame.
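The stratification rule can be written as a small function. This is only a sketch of the thresholds as stated above (65 percent or more urban, below 35 percent urban, otherwise mixed); the function name classify_municipality and the representation of the urban share as a fraction between 0 and 1 are assumptions.

```python
URBAN_THRESHOLD = 0.65  # 65 percent or more urban households
RURAL_THRESHOLD = 0.35  # below 35 percent urban households


def classify_municipality(urban_share):
    """Classify a municipality by its share of urban households (0.0 to 1.0)."""
    if not 0.0 <= urban_share <= 1.0:
        raise ValueError("urban_share must be between 0 and 1")
    if urban_share >= URBAN_THRESHOLD:
        return "Urban"
    if urban_share < RURAL_THRESHOLD:
        return "Rural"
    return "Mixed"
```

Under this reading, a municipality at exactly 65 percent is Urban and one at exactly 35 percent is Mixed.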
Urban, Rural and Mixed Municipalities: It is worth noting that the urban-rural definitions used in BiH are unusual with such large administrative units as municipalities classified as if they were completely homogeneous. Their classification into urban, rural, mixed comes from the 1991 Census which used the predominant type of income of households in the municipality to define the municipality. This definition is imperfect in two ways. First, the distribution of income sources may have changed dramatically from the pre-war times: populations have shifted, large industries have closed and much agricultural land remains unusable due to the presence of land mines. Second, the definition is not comparable to other countries' where villages, towns and cities are classified by population size into rural or urban or by types of services and infrastructure available. Clearly, the types of communities within a municipality vary substantially in terms of both population and infrastructure.
However, these imperfections are not detrimental to the sample design (the urban/rural definition may not be very useful for analysis purposes, but that is a separate issue). [Note: The percentage of LSMS households in each stratum reporting using agricultural land or having livestock is highest in the "rural" municipalities and lowest in the "urban" municipalities. However, the concentration of agricultural households is higher in RS, so the municipality types are not comparable across entities. The percentage reporting no land or livestock in RS was 74.7% in "urban" municipalities, 43.4% in "mixed" municipalities and 31.2% in "rural" municipalities. Respective figures for FBiH were 88.7%, 60.4% and 40.0%.]
The classification is used simply for stratification. The stratification is likely to have some small impact on the variance of survey estimates, but it does not introduce any bias.
Selection of Municipalities
Option B of the Master Sample involved sampling municipalities independently from each of the six strata described in the previous section. Municipalities were selected with probability proportional to estimated population size (PPES) within each stratum, so as to select approximately 50% of the mostly urban municipalities, 20% of the mixed and 10% of the mostly rural ones. Overall, 25 municipalities were selected (out of 146), with 14 in the FBiH and 11 in the RS. The distribution of selected municipalities over the sampling strata is shown below.
Stratum / Total municipalities Mi / Sampled municipalities mi
1. Federation, mostly urban / 10 / 5
2. Federation, mostly mixed / 26 / 4
3. Federation, mostly rural / 48 / 5
4. RS, mostly urban / 4 / 2
5. RS, mostly mixed / 29 / 5
6. RS, mostly rural / 29 / 4
Note: Mi is the total number of municipalities in stratum i (i=1, … , 6); mi is the number of municipalities selected from stratum
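As a quick arithmetic check on the table, the realized sampling fraction mi/Mi in each stratum can be computed directly. The sketch below uses only the counts given above; the variable and function names are illustrative.

```python
# (stratum, total municipalities Mi, sampled municipalities mi) from the table.
STRATA = [
    ("Federation, mostly urban", 10, 5),
    ("Federation, mostly mixed", 26, 4),
    ("Federation, mostly rural", 48, 5),
    ("RS, mostly urban", 4, 2),
    ("RS, mostly mixed", 29, 5),
    ("RS, mostly rural", 29, 4),
]


def sampling_fractions(strata):
    """Return the realized fraction mi / Mi for each stratum."""
    return {name: sampled / total for name, total, sampled in strata}


fractions = sampling_fractions(STRATA)
total_municipalities = sum(total for _, total, _ in STRATA)
total_sampled = sum(sampled for _, _, sampled in STRATA)
federation_sampled = sum(
    sampled for name, _, sampled in STRATA if name.startswith("Federation")
)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The counts add up to the 146 municipalities and 25 selections reported in the text (14 in the Federation, 11 in the RS). The realized fractions are 50% in both urban strata, roughly 15% and 17% in the mixed strata, and roughly 10% and 14% in the rural strata, so the stated targets of 50%, 20% and 10% are met only approximately.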
Overview
The data include wind, temperature, and turbulence measurements.
Data Details
Each met station (met.z18, met.z19, met.z21, and met.z23) consists of multiple levels of three-dimensional ultrasonic anemometers, RM Young 81000 (sampling frequency = 20 Hz), and temperature/relative humidity probes, Rotronics HC2S3 (sampling frequency = 1 Hz). The HC2S3 probes were housed in radiation shields to protect them from thermal radiation, and they were adequately ventilated. Moreover, an infrared gas analyzer (LI-7500 Open Path CO2/H2O Analyzer) was collocated at 3-meter height at met.z18 (sampling frequency = 20 Hz) in a second phase of the experiment (June 2016). Raw data are collected by Campbell CR3000 dataloggers and subsequently parsed into 15-minute data files. For each met station, multiple data files are output, with each data file corresponding to a certain type of instrument and a specific measurement height. The type of instrument and the measurement height are specified by the name of the data file itself. For example, met station met.z19 consists of sonic anemometer measurements at 3-, 10-, and 17-meter height and temperature/relative humidity measurements at 3- and 17-meter height. Therefore, the following data files are output:
3m sonic anemometer sample file name: met.z19.00.20160502.171500.son03m.dat
10m sonic anemometer sample file name: met.z19.00.20160502.171500.son10m.dat
17m sonic anemometer sample file name: met.z19.00.20160502.171500.son17m.dat
3m T-RH sensor sample file name: met.z19.00.20160502.171500.trh03m.dat
17m T-RH sensor sample file name: met.z19.00.20160502.171500.trh17m.dat
Note that in the "Primary Measurements/Variables" section, the variables sonic_u-wind, sonic_v-wind, sonic_w-wind represent the orthogonal u, v, and w wind velocities output by the sonic RM Young 81000, oriented with u-axis aligned east-west and v-axis aligned north-south. In this orientation, +u values = wind from the east, and +v values = wind from the north. Wind from below (updraft) = +w. Instruments' manuals and dataset samples are provided as attachments.
Data Quality
Raw data: no quality control (QC) is applied. Data are visually inspected at least weekly.
Uncertainty
RM Young 81000 Ultrasonic Anemometer Measurements
Anything that blocks the acoustic signal path will degrade the measurement. If the path is blocked sufficiently, measurements cannot be made. The RM Young 81000 can make accurate measurements in driving rain, but light mist or heavy fog can allow droplets to accumulate on the transducer faces and block the measurement. Measurements may be made in driving snow, although frost and snow that adheres to the transducer face may block the measurement. Similarly, freezing rain on the transducer face may block the measurement.
Rotronics HC2S3 Temperature and Relative Humidity Measurement
This sensor requires minimal maintenance, but dust, debris, and salts on the filter cap may degrade sensor performance. Because of the remote location and difficulties in climbing the met towers, no maintenance of these sensors was performed during the field experiment.
Licor LI-7500 Measurement
The LI-7500 optical windows should be cleaned when necessary (as indicated by the diagnostic values). Rain, snow, fog, condensation, or dust deposition on the optical path of the instrument may affect the gas analyzer's performance and lead to less consistent or missing measurements. Because of the remote location and difficulties in climbing the met towers, no maintenance of these sensors was performed during the field experiment.
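The file naming pattern and the sign convention for u and v lend themselves to a short sketch. The regular expression below is inferred only from the five sample file names in the description (it covers the son and trh files, not the gas analyzer), the second field ("00") is captured without interpretation because its meaning is not stated, and the function names are hypothetical. The direction formula assumes the stated convention: +u means wind from the east and +v means wind from the north.

```python
import math
import re
from datetime import datetime

# Pattern inferred from the sample names, e.g. met.z19.00.20160502.171500.son03m.dat
FILE_PATTERN = re.compile(
    r"^(?P<station>met\.z\d+)\.(?P<code>\d+)\."
    r"(?P<date>\d{8})\.(?P<time>\d{6})\."
    r"(?P<instrument>son|trh)(?P<height>\d+)m\.dat$"
)


def parse_filename(name):
    """Split a data file name into station, timestamp, instrument and height."""
    match = FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    return {
        "station": match["station"],
        "code": match["code"],  # meaning not stated in the description
        "timestamp": datetime.strptime(
            match["date"] + match["time"], "%Y%m%d%H%M%S"
        ),
        "instrument": match["instrument"],  # "son" = sonic, "trh" = T-RH sensor
        "height_m": int(match["height"]),
    }


def wind_direction_from(u, v):
    """Direction the wind blows from, in degrees clockwise from north.

    Assumes +u = wind from the east and +v = wind from the north.
    """
    return math.degrees(math.atan2(u, v)) % 360.0


info = parse_filename("met.z19.00.20160502.171500.son10m.dat")
```

With this convention, a pure +v reading maps to 0 degrees (from the north) and a pure +u reading to 90 degrees (from the east).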
This dataset contains shear strength measurements derived from unconfined compression tests for two types of soils, Soil1 and Soil2. The measurements are recorded in tons per square foot. The dataset includes the following data fields:
The following dataset is based on data provided by the World Bank (World Bank Edstats) and includes the 2015 PISA Test results. It provides educational performance metrics across various countries, detailing students' competencies in reading, mathematics, and science.
Data fields include various performance metrics and demographic information relevant to the PISA test results, offering insights into educational systems and student performance on an international scale.
The mission of the U.S. Geological Survey (USGS) involves providing reliable, impartial, and timely information that is needed to understand the Nation's water resources. New techniques that aid in achieving this mission are important, especially those that allow USGS to do so more accurately or cost-effectively. To this end, a new method for selecting the optimum exposure time for velocity and discharge measurements has been explored. These data were assembled to assist in the development and evaluation of this new method. Four kinds of time-series data are available and used for this purpose: (1) model-derived synthetic velocities, (2) point-velocity measurements in laboratory flumes, (3) point-velocity measurements in streams, and (4) water velocity profile measurements in streams.
The model-derived velocity data were obtained using methods described in García and others (2005). Point-velocity flume measurement data were obtained using a Nortek 16 MHz acoustic Doppler velocimeter (ADV) for the purpose of characterizing turbulence in the flow in a flume. Point-velocity measurement data collected in the field were obtained using a SonTek Flowtracker ADV (1 MHz) and an OTT acoustic Doppler current meter or ADC (6 MHz) as a part of routine mid-section discharge measurements. Water velocity profile measurements in streams were collected using SonTek and Teledyne RD Instruments acoustic Doppler current profilers (ADCPs) during routine mid-section discharge measurements. The laboratory ADV data were collected, processed, and exported using the associated ADV software.
Data are provided in the zip file 'DynamicExpTime.zip', which contains the four types of time-series data. The model-derived velocities are provided in a spreadsheet format. The ADV, ADC, Flowtracker, and ADCP data were exported from their native file formats and are provided in comma-separated value and ASCII text files.
References
García, C. M., Cantero, M. I., Niño, Y., and García, M. H. (2005). Turbulence measurements with acoustic Doppler velocimeters: Journal of Hydraulic Engineering, v. 131, no. 12. [Also available at https://doi.org/10.1061/(asce)0733-9429(2005)131:12(1062).]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is presented in the paper:
Building and analysing a labelled Measure While Drilling dataset from 15 hard rock tunnels in Norway, by T.F. Hansen, Z. Liu, J. Torressen
The paper has a preprint on SSRN: http://dx.doi.org/10.2139/ssrn.4729646 and is under review in a peer-reviewed journal.
The dataset is utilised in a machine learning analysis in the paper:
Predicting rock type from MWD tunnel data using a reproducible ML-modelling process, by T.F. Hansen, Z. Liu, J. Torressen
The paper is published in the journal Tunnelling and Underground Space Technology:
https://doi.org/10.1016/j.tust.2024.105843
Description of the dataset:
Measure While Drilling (MWD) is a technique in rock drilling, mainly used in drill and blast tunnelling, where data about the rock mass are registered by sensors while drilling. The extensive and geologically diversified dataset contains corresponding MWD data and rock mass mappings for 5205 blasting rounds from 15 hard rock tunnels in Norway. MWD data are presented as tabular data. Ten different rock types are the corresponding labels.
Four files are given:
A csv-file of the training dataset - with outliers removed
A csv-file of the testing dataset (split train/test 0.75/0.25) - with outliers removed
A csv-file with the full unsplit dataset, cleaned and with outliers removed
A csv-file with the raw dataset, before cleaning, processing and outlier removal
The authors gratefully acknowledge the tunnel software/hardware company Bever Control, which facilitated access to data from the clients Bane NOR, Statens Vegvesen, Nye Veier, and the contractor AF-Gruppen.
NOTE: The dataset is available for research only; commercial use is not permitted.
PERIOD: FY 1933-1937. SOURCE: Résumé statistique des poids et mesures [Statistical Abstract of Weights and Measures]; [Statistics by government offices, overseas territories of Japan].
In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). The LSMS, in addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, has three basic objectives, as follows:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.
The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey so there are three years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey:
- Wave 1: LSMS conducted in 2001, which forms the baseline survey for the panel
- Wave 2: Second interview of 50% of LSMS respondents in autumn/winter 2002
- Wave 3: Third interview with sub-sample respondents in autumn/winter 2003
The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and make it possible to observe the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three-year period. Most importantly, the covariates associated with moves into and out of poverty and the relative risks of poverty for different people can be assessed. As such, the panel aims to provide data that will inform the policy debates within FBiH and RS at a time of social reform and rapid change.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Households
Sample survey data [ssd]
The Wave 3 sample consisted of 2878 households that had been interviewed at Wave 2, plus a further 73 households that were interviewed at Wave 1 but were non-contacts at Wave 2. A total of 2951 households (1301 in the RS and 1650 in FBiH) were therefore issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.
Panel design
Eligibility for inclusion
The household and household membership definitions are the same standard definitions as at Wave 2. Sample membership status and eligibility for interview are as follows:
i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview.
ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).
iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).
iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.
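The membership and eligibility rules above can be expressed as a small sketch. The Member fields and function names are hypothetical, introduced only to make the rules concrete; they do not correspond to variables in the survey files.

```python
from dataclasses import dataclass

MIN_INTERVIEW_AGE = 15  # household members aged 15 or over are interviewed


@dataclass
class Member:
    age: int
    in_wave2_household: bool    # member of a household interviewed at Wave 2
    joined_osm_household: bool  # joined a household with at least one OSM
    out_of_scope: bool = False


def sample_status(member):
    """Return 'OSM', 'NSM', or None for people who are not sample members."""
    if member.in_wave2_household:
        return "OSM"
    if member.joined_osm_household:
        return "NSM"
    return None


def eligible_for_inclusion(member):
    """OSMs and NSMs are included unless they have moved out-of-scope."""
    return sample_status(member) is not None and not member.out_of_scope


def eligible_for_interview(member):
    """Included sample members aged 15 or over are eligible for interview."""
    return eligible_for_inclusion(member) and member.age >= MIN_INTERVIEW_AGE


child = Member(age=10, in_wave2_household=True, joined_osm_household=False)
joiner = Member(age=30, in_wave2_household=False, joined_osm_household=True)
```

In this sketch the child is an OSM who is included in the sample but not interviewed, while the adult joiner is an NSM who is eligible for interview.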
Following rules
The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.
Definition of 'out-of-scope'
It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:
i. Movers out of the country altogether, i.e. outside FBiH and RS. This category of mover is clear. Sample members moving to another country outside FBiH and RS will be out-of-scope for that year of the survey and not eligible for interview.
ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer is assigned in that entity.
iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, Wave 3 individuals who have subsequently moved into some institutions are followed. The definitions of which institutions are included are found in the Supervisor Instructions.
iv. Movers into the district of Brcko. These movers are followed for interview. When coding entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
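The mover rules above can be sketched as a simple lookup. The function names and the destination labels are hypothetical, and two points are assumptions rather than statements from the documentation: that a mover between entities is coded to the destination entity, and that institutional moves are handled separately (the list of followed institutions is in the Supervisor Instructions and is not modelled here).

```python
from typing import Optional

# Destinations within which movers are followed, according to the rules above.
FOLLOWED_DESTINATIONS = {"FBiH", "RS", "Brcko"}


def is_out_of_scope(destination: str) -> bool:
    """A mover is out-of-scope when the new address is outside FBiH, RS and Brcko."""
    return destination not in FOLLOWED_DESTINATIONS


def coded_entity(origin_entity: str, destination: str) -> Optional[str]:
    """Return the entity code to record for a mover, or None if out-of-scope.

    Brcko is coded as the entity the household originally came from.
    Assumption: movers between entities are coded to the destination entity.
    """
    if is_out_of_scope(destination):
        return None
    if destination == "Brcko":
        return origin_entity
    return destination
```

For example, a household that moved from FBiH into Brcko would still be coded as FBiH, while a household that moved abroad would receive no entity code for that wave.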
Face-to-face [f2f]
Data entry
As at Wave 2, CSPro was the chosen data entry software. The CSPro program has two main features that reduce the number of keying errors and the amount of editing required following data entry: - Data entry screens that include all skip patterns. - Range checks for each question (allowing three exception codes: inappropriate, don't know and missing). The Wave 3 data entry (DE) program had more checks than at Wave 2, and DE staff were instructed to get all anomalies cleared by SIG fieldwork. The program was extensively tested prior to DE. Ten computer staff were employed in each Field Office; as all had worked on Wave 2, training was not undertaken.
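The range checks described above can be illustrated with a minimal sketch. It is written in Python rather than CSPro logic, and the numeric exception codes and the function name are hypothetical; the documentation states only that three exception codes (inappropriate, don't know, missing) were allowed for each question.

```python
# Hypothetical numeric codes for the three permitted exceptions.
EXCEPTION_CODES = {
    -1: "inappropriate",
    -2: "don't know",
    -3: "missing",
}


def passes_range_check(value: int, low: int, high: int) -> bool:
    """Accept a keyed value if it is an exception code or lies within [low, high]."""
    return value in EXCEPTION_CODES or low <= value <= high
```

With an assumed valid range of 0 to 110 for an age question, a keyed value of 250 would be rejected, while the "don't know" code would be accepted.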
Editing
Editing Instructions were compiled (Annex G) and sent to Supervisors. For Wave 3, Supervisors were asked to take more time to edit every questionnaire returned by their interviewers. The FBTSA examined the work of twelve of the twenty-two Supervisors. All Supervisors made occasional errors with the Control Form, so a further 100% check of Control Forms and Module 1 was undertaken by the FBTSA and SIG members.
The panel survey has enjoyed high response rates throughout the three years of data collection, with the Wave 3 response rates being slightly higher than those achieved at Wave 2. At Wave 3, 1650 households in the FBiH and 1300 households in the RS were issued for interview. Because new households may be created by split-off movers, the number of households can increase during fieldwork. A similar number of new households were formed in each entity: 62 in the FBiH and 63 in the RS. This means that 3073 households were identified during fieldwork. Of these, 3003 were eligible for interview; the remaining 70 households had moved out of BiH, been institutionalised or died (34 in the RS and 36 in the FBiH).
Interviews were achieved in 96% of eligible households, an extremely high response rate by international standards for a survey of this type.
In total, 8712 individuals (including children) were enumerated within the sample households (4796 in the FBiH and 3916 in the RS). Within the 3003 eligible households, 7781 individuals aged 15 or over were eligible for interview, of whom 7346 (94.4%) were successfully interviewed. Within cooperating households (where there was at least one interview) the interview rate was higher (98.8%).
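The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only counts quoted in the text; the variable names and the `pct` helper are illustrative.

```python
# Household counts quoted in the text.
households_identified = 3073
ineligible_by_entity = {"RS": 34, "FBiH": 36}
households_eligible = households_identified - sum(ineligible_by_entity.values())

# Individual counts quoted in the text.
enumerated_by_entity = {"FBiH": 4796, "RS": 3916}
individuals_enumerated = sum(enumerated_by_entity.values())
individuals_eligible = 7781
individuals_interviewed = 7346


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


individual_response_rate = pct(individuals_interviewed, individuals_eligible)

print(households_eligible)       # eligible households
print(individuals_enumerated)    # enumerated individuals, both entities
print(individual_response_rate)  # individual response rate, in percent
```

The script reproduces the 3003 eligible households, the 8712 enumerated individuals and the 94.4% individual response rate reported above.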
A very important measure in longitudinal surveys is the annual individual re-interview rate. This is because a high attrition rate, where large numbers of respondents drop out of the survey over time, can call into question the quality of the data collected. In BiH the individual re-interview rates have been high for the survey. The individual re-interview rate is the proportion of people who gave an interview at time t-1 who also give an interview at t. Of those who gave a full interview at Wave 2, 6653 also gave a full interview at Wave 3. This represents a re-interview rate of 97.9%, which is extremely high by international standards. Among respondents who were interviewed in all three years of the survey, there are 6409 cases available for longitudinal analysis, 2881 in the RS and 3528 in the FBiH. This represents 82.8% of the responding Wave 1 sample, a