Since the beginning of the 1960s, Statistics Sweden, in collaboration with various research institutions, has carried out follow-up surveys in the school system. These surveys have taken place within the framework of the IS project (Individual Statistics Project) at the University of Gothenburg and the UGU project (Evaluation through follow-up of students) at the University of Teacher Education in Stockholm, which since 1990 have been merged into a research project called 'Evaluation through Follow-up'. The follow-up surveys are part of the central evaluation of the school and are based on large nationally representative samples from different cohorts of students.
Evaluation through follow-up (UGU) is one of the country's largest research databases in the field of education. The longitudinal database contains information on nationally representative samples of school pupils from ten cohorts, born between 1948 and 2004. Sampling was based on the students' date of birth for the first two cohorts and on school class for the other cohorts.
For each cohort, data of mainly two types are collected. School administrative data is collected annually by Statistics Sweden during the time that pupils are in the general school system (primary and secondary school), for most cohorts starting in compulsory school year 3. This information is provided by the school offices and, among other things, includes characteristics of school, class, special support, study choices and grades. Information obtained has varied somewhat, e.g. due to changes in curricula. A more detailed description of this data collection can be found in reports published by Statistics Sweden and linked to datasets for each cohort.
Survey data from the pupils are collected for the first time in compulsory school year 6 (for most cohorts). The year 6 questionnaire includes questions on self-perception and interest in learning, attitudes to school, hobbies, school motivation and future plans. For some cohorts, questionnaire data are also collected in year 3 and year 9 of compulsory school and in upper secondary school.
Furthermore, results from various intelligence tests and standardized knowledge tests are included in the year 6 data collection. The intelligence tests have been identical for all cohorts (except the cohort born in 1987, from which questionnaire data were first collected in year 9). The intelligence test consists of a verbal, a spatial and an inductive test, each containing 40 tasks and specially designed for the UGU project. The verbal test is a vocabulary test of the opposites type. The spatial test is a so-called 'sheet metal folding test' and the inductive test is made up of number series. The reliability of the test, intercorrelations and connection with school grades are reported by Svensson (1971).
For the first three cohorts (1948, 1953 and 1967), the standardized knowledge tests in year 6 consist of the standard tests in Swedish, mathematics and English that were offered to all pupils in compulsory school year 6 up to and including the beginning of the 1980s. For the 1972 cohort, specially prepared tests in reading and mathematics were used. The reading test consists of 27 tasks and aimed to identify students with reading difficulties. The mathematics test, which was also offered to the fifth cohort (1977), includes 19 tasks. A revised version of the test was used for the cohort born in 1982, because the previous version was judged to be somewhat too easy. Results on the mathematics test are not available for the 1987 cohort. The mathematics test was not offered to the students in the 1992 cohort, as the test did not seem to fully correspond with current curriculum intentions in mathematics. For further information, see the description of the dataset for each cohort.
For several of the samples, questionnaires were also collected from the students' parents and teachers in year 6. The teacher questionnaire contains questions about the teacher, class size and composition, the teacher's assessments of the class's knowledge level etc., school resources, working methods, parental involvement and the existence of evaluations. The questionnaire for the guardians includes questions about the child's upbringing conditions, ambitions and wishes regarding the child's education, views on the school's objectives and the parents' own educational and professional situation.
The students are followed up even after they have left primary school. Among other things, data are collected during the time they are in high school. These are school administrative data, such as choice of upper secondary school line/program and grades after completing studies. For some of the cohorts, questionnaire data were also collected from the students in addition to the school administrative data.
The sample consisted of students born on the 5th, 15th and 25th of any month in 1953, a total of 10,723 students.
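The date-of-birth rule above can be written as a simple filter. The following is a minimal sketch, not the project's actual selection procedure; the function names and the `cohort_year` parameter are illustrative assumptions.

```python
from datetime import date

# Days of the month that define the sample, as stated in the description.
SAMPLE_DAYS = {5, 15, 25}


def in_birthdate_sample(birth_date: date, cohort_year: int = 1953) -> bool:
    """Return True if a person born on birth_date belongs to the sample."""
    return birth_date.year == cohort_year and birth_date.day in SAMPLE_DAYS


def expected_sampling_fraction(cohort_year: int = 1953) -> float:
    """Share of calendar days in the cohort year that are sampling days."""
    days_in_year = (date(cohort_year + 1, 1, 1) - date(cohort_year, 1, 1)).days
    return 12 * len(SAMPLE_DAYS) / days_in_year
```

Under this rule, 36 of the 365 possible birth dates in 1953 are sampling days, so just under 10% of the birth year falls in the sample if births are spread roughly evenly over the year.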
The data obtained in 1966 were: 1. School administrative data (school form, class type, year and grades). 2. Information about the parents' profession and education, number of siblings, the distance between home and school, etc.
This information was collected for 93% of all those born on the sampling days. The shortfall is due to reduced resources at Statistics Sweden for follow-up work (reminders etc.). Annual data for the 1953 cohort were collected by Statistics Sweden up to and including academic year 1972/73.
The response rate for test and questionnaire data is 88%. Standard test results were received for just over 85% of those who took the tests.
The sample included a total of 9955 students, for whom some form of information was obtained.
Part of the "Individual Statistics Project" together with cohort 1953.
The rapid and massive dissemination of mobile phones in the developing world is creating new opportunities for the discipline of survey research. The World Bank is interested in leveraging mobile phone technology as a means of direct communication with poor households in the developing world in order to gather rapid feedback on the impact of economic crises and other events on the economy of such households.
The World Bank commissioned Gallup to conduct the Listening to LAC (L2L) pilot program, a research project aimed at testing the feasibility of mobile phone technology as a way of data collection for conducting quick turnaround, self-administered, longitudinal surveys among households in Peru and Honduras.
The project used face-to-face interviews as its benchmark, and included Short Message Service (SMS), Interactive Voice Response (IVR) and Computer Assisted Telephone Interviews (CATI) as test methods of data collection.
The pilot was designed in a way that allowed testing the response rates and the quality of data, while also providing information on the cost of collecting data using mobile phones. Researchers also evaluated if providing incentives affected panel attrition rates. The Honduras design was a test-retest design, which is closely related to the difference-in-difference methodology of experimental evaluation.
A random stratified multistage sampling technique was used to select a nationally representative sample of 1,500 households. During the initial face-to-face interviews, researchers gathered information on the socio-economic characteristics of households and recruited participants for follow-up research. Question wording was the same in all modes of data collection.
In Honduras, after the initial face-to-face interviews, respondents were exposed to the remaining three methodologies according to a randomized scheme (three rotations, one methodology per week). Panelists in Honduras were surveyed for four and a half months, starting in February 2012.
Includes the entire national territory, with the exception of neighborhoods where access of interviewers is extremely difficult, due to lack of transportation infrastructure or for situations that threaten the physical integrity of the interviewers and supervisors (i.e. extremely high crime rate, warfare, etc.)
All the households that exist in the neighborhoods of Honduras, as reported by the 2001 Census. Institutions such as military, religious or educational living quarters are not included in the universe.
Sample survey data [ssd]
Honduras did not have an income oversample because the poverty rate is 60 percent, so oversampling 20 percent above the poverty rate would include a large portion of the middle class, which are not the most vulnerable in times of crisis.
The Honduras panel was built on a nationally representative sample of 1,500 households. The sample was drawn by means of a random, stratified, multistage design. The pilot used Gallup World Poll sampling frame.
Census-defined municipalities were classified into five strata according to population size:
I. Municipalities with 500,000 to 999,000 inhabitants
II. Municipalities with 100,000 to 499,000 inhabitants
III. Municipalities with 50,000 to 99,000 inhabitants
IV. Municipalities with 10,000 to 49,000 inhabitants
V. Municipalities with fewer than 10,000 inhabitants
Interviews were then proportionally allocated to these five strata according to their share among the country's population.
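Proportional allocation of this kind can be sketched as follows. The population figures are invented placeholders, not Honduran census values, and the largest-remainder rounding is an assumption used here only so that the integer allocations sum to the target of 1,500 interviews; the source does not say how rounding was handled.

```python
# Hypothetical stratum populations (placeholders, NOT census figures).
HYPOTHETICAL_STRATA_POPULATION = {
    "I": 900_000,
    "II": 1_200_000,
    "III": 800_000,
    "IV": 3_500_000,
    "V": 1_600_000,
}


def allocate_proportionally(total: int, populations: dict[str, int]) -> dict[str, int]:
    """Split `total` interviews across strata in proportion to population.

    Uses largest-remainder rounding so the allocations add up to `total`.
    """
    population_total = sum(populations.values())
    quotas = {name: total * pop / population_total for name, pop in populations.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = total - sum(allocation.values())
    # Give the remaining interviews to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


allocation = allocate_proportionally(1500, HYPOTHETICAL_STRATA_POPULATION)
```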
The first stage of the design consisted of a random selection of Primary Sampling Units (PSUs) within each of the five strata previously defined.
In the second stage, in each PSU, one or more Secondary Sampling Units (SSUs) were then selected.
Once SSUs were selected, interviewers were sent to the field to proceed with the third stage of the sample design, which consisted of selecting households using a systematic "random route" procedure. Interviewers started from the previously selected "random origin" and walked around the block in a clockwise direction, selecting every third household on their right-hand side. They were also trained to handle vacant, nonresponsive and non-cooperative households, as well as other failed attempts, in a systematic manner.
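The "every third household" rule can be illustrated with a short sketch. It assumes the households of a block are already listed in the clockwise order in which an interviewer would pass them; the function name and the list representation are illustrative assumptions, and the handling of vacant or non-cooperative households is not modelled.

```python
def random_route(households: list[str], origin_index: int, step: int = 3) -> list[str]:
    """Select every `step`-th household, counting from the random origin.

    `households` is assumed to be ordered clockwise around the block;
    `origin_index` is the position of the randomly chosen starting point.
    """
    # Rotate the list so that the walk begins at the random origin.
    walk_order = households[origin_index:] + households[:origin_index]
    # The 3rd, 6th, 9th, ... households on the walk are selected.
    return walk_order[step - 1::step]


block = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
selected = random_route(block, origin_index=0)
```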
Other [oth]
The following survey instruments were used in the project:
1) Initial face-to-face questionnaire
In Peru, the starting point was the ENAHO (National Household Survey) questionnaire. Step-wise regressions were done to select the set of questions that best predicted consumption. For the purposes of robustness, the regressions were also done with questions that best predicted income, which yielded the same results. A similar procedure was done in Honduras, using the latest household survey deployed by the Honduran Statistics Institute, except that only best predictors of income were chosen, because Honduras did not have a recent consumption aggregate.
The survey gathered information on households' demographics, household infrastructure, employment, remittances, income, accidents, food security, self-perceptions on poverty, Internet access and cellphones use.
2) Monthly questionnaires (SMS, IVR, CATI)
The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters. A maximum of 10 questions had to be chosen for the monthly questionnaire. In addition, two questions sought to ensure the validity of the responses by testing if the respondent was a member of the household. Most questions were time-variant and each questionnaire was repeated to observe if answers changed over time. All questions related to variables that strongly affect household welfare and are likely to change in times of crisis.
3) Final face-to-face questionnaire
Gallup conducted face-to-face closing surveys among 700 panelists. The researchers asked about issues the respondents had with mobile phones and coverage during the test. Panelists were also asked what would motivate them to keep on participating in a project like this in the future.
The questionnaires were worded exactly the same way, regardless of the mode, which meant short questions, since SMS is limited to 160 characters, unlike IVR and CATI.
In Honduras, 41% of recruited households failed to answer the first round of follow-up surveys. The attrition rate from the initial face-to-face interview to the end of panel study was 50%.
As part of the survey administration process, Gallup implemented a number of mechanisms to maximize the response rate and panelist retention. The following strategies were applied to respondents who did not reply the first time:
Also, in order to minimize non-response, three types of incentives were given. First, households that did not own a mobile phone were provided one for free. Approximately 127 phones were donated in Honduras. Second, all communications between the interviewers and the households were free to the respondents. Finally, households were randomly assigned to one of three incentive levels: one-third of households received US$1 in free airtime for each questionnaire they answered, one-third received US$5 in free airtime, and one-third received no financial incentive (the control group).
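The random assignment to three equally sized incentive groups could look like the sketch below. It is only an illustration of the idea: the arm labels, the seed and the shuffle-then-deal method are assumptions, and the source does not describe how Gallup actually randomized.

```python
import random

# Labels are illustrative; the amounts come from the description above.
INCENTIVE_ARMS = ("no_incentive", "usd1_airtime", "usd5_airtime")


def assign_incentives(household_ids, seed=0):
    """Randomly assign households to the three incentive arms in equal thirds."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(household_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled households out to the arms in turn.
    return {hh: INCENTIVE_ARMS[i % len(INCENTIVE_ARMS)] for i, hh in enumerate(shuffled)}


assignment = assign_incentives(range(1500))
```

Shuffling first and then dealing households out in turn keeps the three groups the same size, which a separate random draw per household would not guarantee.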
Background: A systematic review of community health worker led interventions carrying out proactive case detection and management of childhood illnesses in low and middle income countries found that home based proactive case detection improved treatment coverage for malaria and reduced newborn and infant mortality. However, there is no high quality evidence on the effect of home based proactive case detection on all cause under-five mortality. Identifying the effect of home based proactive case detection on all cause under-five mortality will directly inform the design and delivery of national community health strategies in many low and middle income countries.
Objective: To test the effectiveness of proactive home visits by trained community health workers on all cause mortality among children under five years of age (primary endpoint) and a set of secondary endpoints related to healthcare utilization among children under five years of age, maternal health, and reproductive health.
Methods: We conducted a two arm, parallel, unblinded cluster randomized trial in 137 village-clusters in seven primary health center catchment areas in rural Mali. Village-clusters were randomized 1:1 to receive comprehensive primary care services, led by proactive CHWs providing regular home visits (intervention) or by CHWs providing care at a fixed post (control) over a three year period. In both arms, user fees were removed and PHCs received staffing and infrastructure improvements prior to the trial start. To assess trial outcomes, we conducted household panel surveys at baseline (December 2016–January 2017), and after 12 months (February–March 2018), 24 months (March–May 2019) and 36 months (February–April 2020).
All households in the study area were eligible to participate in this survey. Female interviewers, who were not residents of the study area, administered the surveys to consenting (18 years or older) or assenting (15–17 years) women of reproductive age (at enrolment) at their homes. The survey instrument was adapted from the Mali Demographic and Health Survey questionnaire, encoded in Open Data Kit and loaded onto mobile tablets for use by interviewers. Each survey included a household roster and modules on sociodemographic characteristics, reproductive and maternal health, and recent illness and health-care utilization among children younger than five years of age. At follow-up surveys, respondents reported their lifetime birth histories and the number of CHW home visits their household received in the preceding month. We updated household rosters at each survey round to identify new members (due to births, migration, marriage or adoption) and those absent due to migration or death. At each time point, we invited newly eligible women (reaching reproductive age or arriving in the study area) to participate. In all surveys, we made up to three attempts to contact each eligible household and woman. Study outcomes: The primary endpoint of this study was all-cause under-five mortality.
Secondary outcomes included the following:
- Infant mortality rate
- Newborn mortality rate
- Prevalence of diarrhea, acute respiratory illness, and fever among children under 5 in the preceding two weeks
- For any symptom among children under 5 in the preceding two weeks: receiving any care; receiving care from within the health sector; and receiving timely treatment
- Receipt of oral rehydration therapy and zinc among children under 5 reporting diarrhea in the preceding two weeks
- Receipt of a malaria diagnostic test among children under 5 reporting fever in the preceding two weeks
- Receipt of antibiotics within 24 hours of onset of cough and fast breathing among children under 5 reporting acute respiratory infection in the preceding two weeks
- Receipt of a pregnancy test for women of reproductive age if indicated
- For the most recent pregnancy in the intervention period: receipt of 3 or more doses of Sulfadoxine-Pyrimethamine (SP) as Intermittent Preventive Treatment (IPTp); enrollment in antenatal care (ANC) with a skilled provider in the first trimester; completion of four or more ANC consultations with a skilled provider; delivery with a skilled provider; delivery at a health facility; receipt of a postnatal consultation for the mother from a skilled provider, auxiliary midwife, or CHW within 24 hours of delivery; receipt of a postnatal consultation for the newborn from a skilled provider, auxiliary midwife, or CHW within 24 hours of delivery
- Use of a modern method of contraception among women of reproductive age
- Use of a long-acting reversible contraception method among women of reproductive age
Seven health catchment areas in the Bankass district, Mopti region, Mali.
All analytic files are at the individual level, with the exception of the “Menage” file (household level).
This study covers all households in the geographic area where the ProCCM Trial was conducted. Some data are specific to women of reproductive age, 15 to 49 years, who reside in these households. These women reported on their children under five years of age, as well as on their live births, regardless of whether the child is still living or has died.
Sample survey data [ssd]
All households and individuals in the study area were eligible to participate in the annual household surveys related to the ProCCM. At each wave, all households in a cluster were approached by a trained interviewer. Interviewers attempted to interview a household up to three times. During each interview, we collected a household roster where each household member was assigned a unique identifier.
Because all eligible individuals and households had the same probability of selection into the sample, there are no sample weights.
During the trial, violent insecurity reached central Mali, affecting trial clusters and the lives of participants and providers. Six clusters from three health catchment areas dropped out of the trial due to insecurity at the 24- or 36-month follow-up: four were targeted and destroyed by the conflict and two were inaccessible to the survey team due to insecurity.
Face-to-face interview [f2f]
Household survey
Data processing and quality control do-files were set up before data collection began, and the first test of these programs was carried out after the pre-test. When data collection began, we established a schedule for synchronizing the data on the Ona server (ODK Collect), and once we had confirmation that the data had been synchronized by the field supervisors, we proceeded to extract, process and anticipate the various data attributes before, during and after data collection.
Not collected.
https://spdx.org/licenses/CC0-1.0.html
Citizen science, the involvement of volunteers in the collection of scientific data, can be a useful research tool. However, data collected by volunteers are often of lower quality than those collected by professional scientists. We studied the accuracy with which the volunteers identified insects visiting ivy (Hedera) flowers in Sussex, England. In the first experiment, we examined the effects of training method, volunteer background and prior experience. Fifty-three participants were trained for the same duration using one of three different methods (pamphlet, pamphlet + slide show or pamphlet + direct training). Almost immediately following training, we tested the ability of the participants to identify live insects on ivy flowers to one of 10 taxonomic categories and recorded whether their identifications were correct or incorrect, without providing feedback. The results showed that the type of training method had a significant effect on identification accuracy (P = 0·008). Participants identified 79·1% of insects correctly after using a one-page colour pamphlet, 85·6% correctly after using the pamphlet and viewing a slide show and 94·3% correctly after using the pamphlet in combination with direct training in the field. As direct training cannot be delivered remotely, in the following year we conducted a second experiment, in which a different sample of 26 volunteers received the pamphlet plus slide show training three times. Moreover, in this experiment, participants received c. 2 min of additional training material, either videos of insects or stills taken from the videos. Testing showed that identification accuracy increased from 88·6% to 91·3% to 97·5% across the three successive tests. We also found a borderline significant interaction between the type of additional material and the test number (P = 0·053), such that the video gave fewer errors than stills in the first two tests only.
The most common errors made by volunteers were misidentifications of honeybees and social wasps with their hover fly mimics. We also tested six experts who achieved nearly perfect accuracy (99·8%), which shows what is possible in practice. Overall, our study shows that two or three sessions of remote training can be as good as one of direct training, even for relatively challenging taxonomic discriminations that include distinguishing models and mimics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains inertial data from 4 wearable sensor nodes and 1 wearable patch worn by 20 healthy adult participants performing a series of physical functioning tests (including the short physical performance battery, the timed up and go test, a walking test and balance tests). Details of patient demographics, the physical functioning tests and of each sensor are contained in files in the main folder.
Inertial data (accelerometer and gyroscope) are contained in two folders, one for each sensor type. The start and end time for each sensor can be taken from the details in each folder structure, as detailed below. The times given are specific to each sensor's monitoring system, and these systems are not exactly synchronised. As such, a manual synchronisation shaking protocol was followed, in which all sensors were strapped together and shaken three times in succession at the start of each data collection period. The physical functioning test times will also need to be synchronised.
-> Inertial sensor data / (subject id).zip / (subject id) / (date_time_crossTest_SD_session#) /
-> Wearable inertial patch / (subject id) / (date)T(time) /
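One common way to use a shaking event for alignment is to cross-correlate the acceleration magnitude of two sensors and take the lag with the highest score. The sketch below illustrates that idea only; it is not part of the dataset's tooling. It assumes both traces have already been resampled to the same rate and trimmed to a window around the shake, and all function names are hypothetical.

```python
import math


def magnitude(samples):
    """Acceleration magnitude for a sequence of (x, y, z) samples."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def estimate_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means the shake appears `lag` samples later in `other`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, ref_value in enumerate(reference):
            m = n + lag
            if 0 <= m < len(other):
                score += ref_value * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example: the same burst, delayed by 5 samples in the second trace.
trace_a = [0.0] * 50
trace_b = [0.0] * 50
trace_a[10:13] = [1.0, 2.0, 1.0]
trace_b[15:18] = [1.0, 2.0, 1.0]
lag = estimate_lag(trace_a, trace_b, max_lag=20)
```

Dividing the estimated lag by the sampling rate gives the clock offset in seconds, which can then be subtracted from one sensor's timestamps.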
M2T3NVCLD (or tavg3_3d_cld_Nv) is a 3-dimensional, 3-hourly time-averaged data collection in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of cloud diagnostics at 72 model layers, such as cloud fraction for radiation, pressure thickness, in-cloud cloud ice (or liquid) for radiation, and relative humidity. The data field is available every three hours starting from 01:30 UTC, e.g.: 01:30, 04:30, … , 22:30 UTC. Section 4.2 of the MERRA-2 File Specification document provides pressure values nominal for a 1000 hPa surface pressure and refers to the top edge of the layer. The lev=1 is for the top layer, and lev=72 is for the bottom (or surface) model layer. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by the NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.
Data Reprocessing: Please check "Records of MERRA-2 Data Reprocessing and Service Changes" linked from the "Documentation" tab on this page. Note that a reprocessed data filename is different from the original file.
MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.
Questions: If you have a question, please read "MERRA-2 File Specification Document", "MERRA-2 Data Access – Quick Start Guide", and FAQs linked from the "Documentation" tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).
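The time and level conventions described above can be made concrete with a small sketch. It only encodes what the description states (eight 3-hourly stamps per day starting at 01:30 UTC; lev=1 at the top and lev=72 at the surface); the helper names are hypothetical and no MERRA-2 file is read.

```python
from datetime import date, datetime, timedelta, timezone

N_LAYERS = 72  # number of model layers stated in the description


def tavg3_timestamps(day: date) -> list[datetime]:
    """Return the eight 3-hourly time stamps for one day, starting at 01:30 UTC."""
    first = datetime(day.year, day.month, day.day, 1, 30, tzinfo=timezone.utc)
    return [first + timedelta(hours=3 * k) for k in range(8)]


def lev_above_surface(layers_above_surface: int) -> int:
    """Map 'number of layers above the surface layer' to the lev coordinate.

    0 returns lev=72 (surface layer); 71 returns lev=1 (top layer).
    """
    if not 0 <= layers_above_surface < N_LAYERS:
        raise ValueError("layers_above_surface must be between 0 and 71")
    return N_LAYERS - layers_above_surface


stamps = tavg3_timestamps(date(2020, 1, 1))
```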
In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). In addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, the LSMS has three basic objectives, as follows:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.
The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey so there are three years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey:
- Wave 1: LSMS conducted in 2001, forms the baseline survey for the panel
- Wave 2: Second interview of 50% of LSMS respondents in Autumn/Winter 2002
- Wave 3: Third interview with sub-sample respondents in Autumn/Winter 2003
The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and the observation of the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three year period. Most importantly, the covariates associated with moves into and out of poverty and the relative risks of poverty for different people can be assessed. As such, the panel aims to provide data which will inform the policy debates within FBiH and RS at a time of social reform and rapid change.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Households
Sample survey data [ssd]
The Wave 3 sample consisted of 2878 households that had been interviewed at Wave 2, plus a further 73 households that were interviewed at Wave 1 but were non-contacts at Wave 2. A total of 2951 households (1301 in the RS and 1650 in FBiH) were issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.
Panel design
Eligibility for inclusion
The household and household membership definitions are the same standard definitions as at Wave 2. The sample membership status and eligibility for interview are as follows:
i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview.
ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).
iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).
iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.
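The eligibility rules above can be restated as two small predicates. This is a sketch for illustration only; the `SampleMember` class and its field names are hypothetical and do not correspond to variables in the survey files.

```python
from dataclasses import dataclass


@dataclass
class SampleMember:
    status: str                 # "OSM" (original) or "NSM" (new sample member)
    age: int                    # age in years at the time of the wave
    out_of_scope: bool = False  # True if the person has moved out of scope


def eligible_for_inclusion(member: SampleMember) -> bool:
    """Rule iii: all OSMs and NSMs are included unless they are out of scope."""
    return member.status in {"OSM", "NSM"} and not member.out_of_scope


def eligible_for_interview(member: SampleMember) -> bool:
    """Rule iv: included household members aged 15 or over are interviewed."""
    return eligible_for_inclusion(member) and member.age >= 15
```

For example, a 9-year-old OSM is included in the household roster but is not interviewed, while a 30-year-old NSM who has not moved out of scope is eligible for both.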
Following rules
The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.
Definition of 'out-of-scope'
It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:
i. Movers out of the country altogether i.e. outside FBiH and RS. This category of mover is clear. Sample members moving to another country outside FBiH and RS will be out-of-scope for that year of the survey and not eligible for interview.
ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer assigned in that entity.
iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, Wave 3 individuals who have subsequently moved into some institutions are followed. The definitions for which institutions are included are found in the Supervisor Instructions.
iv. Movers into the district of Brcko are followed for interview. When coding the entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
Face-to-face [f2f]
Data entry
As at Wave 2, CSPro was the chosen data entry software. The CSPro program has two main features that reduce the number of keying errors and the editing required following data entry:
- Data entry screens that included all skip patterns.
- Range checks for each question (allowing three exceptions for inappropriate, don't know and missing codes).
The Wave 3 data entry program had more checks than at Wave 2, and DE staff were instructed to get all anomalies cleared by SIG fieldwork. The program was extensively tested prior to DE. Ten computer staff were employed in each Field Office and, as all had worked on Wave 2, training was not undertaken.
Editing
Editing Instructions were compiled (Annex G) and sent to Supervisors. For Wave 3, Supervisors were asked to take more time to edit every questionnaire returned by their interviewers. The FBTSA examined the work of twelve of the twenty-two Supervisors. All Supervisors made occasional errors with the Control Form, so a further 100% check of Control Forms and Module 1 was undertaken by the FBTSA and SIG members.
The panel survey has enjoyed high response rates throughout the three years of data collection with the wave 3 response rates being slightly higher than those achieved at wave 2. At wave 3, 1650 households in the FBiH and 1300 households in the RS were issued for interview. Since there may be new households created from split-off movers it is possible for the number of households to increase during fieldwork. A similar number of new households were formed in each entity; 62 in the FBiH and 63 in the RS. This means that 3073 households were identified during fieldwork. Of these, 3003 were eligible for interview, 70 households having either moved out of BiH, institutionalised or deceased (34 in the RS and 36 in the FBiH).
Interviews were achieved in 96% of eligible households, an extremely high response rate by international standards for a survey of this type.
In total, 8712 individuals (including children) were enumerated within the sample households (4796 in the FBiH and 3916 in the RS). Within the 3003 eligible households, 7781 individuals aged 15 or over were eligible for interview, with 7346 (94.4%) being successfully interviewed. Within cooperating households (where there was at least one interview) the interview rate was higher (98.8%).
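The individual response rate quoted above is plain arithmetic on the reported counts. A minimal Python sketch, in which the helper name `percentage` and the variable names are illustrative and not part of the survey documentation:

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts reported for wave 3
enumerated_fbih, enumerated_rs = 4796, 3916
eligible_adults = 7781       # aged 15 or over in the eligible households
interviewed_adults = 7346

total_enumerated = enumerated_fbih + enumerated_rs
individual_response_rate = percentage(interviewed_adults, eligible_adults)

print(total_enumerated, individual_response_rate)
```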
A very important measure in longitudinal surveys is the annual individual re-interview rate. This is because a high attrition rate, where large numbers of respondents drop out of the survey over time, can call into question the quality of the data collected. In BiH the individual re-interview rates have been high for the survey. The individual re-interview rate is the proportion of people who gave an interview at time t-1 who also give an interview at t. Of those who gave a full interview at wave 2, 6653 also gave a full interview at wave 3. This represents a re-interview rate of 97.9% - which is extremely high by international standards. When we look at those respondents who have been interviewed at all three years of the survey there are 6409 cases which are available for longitudinal analysis, 2881 in the RS and 3528 in the FBiH. This represents 82.8% of the responding wave 1 sample, a
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The problem of sedentary behavior among primary school children is alarming, with numbers gradually increasing worldwide, including Sri Lanka. Physical activity interventions within classroom settings have been acknowledged as a critical strategy to increase students’ movement behaviors while enhancing their academic achievement and health. Yet, the busy curriculum and challenging educational demands encourage more sedentary classroom behavior. Hence, this study aims to develop and evaluate an in-classroom physical activity breaks (IcPAB) intervention among fifth graders in Sri Lanka.
Methods
The study will adopt a randomized controlled trial (RCT), comprising an in-classroom physical activity breaks program group and a control group to evaluate the effects of IcPAB on academic achievement, movement behaviors and health outcomes. The intervention design is based on the capability (C), opportunity (O) and motivation (M) behavior (B) (COM-B) model. At least 198 fifth graders will be recruited from two schools in Uva province, Sri Lanka. The recruitment process will start in late 2022. Class teachers of the intervention group will implement 5-min activity breaks at least three times a day after completing a training session. The primary variables include mathematics and reading achievement. The secondary variables include physical activity levels, steps count, sedentary behavior, body mass index, aerobic fitness, and perceived stress. Data collection will be implemented at pre-test and post-test, respectively. Intervention fidelity and the process will also be evaluated.
Discussion
The IcPAB is designed to prevent pure educational time loss by introducing curriculum-integrated short bouts of physical activity breaks into the classroom routine. If the IcPAB is effective, it can (1) improve the mathematics and reading achievement of fifth-grade girls and boys, which is a significant factor determining the performance at the Grade Five National Scholarship Examination in Sri Lanka; (2) improve movement behaviors as well as physical and mental health outcomes among primary school students. Sequentially, the IcPAB will enrich school-based physical activity intervention approaches which can in turn bring academic and health benefits to primary school children in Sri Lanka.
Trial registration
The first version of the trial was registered with the ISRCTN registry (Ref: ISRCTN52180050) on 20/07/2022.
Transparency data for standards to account for delays in apprentices completing their programme. We publish information on learners who completed or withdrew from their learning between August and November 2019. We estimate that including these learners would increase the national overall apprenticeship achievement rate by 0.4 percentage points, and that for standards by 3.9 percentage points.
Where a college merger has taken place during the 2018 to 2019 academic year we have provided the headline figures for those individual colleges which make up the constituent parts of the merger.
Redactions – we have redacted 14 providers from our formal performance tables (NARTs) where we are unable to form a reliable QAR. This is done where the data we hold does not allow us to calculate a reliable estimate and therefore provides an unfair measure of performance. We publish headline information for these providers separately for transparency, but they do not constitute a formal QAR and should not be used to compare performance. The underpinning data is included in our national achievement rates to provide a complete view of performance. We estimate that excluding the apprenticeship data of these providers would increase the national overall apprenticeship achievement rate in the 2018 to 2019 academic year by 0.6 percentage points.
In addition, we are making further experimental data available split by campus. Collection of campus data was only introduced into the ILR during the 2018/19 academic year. As it is the first year that providers have been asked to collect this information we have provided this data simply as an early view of what campus achievement rates look like. Any learners who withdrew before the start of 2018/19 will be shown with no campus identified.
The 3 year time-series spreadsheets for 2018 to 2019 are available along with the rest of the NARTs tables.
To accompany the National Achievement Rate Tables (NARTs), a number of other tables are made available to give transparency to the system:
Newcastle College Group (NCG) components - the NCG QAR used for performance management has been included in the detailed NART tables. NCG have been undertaking a specific pilot for data collection arrangements. We have included a table that gives the headline figures for those individual providers that make up the constituent parts of Newcastle College Group.
Merged Colleges – where a college merger has taken place during the 2017 to 2018 or 2016 to 2017 academic years, NARTs will show the aggregated view for the merged college. We have provided the headline figures for those individual colleges that make up the constituent parts of the merger.
Redactions – we redact providers from our formal performance tables (NARTs) where we are unable to form a reliable QAR. This is done where the data we hold does not allow us to calculate a reliable estimate and therefore provides an unfair measure of performance. However, we publish headline information for these providers separately for transparency. The underpinning data is included in our national achievement rates to provide a complete view of performance.
The 3 year time-series spreadsheets for 2017 to 2018 and 2016 to 2017 are available along with the rest of the NARTs tables.
The implementation of the improved methodology for the 2015 to 2016 qualification achievement rates led to a significant impact on the estimates compared to previous years. Therefore for the first time, we published a three-year comparison at the national level as part of the further education (FE) and skills statistical first release (SFR).
We have assessed what additional in
M2T3NPRAD (or tavg3_3d_rad_Np) is a 3-dimensional, 3-hourly time-averaged data collection in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2). This collection consists of assimilations of radiation diagnostics on 42 pressure levels, such as cloud fraction for radiation, and air temperature tendency due to longwave (or shortwave). The data field is available every three hours starting from 01:30 UTC, e.g.: 01:30, 04:30, … , 22:30 UTC. The information on the pressure levels can be found in section 4.2 of the MERRA-2 File Specification document. MERRA-2 is the latest version of global atmospheric reanalysis for the satellite era produced by NASA Global Modeling and Assimilation Office (GMAO) using the Goddard Earth Observing System Model (GEOS) version 5.12.4. The dataset covers the period of 1980-present with a latency of ~3 weeks after the end of a month.
Data Reprocessing: Please check “Records of MERRA-2 Data Reprocessing and Service Changes” linked from the “Documentation” tab on this page. Note that a reprocessed data filename is different from the original file.
MERRA-2 Mailing List: Sign up to receive information on reprocessing of data, changing of tools and services, as well as data announcements from GMAO. Contact the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov) to be added to the list.
Questions: If you have a question, please read “MERRA-2 File Specification Document”, “MERRA-2 Data Access – Quick Start Guide”, and FAQs linked from the “Documentation” tab on this page. If that does not answer your question, you may post your question to the NASA Earthdata Forum (forum.earthdata.nasa.gov) or email the GES DISC Help Desk (gsfc-dl-help-disc@mail.nasa.gov).
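The 3-hourly time stamps described above (01:30, 04:30, …, 22:30 UTC) can be generated for any day with the standard library. This is a minimal sketch; the function name `tavg3_times` is illustrative and is not part of any MERRA-2 tool.

```python
from datetime import date, datetime, timedelta, timezone
from typing import List


def tavg3_times(day: date) -> List[datetime]:
    """Return the eight 3-hourly time stamps for one day, starting at 01:30 UTC."""
    start = datetime(day.year, day.month, day.day, 1, 30, tzinfo=timezone.utc)
    return [start + timedelta(hours=3 * i) for i in range(8)]


labels = [t.strftime("%H:%M") for t in tavg3_times(date(1980, 1, 1))]
print(labels)
```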
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental Setup
Our study was conducted in a greenhouse in Ames, Iowa (42°01'42.7"N 93°38'40.9"W) between May 2018 and October 2019. We used four accessions (LA3123, LA1237, LA2655, LA2983) of S. pimpinellifolium collected from the Galapagos, Northern Ecuador, Northern Peru, and Central Peru and obtained from the Tomato Genetic Resource Center (TGRC) at the University of California Davis (Table S1). We chose these accessions (hereafter populations) to obtain a broad geographic spread and because they have known population sizes and high seed viability. We planted seeds in May 2018. Once seedlings reached maturity about three months after planting, we picked four of the most phenotypically distinct individuals from each population as representative genotypes (total of 16 genotypes) and took cuttings from each to make nine clonal replicates per genotype. Experimental plants were grown under ambient light during the spring, summer, and fall, and supplemented with light in the winter to maintain 12 hours of light per day to more closely match their original tropical conditions. The ambient temperature ranged between 21 and 26°C and we watered plants as needed to maintain moist soil (Fig. S1). We fertilized plants twice weekly to reduce nutrient limitation and sprayed plants with insecticide and miticide monthly to prevent white flies and spider mites (Supplemental Information). Plants were placed across two rooms of a greenhouse, one meter apart in a random arrangement with respect to treatment and population. Because S. pimpinellifolium has indeterminate growth, we pruned plants weekly to maintain a height of approximately one meter. Starting in January 2019, we hand-pollinated all flowers weekly to ensure even pollen distribution, maximum fruit set, and reduced selfing. Pollen was manually collected from all genotypes within a population and then pooled and used for hand-pollination of all flowers in that population.
Fruit Removal Treatments
Beginning in February 2019, we established three fruit removal treatments – high, moderate, and low – to test how plants respond to the rate of frugivory. We randomly assigned three clonal replicates per genotype to each treatment (Figure 1). In the ‘high’ removal treatment, we removed all ripe fruits three times a week to simulate the high level of frugivory we might expect during peak fruiting season with an intact frugivore community. In the ‘moderate’ treatment, we removed approximately half of the ripe fruits present on a weekly basis. In the ‘low’ removal treatment simulating the absence of frugivores, we did not remove any fruits except the subset we used for data collection once every four weeks, equivalent to less than five percent of the total fruits present at a given time. Fruit removal events over the duration of the study for every individual in the high and moderate treatments are shown in Figure S2. Though fruit yield (as estimated by total removal in high removal treatment) was highly variable across individuals (mean = 85, s = 75), all individuals that produced little to no fruit (yield < 15 fruits) over the entire duration of the study were removed from the analysis (n = 15) and these were distributed approximately evenly across treatments.
Data Collection
Every four weeks beginning April 2019, we located and marked the pedicels of the two most basal racemes on each plant with a flower ready to receive pollen. We then hand-pollinated the first three flowers on each raceme and tagged them. By marking each raceme and pedicel with nontoxic paint, we tracked ripening time and collected these fruits when ripe for morphological trait measurements. We determined fruits were ripe using a combination of color and softness cues (Grumet et al., 1981).
When removing each fruit, we used a twisting motion to separate the fruit from the sepal, rather than removing from the weakened point on the pedicel to mimic removal by frugivores (Douglas J. Levey et al., 2006). We measured the following morphological traits: fruit size, fresh mass, seeds per fruit, pulp dry mass, and seed dry mass. Fresh mass and fruit size were recorded on the same day the fruit was picked. Fruit size was recorded as two lengths, the first from the point of attachment to the most distal end, and the second as the longest length perpendicular to the first. To measure the dry mass of the seeds and pulp, the gelatinous sac around each seed was manually removed and both seeds and fruit pulp were put in a drying oven at 170°C for 48 hours or until reaching a constant mass. We could not collect data on the total fruit set because of the pruning required throughout the experiment.
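The design described above (four populations, four genotypes per population, nine clonal replicates per genotype, three replicates per genotype in each removal treatment) can be laid out programmatically. The sketch below is an illustration only: the genotype labels, the seed and the function name are hypothetical, and the authors do not state how their randomization was actually performed.

```python
import random
from collections import Counter

POPULATIONS = ["LA3123", "LA1237", "LA2655", "LA2983"]  # accessions named in the text
TREATMENTS = ["high", "moderate", "low"]


def assign_treatments(seed: int = 0) -> list:
    """Randomly assign the 9 clones of each genotype to treatments, 3 per treatment."""
    rng = random.Random(seed)
    plants = []
    for population in POPULATIONS:
        for g in range(1, 5):                     # four genotypes per population
            genotype = f"{population}-g{g}"       # hypothetical label
            slots = TREATMENTS * 3                # three replicates per treatment
            rng.shuffle(slots)
            for replicate, treatment in enumerate(slots, start=1):
                plants.append({
                    "population": population,
                    "genotype": genotype,
                    "replicate": replicate,
                    "treatment": treatment,
                })
    return plants


plants = assign_treatments()
per_treatment = Counter(p["treatment"] for p in plants)
print(len(plants), dict(per_treatment))
```

Shuffling within each genotype keeps the design balanced, so every genotype contributes exactly three plants to every treatment.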
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In April 2020 Eurostat set up an exceptional data collection on total weekly deaths in order to support the policy and research efforts related to COVID-19. With this data collection, Eurostat's target was to quickly provide statistics that show the changing situation of the total number of weekly deaths from early 2020 onwards.
The available data on the total weekly deaths are transmitted by the National Statistical Institutes to Eurostat on a voluntary basis. Data are collected cross-classified by sex, 5-year age group and NUTS3 region (NUTS2021). The age breakdown by 5-year age group is the most significant and should be considered by the reporting countries as the main option; when that is not possible, data may be provided with less granularity. Similarly, for the regional structure, data granularity varies by country.
Eurostat requested from the National Statistical Institutes the transmission of a back time series of weekly deaths for as many years as possible, recommending the year 2000 as the starting point. Some countries transmit shorter time series because of limited data availability. A sufficiently long time series is necessary for temporal comparisons and statistical modelling.
A note on Ireland: Data from Ireland were not included in the first phase of the weekly deaths data collection: official timely data were not available because deaths can be registered up to three months after the date of death. Because of the COVID-19 pandemic, the Central Statistics Office of Ireland began to explore experimental ways of obtaining up-to-date mortality data, finding a strong correlation between death notices published on RIP.ie and official mortality statistics. Recently, CSO Ireland started publishing a time series covering the period from October 2019 until the most recent weeks, using death notices (see CSO website). For the purpose of this release, Eurostat compared the new 2020-2021 web-scraped series with a 2016-2019 baseline established using official data. CSO is periodically assessing the quality of these data.
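The text does not say how the 2020-2021 series was compared with the 2016-2019 baseline. One common approach, shown here purely as an assumed sketch, is to average deaths per calendar week over the baseline years and subtract that average from the observed counts. All numbers below are invented toy values and both function names are hypothetical.

```python
from statistics import mean
from typing import Dict


def weekly_baseline(history: Dict[int, Dict[int, int]]) -> Dict[int, float]:
    """history maps year -> {week: deaths}; return mean deaths per week across years."""
    common_weeks = set.intersection(*(set(weeks) for weeks in history.values()))
    return {wk: mean(history[yr][wk] for yr in history) for wk in sorted(common_weeks)}


def excess_deaths(observed: Dict[int, int], baseline: Dict[int, float]) -> Dict[int, float]:
    """Observed minus baseline for every week present in both series."""
    return {wk: observed[wk] - baseline[wk] for wk in observed if wk in baseline}


# Toy values only, not real mortality data
history = {
    2016: {1: 600, 2: 620},
    2017: {1: 640, 2: 600},
    2018: {1: 620, 2: 610},
    2019: {1: 620, 2: 610},
}
observed_2020 = {1: 650, 2: 700}

baseline = weekly_baseline(history)
excess = excess_deaths(observed_2020, baseline)
print(baseline, excess)
```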
The purpose of Eurostat’s online tables in the folder Weekly deaths - special data collection (demomwk) is to make available to users information on the weekly number of deaths disaggregated by sex, 5-year age group and NUTS3 region over the last 20 years, depending on the availability in each country covered in Eurostat demographic statistics data collections. In order to ensure the highest timeliness possible, data are made available as reported by the countries, and work is ongoing to improve data quality and user friendliness.
Starting in 2025, the weekly deaths data is collected on a quarterly basis. The database updates are expected by mid-June (release of monthly data for 1st quarter of the year), mid-September (2nd quarter), mid-December (3rd quarter), and mid-February (4th quarter).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset originates from projects focused on the sorting of used clothes within a sorting facility. The primary objective is to classify each garment into one of several categories to determine its ultimate destination: reuse, reuse outside Sweden (export), recycling, repair, remake, or thermal waste.
The dataset has 31,997 clothing items, a massive update from the 3,000 items in version 1. The dataset collection started under the Vinnova funded project "AI for resource-efficient circular fashion" in Spring, 2022 and involves collaboration among three institutions: RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB. The dataset has received further support through the EU project, CISUTAC (cisutac.eu).
- Webpage: https://fnauman.github.io/second-hand-fashion/
- Contact: farrukh.nauman@ri.se
- The dataset contains 31,997 clothing items, each with a unique item ID in a datetime format. The items are divided into three stations: `station1`, `station2`, and `station3`. The `station1` and `station2` folders contain images and annotations from Wargön Innovation AB, while the `station3` folder contains data from Myrorna AB. Each clothing item has three images and a JSON file containing annotations.
- Three images are provided for each clothing item:
1. Front view.
2. Back view.
3. Brand label close-up. About 4000-5000 brand images are missing because of privacy concerns: people's hands, faces, etc. Some clothing items did not have a brand label to begin with.
- Image resolutions are primarily in two sizes: `1280x720` and `1920x1080`. The background of the images is a table that used a measuring tape prior to January 2023, but later images have a square grid pattern with each square measuring `10x10` cm.
- Each JSON file contains a list of annotations, some of which require nuanced interpretation (see `labels.py` for the options):
- `usage`: Arguably the most critical label, usage indicates the garment's intended pathway. Options include 'Reuse,' 'Repair,' 'Remake,' 'Recycle,' 'Export' (reuse outside Sweden), and 'Energy recovery' (thermal waste). About 99% of the garments fall into the 'Reuse,' 'Export,' or 'Recycle' categories.
- `price`: The price field should be viewed as suggestive rather than definitive. Pricing models in the second-hand industry vary widely, including pricing by weight, brand, demand, or fixed value. Wargön Innovation AB does not determine actual pricing.
- `trend`: This field refers to the general style of the garment, not a time-dependent trend as in some other datasets (e.g., Visuelle 2.0). It might be more accurately labeled as 'style.'
- `material`: Material annotations are mostly based on the readings from a Near Infrared (NIR) scanner and in some cases from the garment's brand label.
- Damage-related attributes include:
- `condition` (1-5 scale, 5 being the best)
- `pilling` (1-5 scale, 5 meaning no pilling)
- `stains`, `holes`, `smell` (each with options 'None,' 'Minor,' 'Major').
Note: 'holes' and 'smell' were introduced after November 17th, 2022, and stains previously only had 'Yes'/'No' options. For `station1` and `station2`, we introduced additional damage location labels to assist in damage detection:
"damageimage": "back",
"damageloc": "bottom left",
"damage": "stain ",
"damage2image": "front",
"damage2loc": "None",
"damage2": "",
"damage3image": "back",
"damage3loc": "bottom right",
"damage3": "stain"
Taken from `labels_2024_04_05_08_47_35.json` file. Additionally, we annotated a few hundred images with bounding box annotations that we aim to release at a later date.
- `comments`: The comments field is mostly empty, but sometimes contains important information about the garment, such as a detailed text description of the damage.
- Whenever possible, ISO standards have been followed to define these attributes on a 1-5 scale (e.g., `pilling`).
- Gold dataset: `Test` inside the comments field marks garments that were annotated multiple times by different annotators for annotator agreement comparisons. These 100 garments were annotated twice at Wargön Innovation AB (search within `station1/[dec2022,feb2023]`) and once at Myrorna AB (see `station3/test100` folder for JSON files containing their annotations).
- The data has been annotated by a group of expert second-hand sorters at Wargön Innovation AB and Myrorna AB.
- Some attributes, such as `price`, should be considered with caution. Many distinct pricing models exist in the second-hand industry:
- Price by weight
- Price by brand and demand (similar to first-hand fashion)
- Generic pricing at a fixed value (e.g., 1 Euro or 10 SEK)
Wargön Innovation AB does not set the prices in practice and their prices are suggestive only (`station1` and `station2`). Myrorna AB (`station3`), in contrast, does resale and sets the prices.
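The damage location labels shown earlier can be read with any standard JSON parser. The sketch below is a minimal example under stated assumptions: it assumes each annotation is a flat JSON object using the key names from the sample (`damage`, `damageloc`, `damageimage` and the `damage2…`/`damage3…` variants), and both helper names are hypothetical rather than part of the dataset's own tooling.

```python
import json
from typing import List, Optional


def load_annotation(text: str) -> Optional[dict]:
    """Parse one annotation; return None instead of raising on malformed JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def damage_entries(annotation: dict) -> List[dict]:
    """Collect the non-empty damage records from a flat annotation object."""
    entries = []
    for prefix in ("damage", "damage2", "damage3"):
        kind = (annotation.get(prefix) or "").strip()      # e.g. "stain " -> "stain"
        location = (annotation.get(prefix + "loc") or "").strip()
        if not kind or location in ("", "None"):
            continue                                       # unused slot
        entries.append({
            "image": annotation.get(prefix + "image"),
            "location": location,
            "type": kind,
        })
    return entries


sample = """{
  "damageimage": "back", "damageloc": "bottom left", "damage": "stain ",
  "damage2image": "front", "damage2loc": "None", "damage2": "",
  "damage3image": "back", "damage3loc": "bottom right", "damage3": "stain"
}"""

found = damage_entries(load_annotation(sample))
print(found)
```

Returning `None` for unparsable input lets a caller skip problematic files without stopping a batch run.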
- We received feedback on our version 1 that some images were too blurry or had poor lighting. The image quality has slightly improved, but largely remains similar to release 1.
- We further learned that a handful of data items were duplicates. Several duplicate images were removed, but about 400 still remain.
- Some users did not prefer a `tar.gz` format that we uploaded in version 1 of the dataset. We have now switched to `.zip` for convenience.
- Most JSON files parse fine using any standard JSON reader, but a handful that are problematic have been set aside in the `json_errors` folder.
- Extra care was taken not to leak personal information. This is why you will not see any entries for `annotator` attribute in the JSON files in station1/sep2023 since people used their real names. Since then, we used internally assigned IDs.
- Many brand images contained people's hands, faces, or other personal information. We have removed about 4000-5000 brand images for privacy reasons.
- Please inform us immediately if you find any personal information revelations in the dataset:
- Farrukh Nauman (RISE AB): `farrukh.nauman@ri.se`,
- Susanne Eriksson (Wargön Innovation AB): `susanne.eriksson@wargoninnovation.se`,
- Gabriella Engstrom (Wargön Innovation AB): `gabriella.engstrom@wargoninnovation.se`.
We went through 100k images three times to ensure no personal information is leaked, but we are human and can make mistakes.
The data collection for this dataset has been carried out in collaboration with the following partners:
1. RISE Research Institutes of Sweden AB: RISE is a leading research institute dedicated to advancing innovation and sustainability across various sectors, including fashion and textiles.
2. Wargön Innovation AB: Wargön Innovation is an expert in sustainable and circular fashion solutions, contributing valuable insights and expertise to the dataset creation.
3. Myrorna AB: Myrorna is Sweden's oldest chain of stores for collecting clothes and furnishings that can be reused.
CC-BY 4.0. Please refer to the LICENSE file for more details.
This dataset was made possible through the collaborative efforts of RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB, with funding from Vinnova and support from the EU project CISUTAC. We extend our gratitude to all the expert second-hand sorters and annotators who contributed their expertise to this project.
Save the Children has developed a low-cost and potentially scalable early stimulation program that delivers actionable messages to mothers and other caregivers that show them how to interact and play with young children. American Institutes for Research (AIR) and its research partners at Data International, the International Centre for Diarrhoeal Disease Research, Bangladesh, and Minhaj Mahmud, the head of research of BRAC Institute of Governance of BRAC University, are conducting a cluster-randomized control trial to evaluate the impact of the early stimulation program in the regions of Satkania, Muladi, and Kulaura in Bangladesh. The study is also receiving advice from a Technical Advisory Board consisting of child development and nutrition specialists and government officials in Bangladesh.
In this evaluation, community clinics within the same union are randomly assigned to either receive the Save the Children intervention or not. Data on individual child outcomes and family stimulation behavior are collected from households within the catchment areas of these community clinics. Data from service providers operating within each community clinic’s catchment area are also collected.
Seventy-eight community clinics are participating in the study, with half receiving the intervention (the treatment group) and half not receiving it (the control group). Thirty-three households with children between 3 and 18 months of age residing in the catchment area of each community clinic at the time of baseline data collection were randomly sampled, resulting in a total sample size of 2,574 households (78 × 33), half treatment and half control.
Bangladesh is divided into seven major administrative regions called divisions, and the study takes place in three of them: Barisal (in the south), Chittagong (in the southeast), and Sylhet (in the northeast). Within these three divisions, the study is located in three districts: Barisal (in the division of Barisal), Chittagong (in the division of Chittagong) and Moulvibazar (in the division of Sylhet). Districts are subdivided into subdistricts, or upazilas. Within these three districts, the study is located in three upazilas: Muladi (in the district of Barisal), Satkania (in the district of Chittagong), and Kulaura (in the district of Moulvibazar). Upazilas are subdivided into unions, and the study takes place in 30 unions: 4 unions in Muladi, 16 unions in Satkania, and 10 unions in Kulaura.
The unit of analysis consists of households with children between 3 and 18 months of age residing in the catchment area of participating community clinics at the time of baseline data collection.
The full universe of the evaluation consists of households with children between 3 and 18 months of age residing in the catchment area of the 78 community clinics at the time of baseline data collection.
Sample survey data [ssd]
The study sample frame was generated from community clinic health assistant records, which have the advantage of being the centralized government document of record containing the population frame for all households with children under five years of age. The health assistant dataset included data for all three upazilas of interest. Based on an examination of the extant health assistant dataset described above, the study excluded 11 unions (out of a total of 41 unions) located in these three upazilas. Six of the unions were removed because data were not available. A further five unions were removed because they only had one community clinic (the study design requires each union to have at least one community clinic for each of the two treatment conditions). The final sampling frame included 78 community clinics located in 30 unions.
The sample frame was generated within each community clinic, and the units in the frame are households with children aged between 3 and 18 months that were situated in the selected community clinics' catchment areas during the period of the baseline data collection. The rationale for restricting the frame to households with children aged three months or older was that the main developmental assessment tool chosen for the evaluation, the BSID-III, has not been previously validated on children under the age of three months in Bangladesh. Early child development specialists consider the BSID-III test to be the gold standard assessment of development for children under 42 months of age, and it has been adapted by the team for use in Bangladesh. Because the BSID-III test is only valid for children under 42 months of age, we had to restrict the upper age limit of participating children to 18 months or younger at the time of baseline data collection in order to collect valid endline data 24 months later. To be eligible, the household had to reside in the catchment area during the baseline data collection period (November 2013-January 2014).
Initial Sampling: Using the health assistant records, the team created a list of households with at least one child aged between 3 and 18 months during the baseline data collection period. The team used a reference date of October 21, 2013, to calculate the age (in months) of the target children, and the team will collect endline data by October 2015, when the children will still be under 42 months of age.
Finally, within each community clinic catchment area, we randomly selected 33 households with children aged between 3 months and 18 months (as of October 21, 2013). The same set of households surveyed during the baseline data collection period will be surveyed during the endline data collection period.
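The age rule and the draw of 33 households per clinic described above can be sketched in Python. This is an illustration under assumptions: the text does not say how age in months was computed, so completed calendar months at the reference date are assumed, and the field name `child_dob`, the function names and the seed are hypothetical.

```python
import random
from datetime import date
from typing import List

REFERENCE_DATE = date(2013, 10, 21)   # reference date stated in the text


def age_in_months(dob: date, ref: date = REFERENCE_DATE) -> int:
    """Completed months between date of birth and the reference date (assumed rule)."""
    months = (ref.year - dob.year) * 12 + (ref.month - dob.month)
    if ref.day < dob.day:
        months -= 1                   # current month not yet completed
    return months


def is_age_eligible(dob: date) -> bool:
    return 3 <= age_in_months(dob) <= 18


def draw_households(frame: List[dict], n: int = 33, seed: int = 0) -> List[dict]:
    """Randomly select n age-eligible households from one clinic's frame."""
    eligible = [h for h in frame if is_age_eligible(h["child_dob"])]
    # random.sample raises ValueError if fewer than n households are eligible
    return random.Random(seed).sample(eligible, n)


frame = [{"id": i, "child_dob": date(2013, 1, 15)} for i in range(50)]
selected = draw_households(frame)
print(len(selected))
```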
Replacement Sampling: Anticipating that some households would be ineligible or would refuse to participate in the study, the team developed rules for replacing ineligible or "out-of scope households" and refusal households, following the guidance of two survey methodologists from AIR. Twenty additional replacement households were randomly selected from within each community clinic and included in a separate list, with each household randomly sorted from 1 to 20. When any of the originally selected 33 households were found to be ineligible or refused to participate, the field interviewer replaced it with the first household from the 20-household replacement list. Field interviewers continued replacing households in order. A careful differentiation was made between ineligible and refusal households.
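The replacement rule above (33 selected households, a randomly ordered reserve list of 20, replacements taken in list order) can be sketched as follows. The function names and the `can_interview` callback are hypothetical, and letting a replacement that itself turns out to be unavailable fall through to the next reserve household is an assumption about field practice.

```python
import random
from typing import Callable, List, Tuple


def draw_with_reserve(household_ids: List[int], n_main: int = 33,
                      n_reserve: int = 20, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw the main sample plus a randomly ordered, non-overlapping reserve list."""
    drawn = random.Random(seed).sample(household_ids, n_main + n_reserve)
    return drawn[:n_main], drawn[n_main:]


def apply_replacements(main: List[int], reserve: List[int],
                       can_interview: Callable[[int], bool]) -> List[int]:
    """Replace ineligible or refusing households with reserve households, in order."""
    queue = iter(reserve)
    final = []
    for household in main:
        current = household
        while current is not None and not can_interview(current):
            current = next(queue, None)   # next unused reserve household, if any
        if current is not None:
            final.append(current)
    return final


main, reserve = draw_with_reserve(list(range(1, 101)))
# Toy scenario: the first selected household and the first reserve are unavailable
unavailable = {main[0], reserve[0]}
final = apply_replacements(main, reserve, lambda h: h not in unavailable)
print(len(final), final[0] == reserve[1])
```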
Ineligible or "out-of-scope" households: This category includes households that were randomly selected to be part of the sample but did not fit the target sample description of "Households with children from 3-18 months of age that live in the selected community clinics' catchment areas during the period of the baseline data collection." Out-of-scope households included the following cases:
a) Households that had permanently left the catchment area. These 300 households had resided in the catchment area during birth record data collection, but by the time of the baseline data collection they had relocated to a different residence outside the catchment area. In these cases, more than one source (such as neighbors or health assistants) confirmed that the household had moved.
b) Households with incorrect location information in birth records. In 291 cases, the selected households could not be located. This class of out-of-scope households includes two groups. The first group consists of households that did not permanently reside in the catchment area of the selected community clinic, but had been registered in the health assistant record because they received services while visiting relatives or otherwise transiting through the community clinic's catchment area. The second group consists of households whose birth records were fabricated. This was confirmed to be the case in two community clinics, where a large number of households could not be located. (In response to this finding, the field data team met with the relevant health assistant, as well as representatives from Save the Children.)
c) Households with children ineligible due to inaccurate date of birth. In 173 households, the birth records had an inaccurate date of birth for the child, and the child was not in the age range of 3-18 months.
d) Households with temporarily absent families. In 159 cases, the households were located but the respondents were not available for interview because they were not in the village and were temporarily staying elsewhere (often visiting relatives).
Refusals: This category includes both households that refused to participate in the study and households that began but did not complete data collection. Thirty-nine eligible households (1.5% of the sample) did not agree to fully participate in the study. In 12 cases, the household refused to participate in any capacity. In 27 cases, the households began the household survey but later decided not to complete data collection (i.e., they did not participate in the BSID-III test or the anthropometric measures). For all 39 cases of refusal, the data collectors completed a non-complier questionnaire that captured some basic characteristics of this group to compare with the compliers.
Field Sampling: In cases where the field team was unable to complete data collection with a full set of 33 households in a community, even after exhausting the 53 randomly selected households (33 households from the original sample and 20 replacement households), the study employed an additional field replacement process. A total of 454 households from among the 2,574 were sampled using this method. The field replacement process was necessary
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Kidmose CANid Dataset (KCID)

The Kidmose CANid Dataset (KCID) contains CAN bus data collected by Brooke and Andreas Kidmose from 16 different drivers across 4 different vehicles. This dataset is designed to support driver identification and authentication research.

The term "CANid" reflects the dataset's dual purpose: data collected from the CAN bus for driver identification research.

VEHICLES

The dataset includes data from four different vehicles across various manufacturers and model years:
- 2011 Chevrolet Traverse - 5-door full-size SUV crossover, AWD, 8 drivers (8 unique drivers in single-driver traces; 1 additional driver in a mixed trace)
- 2017 Ford Focus - 5-door compact station wagon, FWD, 4 drivers
- 2017 Subaru Forester - 5-door compact SUV crossover, AWD, 6 drivers (6 unique drivers in single-driver traces; 3 additional drivers in mixed traces)
- 2022 Honda CR-V Touring - 5-door compact SUV crossover, AWD, 1 driver

Note: The number of drivers includes volunteer drivers whose data was captured in single-driver traces, where we know who was driving at all times. We exclude volunteer drivers whose data is only available in mixed traces because we do not know when each specific driver was actually operating the vehicle.

DRIVERS

The dataset includes 16 drivers across different demographic categories:

Male Drivers:
- Under 30 years: 4 drivers ("male-under30-1" through "male-under30-4")
- 30-55 years: 4 drivers ("male-30-55-1" through "male-30-55-4")
- Over 55 years: 3 drivers ("male-over55-1" through "male-over55-3")

Female Drivers:
- All ages: 5 drivers ("female-all-ages-1" through "female-all-ages-5")

Driver Directory Structure: Driver identifiers are used as directory/folder names. Within each directory, you will find traces collected from that particular driver, with additional information (location, data collection method, etc.) specified in the filename.

Note: We use "unknown driver(s)" in directory names when we know that one or more volunteer drivers was operating the vehicle, but we cannot identify who was driving or when. We used a standalone data logger for some data collection sessions. If we failed to download the data and clear the logger's memory before switching drivers, this resulted in mixed traces and, occasionally, "unknown driver(s)" entries. Unfortunately, some of our volunteer drivers were short-term visitors, so we did not have the opportunity to redo their traces as single-driver traces.

LOCATIONS

Data collection took place across multiple locations:
- DK - Denmark
- USA - United States of America
  - FL - Florida
  - NE - Nebraska
  - NE-to-FL - Trip from Nebraska to Florida
  - TN - Tennessee
  - TN-to-NE - Trip from Tennessee to Nebraska

Location codes appear in filenames (e.g., USA-FL-CANEdge-00000001.mf4 indicates data collected in Florida, USA).

DATA COLLECTION METHODS

Three different data collection methods were employed:
- CANEdge - CSS Electronics CANEdge2: Standalone data logger that connects to the OBD-II port and logs to an SD card
- Korlan - Korlan USB2CAN: CAN-to-USB cable connecting the vehicle's OBD-II port to a laptop
- Kvaser - Kvaser Hybrid CAN-LIN: CAN-to-USB cable connecting the vehicle's OBD-II port to a laptop

The data collection method is indicated in filenames (e.g., USA-FL-CANEdge-00000001.mf4).

FILE TYPES

The dataset provides data in three formats to support different use cases:

.mf4 (MDF4) Format:
- Measurement Data Format version 4 (MDF4)
- Binary format standardized by the Association for Standardization of Automation and Measuring Systems (ASAM)
- Advantages: Compact size, popular with automotive/CAN tools
- Use case: Native format from CSS Electronics CANEdge2
- Reference: https://www.csselectronics.com/pages/mf4-mdf4-measurement-data-format

.log Format:
- Text-based log format
- Compatibility: Linux SocketCAN can-utils
- Advantages: Compatibility with SocketCAN can-utils; if a .log file is replayed, then data can be captured and monitored using Python's python-can library
- References: https://github.com/linux-can/can-utils, https://packages.debian.org/sid/can-utils, https://python-can.readthedocs.io/en/stable/

.csv Format:
- Text-based comma-separated values (CSV) format
- Advantages: Easy to load with Python using the pandas library; easy to use with Python-based machine learning frameworks (e.g., scikit-learn, Keras, TensorFlow, PyTorch)
- Usage: Load with Python pandas: pd.read_csv()
- Reference: https://pandas.pydata.org/

SPECIALIZED EXPERIMENTS

The KCID Dataset includes five specialized experiments:

Fixed Routes Experiment
- Vehicles: 2011 Chevrolet Traverse, 2017 Subaru Forester
- Drivers: male-30-55-3, male-30-55-4, male-over55-1, female-all-ages-1, female-all-ages-2, female-all-ages-5
- Location: Florida, USA (specific routes)
- Data Collection Methods: CSS Electronics CANEdge2, Kvaser Hybrid CAN-LIN
- Purpose: Capture CAN traces for specific, mappable routes; eliminate route-based variations in driver authentication data (e.g., low-speed local routes vs. high-speed long-distance routes)

OBD Requests and Responses Experiment
- Vehicle: 2011 Chevrolet Traverse
- Driver: female-all-ages-5
- Location: Florida, USA
- Data Collection Method: CSS Electronics CANEdge2
- Purpose: Capture OBD requests and responses
- Arbitration IDs: Requests: 0x7DF, Responses: 0x7E8

Tire Pressure Experiment
- Vehicle: 2011 Chevrolet Traverse
- Driver: female-all-ages-5
- Location: Florida, USA
- Data Collection Method: Kvaser Hybrid CAN-LIN
- Purpose: Capture normal and low tire pressure scenarios
- Applications: Detect tire pressure issues via CAN bus analysis; develop predictive maintenance strategies

Driving Modes and Features Experiment
- Vehicle: 2017 Ford Focus
- Driver: male-30-55-1
- Location: Denmark
- Data Collection Method: Korlan USB2CAN
- Purpose: Capture different driving (and non-driving) modes and features
- Examples: gear (park, reverse, neutral, drive, sport); headlights on/off

Stationary Vehicles Experiment
- Vehicles: 2024 Chevrolet Malibu, 2025 Toyota Corolla
- Driver: N/A (vehicles remained stationary)
- Location: Florida, USA
- Data Collection Method: Kvaser Hybrid CAN-LIN
- Purpose: Capture CAN bus traffic from very new, very modern vehicles; identify differences between an older vehicle's CAN bus (e.g., 2011 Chevrolet Traverse) and a newer vehicle's CAN bus (e.g., 2024 Chevrolet Malibu)

ADDITIONAL DOCUMENTATION

Each "specialized experiment" directory contains a detailed README.md file with specific information about the experiment and the data collected.

RESEARCH APPLICATIONS

This dataset supports various research areas:
- Driver authentication, driver fingerprinting
- Behavioral biometrics in the automotive domain
- Vehicle diagnostics and predictive maintenance
- Machine learning in the automotive domain
- CAN bus analysis and reverse engineering

CITATION

If you use the Kidmose CANid Dataset in your research, please cite appropriately. Citation information will be updated when our paper is published in a peer-reviewed venue.

Article Citation:
- APA Style: Kidmose, B. E., Kidmose, A. B., and Zou, C. C. (2025). A critical roadmap to driver authentication via CAN bus: Dataset review, introduction of the Kidmose CANid Dataset (KCID), and proof of concept. arXiv. https://arxiv.org/pdf/2510.25856
- MLA Style: Kidmose, Brooke Elizabeth, Andreas Brasen Kidmose, and Cliff C. Zou. "A Critical Roadmap to Driver Authentication via CAN Bus: Dataset Review, Introduction of the Kidmose CANid Dataset (KCID), and Proof of Concept." arXiv, 2025. doi:10.48550/arXiv.2510.25856
- Chicago Style: Kidmose, Brooke Elizabeth, Andreas Brasen Kidmose, and Cliff C. Zou. "A Critical Roadmap to Driver Authentication via CAN Bus: Dataset Review, Introduction of the Kidmose CANid Dataset (KCID), and Proof of Concept." arXiv (2025). doi:10.48550/arXiv.2510.25856

Dataset Citation:
- APA Style: Kidmose, B. E. and Kidmose, A. B. (2025). Kidmose CANid Dataset (KCID) v1. [Data set]. Technical University of Denmark. https://doi.org/10.11583/DTU.30483005.v1
- MLA Style: Kidmose, Brooke Elizabeth, and Andreas Brasen Kidmose. "Kidmose CANid Dataset (KCID) v1." Technical University of Denmark, 30 Oct. 2025. Web. {Date accessed in dd mmm yyyy format}. doi:10.11583/DTU.30483005.v1
- Chicago Style: Kidmose, Brooke Elizabeth, and Andreas Brasen Kidmose. 2025. "Kidmose CANid Dataset (KCID) v1." Technical University of Denmark. doi:10.11583/DTU.30483005.v1
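Because the KCID description encodes location and data collection method in each trace filename, a small parser can recover that metadata. The sketch below is based only on the single example filename given (USA-FL-CANEdge-00000001.mf4). The function name is hypothetical, and the handling of filenames without a region (for example traces from Denmark) and of the trailing counter is an assumption that should be checked against the actual files.

```python
from pathlib import PurePath

# Method and format tokens taken from the dataset description.
METHODS = {"CANEdge", "Korlan", "Kvaser"}
FORMATS = {".mf4", ".log", ".csv"}


def parse_trace_name(filename: str) -> dict:
    """Split a trace filename into country, region, method, suffix and format."""
    path = PurePath(filename)
    if path.suffix not in FORMATS:
        raise ValueError(f"unexpected file type: {path.suffix!r}")
    parts = path.stem.split("-")
    # The method token separates the location codes from the rest of the name.
    idx = next((i for i, part in enumerate(parts) if part in METHODS), None)
    if idx is None or idx == 0:
        raise ValueError(f"no location/method found in {filename!r}")
    return {
        "country": parts[0],
        "region": "-".join(parts[1:idx]) or None,  # e.g. "FL" or "NE-to-FL"
        "method": parts[idx],
        "suffix": "-".join(parts[idx + 1:]),
        "format": path.suffix.lstrip("."),
    }
```

Locating the method token first, rather than splitting on a fixed position, keeps multi-part region codes such as NE-to-FL intact.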
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In April 2020 Eurostat set up an exceptional data collection on total weekly deaths, in order to support the policy and research efforts related to COVID-19. With this data collection, Eurostat's target was to quickly provide statistics that show the changing situation of the total number of weekly deaths from early 2020 onwards.
The available data on the total weekly deaths are transmitted by the National Statistical Institutes to Eurostat on a voluntary basis. Data are collected cross-classified by sex, 5-year age group and NUTS3 region (NUTS2021). The age breakdown by 5-year age group is the most significant and should be considered by the reporting countries as the main option; when that is not possible, data may be provided with less granularity. Similarly, the granularity of the regional breakdown varies by country.
Eurostat requested from the National Statistical Institutes the transmission of a back time series of weekly deaths for as many years as possible, recommending the year 2000 as the starting point. Some countries transmit shorter time series because of limited data availability. A sufficiently long time series is necessary for temporal comparisons and statistical modelling.
A note on Ireland: Data from Ireland were not included in the first phase of the weekly deaths data collection: official timely data were not available because deaths can be registered up to three months after the date of death. Because of the COVID-19 pandemic, the Central Statistics Office of Ireland began to explore experimental ways of obtaining up-to-date mortality data, finding a strong correlation between death notices published on RIP.ie and official mortality statistics. Recently, CSO Ireland started publishing a time series covering the period from October 2019 until the most recent weeks, using death notices (see CSO website). For the purpose of this release, Eurostat compared the new 2020-2021 web-scraped series with a 2016-2019 baseline established using official data. CSO is periodically assessing the quality of these data.
The purpose of Eurostat’s online tables in the folder Weekly deaths - special data collection (demomwk) is to make available to users information on the weekly number of deaths disaggregated by sex, 5-year age group and NUTS3 region over the last 20 years, depending on the availability in each country covered in Eurostat demographic statistics data collections. To ensure the highest possible timeliness, data are made available as reported by the countries, and work is ongoing to improve data quality and user-friendliness.
Starting in 2025, the weekly deaths data is collected on a quarterly basis. The database updates are expected by mid-June (release of monthly data for 1st quarter of the year), mid-September (2nd quarter), mid-December (3rd quarter), and mid-February (4th quarter).
This dataset is made available under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). See LICENSE.pdf for details.
Dataset description
Parquet file, with:
The file is indexed on [participant]_[month], such that 34_12 means month 12 from participant 34. All participant IDs have been replaced with randomly generated integers and the conversion table deleted.
Column names and explanations are included as a separate tab-delimited file. Detailed descriptions of feature engineering are available from the linked publications.
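The [participant]_[month] index described above can be split back into its two parts when rows need to be grouped per participant. This is a minimal sketch using plain strings. The function names are hypothetical, and it assumes every key contains exactly one underscore separating two integers, as in the example 34_12.

```python
from collections import defaultdict


def split_index(key: str) -> tuple[int, int]:
    """Turn an index key such as '34_12' into (participant, month)."""
    participant, month = key.split("_")
    return int(participant), int(month)


def months_by_participant(keys) -> dict[int, list[int]]:
    """Group index keys into a mapping of participant -> sorted list of months."""
    grouped = defaultdict(list)
    for key in keys:
        participant, month = split_index(key)
        grouped[participant].append(month)
    return {participant: sorted(months) for participant, months in grouped.items()}
```

Converting the month to an integer before sorting matters here, because sorting the raw keys as text would place 34_12 before 34_2.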
File contains aggregated, derived feature matrix describing person-generated health data (PGHD) captured as part of the DiSCover Project (https://clinicaltrials.gov/ct2/show/NCT03421223). This matrix focuses on individual changes in depression status over time, as measured by PHQ-9.
The DiSCover Project is a 1-year longitudinal study of 10,036 individuals in the United States, who wore consumer-grade wearable devices throughout the study and completed monthly surveys about their mental health and/or lifestyle changes, between January 2018 and January 2020.
The data subset used in this work comprises the following:
From these input sources we define a range of input features, both static (defined once, remain constant for all samples from a given participant throughout the study, e.g. demographic features) and dynamic (varying with time for a given participant, e.g. behavioral features derived from consumer-grade wearables).
The dataset contains a total of 35,694 rows, one for each month of data collection from each participant. We can generate 3-month long, non-overlapping, independent samples to capture changes in depression status over time with PGHD. We use the notation ‘SM0’ (sample month 0), ‘SM1’, ‘SM2’ and ‘SM3’ to refer to relative time points within each sample. Each 3-month sample consists of: PHQ-9 survey responses at SM0 and SM3, one set of screener survey responses, LMC survey responses at SM3 (as well as SM1 and SM2, if available), and wearable PGHD for SM3 (and SM1 and SM2, if available). The wearable PGHD includes data collected from 8 to 14 days prior to the PHQ-9 label generation date at SM3. This procedure generates a total of 10,866 samples from 4,036 unique participants.
This dataset originates from a series of experimental studies titled “Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust”. The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.

1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust

Overview: This dataset comes from an experimental study aimed at examining how individuals respond in terms of cognitive and affective trust when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a specifically focuses on the main effect of the “type of decision-maker” on trust.

Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China. Additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed were automatically excluded in real time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.

Data Structure and Format: The data file is named “Experiment1a.sav” and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. Columns represent measured variables, including: grouping and randomization variables, one manipulation check item, four items measuring distributive fairness perception, six items on cognitive trust, five items on affective trust, three items for honesty checks, and four demographic variables (gender, age, education, and grade level). The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.

Additional Information: No missing data are present. All variable names are labeled in English abbreviations to facilitate further analysis. The dataset can be directly opened in SPSS or exported to other formats.

2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)

Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating role of perceived ability and perceived benevolence.

Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.

Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be directly opened in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant’s data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three items for attention check; and three demographic variables (gender, age, and education). The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.

Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust

Overview: This dataset originates from an experimental study aimed at examining whether individuals respond differently in terms of cognitive and affective trust when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.

Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.

Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be directly opened in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant’s response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.

Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be directly analyzed in SPSS or exported to other formats as needed.

4 Experiment 2b: Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)

Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.

Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, where responses failing attention checks were excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.

Data Structure and Format: The dataset is stored in a file named Experiment2b.sav, which is in SPSS format and can be directly opened using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.

Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be directly analyzed in SPSS and easily converted into other formats if needed.

5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust

Overview: This dataset comes from an experimental study that investigates how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals’ cognitive and affective trust. Experiment 3a focuses on the main effect of the “decision-maker type” under interactional unfairness conditions.

Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, a total of 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.

Data Structure and Format: The dataset is stored in the file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.

Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis. The data can be directly analyzed using SPSS and exported to other formats as needed.

6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)

Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.

Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until a total of 227 valid responses were obtained. All data were processed and analyzed using SPSS software.

Data Structure and Format: The dataset is stored in the file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Since the beginning of the 1960s, Statistics Sweden, in collaboration with various research institutions, has carried out follow-up surveys in the school system. These surveys have taken place within the framework of the IS project (Individual Statistics Project) at the University of Gothenburg and the UGU project (Evaluation through follow-up of students) at the University of Teacher Education in Stockholm, which since 1990 have been merged into a research project called 'Evaluation through Follow-up'. The follow-up surveys are part of the central evaluation of the school and are based on large nationally representative samples from different cohorts of students.
Evaluation through follow-up (UGU) is one of the country's largest research databases in the field of education. UGU is part of the central evaluation of the school and is based on large nationally representative samples from different cohorts of students. The longitudinal database contains information on nationally representative samples of school pupils from ten cohorts, born between 1948 and 2004. The sampling process was based on the student's birthday for the first two cohorts and on the school class for the other cohorts.
For each cohort, data of mainly two types are collected. School administrative data is collected annually by Statistics Sweden during the time that pupils are in the general school system (primary and secondary school), for most cohorts starting in compulsory school year 3. This information is provided by the school offices and, among other things, includes characteristics of school, class, special support, study choices and grades. Information obtained has varied somewhat, e.g. due to changes in curricula. A more detailed description of this data collection can be found in reports published by Statistics Sweden and linked to datasets for each cohort.
Survey data from the pupils is collected for the first time in compulsory school year 6 (for most cohorts). Questionnaire in survey in year 6 includes questions related to self-perception and interest in learning, attitudes to school, hobbies, school motivation and future plans. For some cohorts, questionnaire data are also collected in year 3 and year 9 in compulsory school and in upper secondary school.
Furthermore, results from various intelligence tests and standardized knowledge tests are included in the year 6 data collection. The intelligence tests have been identical for all cohorts (except the cohort born in 1987, from which questionnaire data were first collected in year 9). The intelligence test consists of a verbal, a spatial and an inductive test, each containing 40 tasks and specially designed for the UGU project. The verbal test is a vocabulary test of the opposites type. The spatial test is a so-called 'sheet metal folding test', and the inductive test is made up of number series. The reliability of the test, intercorrelations and connection with school grades are reported by Svensson (1971).
For the first three cohorts (1948, 1953 and 1967), the standardized knowledge tests in year 6 consist of the standard tests in Swedish, mathematics and English that were offered to all pupils in compulsory school year 6 up to and including the beginning of the 1980s. For the 1972 cohort, specially prepared tests in reading and mathematics were used. The reading test consists of 27 tasks and aimed to identify students with reading difficulties. The mathematics test, which was also offered to the fifth cohort (1977), includes 19 tasks. A revised version of the test, introduced because the earlier test was judged to be somewhat too easy, was used for the cohort born in 1982. Results on the mathematics test are not available for the 1987 cohort. The mathematics test was not offered to the students in the 1992 cohort, as the test did not seem to fully correspond with current curriculum intentions in mathematics. For further information, see the description of the dataset for each cohort.
For several of the samples, questionnaires were also collected from the students' parents and teachers in year 6. The teacher questionnaire contains questions about the teacher, class size and composition, the teacher's assessments of the class's knowledge level and the like, school resources, working methods, parental involvement and the existence of evaluations. The questionnaire for the guardians includes questions about the child's upbringing conditions, ambitions and wishes regarding the child's education, views on the school's objectives and the parents' own educational and professional situation.
The students are followed up even after they have left compulsory school. Among other things, data are collected during the time they are in upper secondary school. School administrative data are then collected, e.g. choice of upper secondary school line / program and grades after completing studies. For some of the cohorts, in addition to school administrative data, questionnaire data were also collected from the students.
The sample consisted of students born on the 5th, 15th and 25th of any month in 1953, a total of 10,723 students.
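The birthday-based selection rule can be illustrated with a minimal Python sketch. It is not the project's actual procedure: the names `SAMPLE_DAYS` and `in_birthday_sample` are hypothetical, and the only facts taken from the description are the three days of the month and the birth year 1953.

```python
from datetime import date, timedelta

# Days of the month used for the 1953 sample, as stated in the description.
SAMPLE_DAYS = {5, 15, 25}


def in_birthday_sample(birth_date: date, year: int = 1953) -> bool:
    """Return True if birth_date falls on a sampled day in the cohort year.

    Hypothetical helper for illustration only.
    """
    return birth_date.year == year and birth_date.day in SAMPLE_DAYS


# Enumerate every calendar day of 1953 (not a leap year, so 365 days)
# and keep the ones that satisfy the rule: 3 days in each of 12 months.
first_day = date(1953, 1, 1)
all_days = [first_day + timedelta(days=i) for i in range(365)]
sampled_days = [d for d in all_days if in_birthday_sample(d)]

# Share of calendar days covered by the rule (36 of 365).
sampling_fraction = len(sampled_days) / len(all_days)
```

Under this rule 36 of the 365 calendar days in 1953 qualify, so the sample covers roughly one in ten birth dates.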
The data obtained in 1966 were: 1. School administrative data (school form, class type, year and grades). 2. Information about the parents' profession and education, number of siblings, the distance between home and school, etc.
This information was collected for 93% of all those born on the days in question. The shortfall is attributed to reduced resources at Statistics Sweden for follow-up work (reminders etc.). Annual data for the 1953 cohort were collected by Statistics Sweden up to and including the academic year 1972/73.
The response rate for test and questionnaire data is 88%. Standard test results were received for just over 85% of those who took the tests.
The sample included a total of 9,955 students, for whom some form of information was obtained.
Part of the "Individual Statistics Project" together with cohort 1953.