These data are the results of a systematic review that investigated how data standards and reporting formats are documented on the version control platform GitHub. Our systematic review identified 32 data standards in earth science, environmental science, and ecology that use GitHub for version control of data standard documents. In our analysis, we characterized the documents and content within each of the 32 GitHub repositories to identify common practices among groups that version control their documents on GitHub. This data package contains 8 CSV files of data characterized from each repository, organized according to location within the repository. For example, in 'readme_pages.csv' we characterize the content that appears across the 32 GitHub repositories included in our systematic review. Each of the 8 CSV files has an associated data dictionary file (name appended with '_dd.csv') in which we describe each content category within the CSV file. There is one file-level metadata file (flmd.csv) that provides a description of each file within the data package.
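The naming convention described above (each data CSV optionally accompanied by a '_dd.csv' data dictionary) can be sketched in Python. Only 'readme_pages.csv' and 'flmd.csv' are named in the description; the file list and pairing helper below are illustrative assumptions.

```python
# Sketch of the package convention: each data CSV may have a companion
# data dictionary whose name appends "_dd" before ".csv". The helper
# and the example file list are illustrative, not part of the package.
files = ["readme_pages.csv", "readme_pages_dd.csv", "flmd.csv"]

def pair_with_dictionary(names):
    """Map each data CSV to its '_dd.csv' dictionary, or None if absent."""
    data_files = [n for n in names if not n.endswith("_dd.csv")]
    return {
        n: (n[:-4] + "_dd.csv") if (n[:-4] + "_dd.csv") in names else None
        for n in data_files
    }

pairs = pair_with_dictionary(files)
```

Here 'readme_pages.csv' pairs with its dictionary, while 'flmd.csv' (the file-level metadata file) has no dictionary counterpart in the example list.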
https://www.icpsr.umich.edu/web/ICPSR/studies/8379/terms
This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: the topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data, for example the Geographic Names Information System, to facilitate analysis.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This article presents an original database on international standards, constructed using modern data gathering methods. StanDat facilitates studies into the role of standards in the global political economy by (1) being a source for descriptive statistics, (2) enabling researchers to assess scope conditions of previous findings, and (3) providing data for new analyses, for example the exploration of the relationship between standardization and trade, as demonstrated in this article. The creation of StanDat aims to stimulate further research into the domain of standards. Moreover, by exemplifying data collection and dissemination techniques applicable to investigating less-explored subjects in the social sciences, it serves as a model for gathering, systematizing and sharing data in areas where information is plentiful yet not readily accessible for research.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains exemplary data to demonstrate the utility of By-example data query in AutomationML.
Generating own examples for previously encountered new concepts is a common learning activity. Unfortunately, research has shown that students are not able to accurately evaluate the quality of their own examples. Instructional support measures such as idea unit standards have turned out to be ineffective in evaluating the quality of self-generated examples. In the present study, we investigated the benefits of a relatively parsimonious means to enhance judgment accuracy in example generation tasks, i.e. the provision of expert examples as external standards. For this purpose, we varied whether N = 131 university students were supported by expert example standards (with vs. without) and idea unit standards (with vs. without) in evaluating the quality of self-generated examples that illustrated new declarative concepts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is an example of a distribution of 20 correlated Bernoulli random variables.
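As a hedged illustration (not necessarily how this dataset was generated), one standard way to simulate correlated Bernoulli variables is to threshold correlated Gaussian draws, i.e. a Gaussian copula. The latent correlation 0.3 and Bernoulli(0.5) marginals below are assumptions for demonstration only.

```python
import numpy as np

# Draw 20 correlated Bernoulli(0.5) variables by thresholding an
# equicorrelated multivariate normal at zero (Gaussian copula sketch).
# rho = 0.3 is an illustrative latent correlation, not from the dataset.
rng = np.random.default_rng(0)
n_vars, rho, n_draws = 20, 0.3, 10_000

cov = np.full((n_vars, n_vars), rho)  # equicorrelated latent covariance
np.fill_diagonal(cov, 1.0)

z = rng.multivariate_normal(np.zeros(n_vars), cov, size=n_draws)
x = (z < 0).astype(int)  # each column is Bernoulli(0.5), positively correlated
```

Note that the induced binary correlation is weaker than the latent one (for latent rho = 0.3 it is roughly arcsin(0.3)/(2*pi) divided by 0.25, about 0.19).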
The Tajik Living Standards Survey (TLSS) was conducted jointly by the State Statistical Agency and the Center for Strategic Studies under the Office of the President in collaboration with the sponsors, the United Nations Development Programme (UNDP) and the World Bank (WB). International technical assistance was provided by a team from the London School of Economics (LSE). The purpose of the survey is to provide quantitative data at the individual, household and community level that will facilitate purposeful policy design on issues of welfare and living standards of the population of the Republic of Tajikistan in 1999.
National coverage. The TLSS sample was designed to represent the population of the country as a whole as well as the strata. The sample was stratified by oblast and by urban and rural areas.
The country is divided into 4 oblasts, or regions; Leninabad in the northwest of the country, Khatlon in the southwest, Rayons of Republican Subordination (RRS) in the middle and to the west of the country, and Gorno-Badakhshan Autonomous Oblast (GBAO) in the east. The capital, Dushanbe, in the RRS oblast, is a separately administrated area. Oblasts are divided into rayons (districts). Rayons are further subdivided into Mahallas (committees) in urban areas, and Jamoats (villages) in rural areas.
Sample survey data [ssd]
The TLSS sample was designed to represent the population of the country as a whole as well as the strata. The sample was stratified by oblast and by urban and rural areas.
In common with standard LSMS practice, a two-stage sample was used. In the first stage, 125 primary sample units (PSU) were selected with the probability of selection within strata being proportional to size. At the second stage, 16 households were selected within each PSU, with each household in the area having the same probability of being chosen. [Note: In addition to the main sample, the TLSS also included a secondary sample of 15 extra PSU (containing 400 households) in Dangara and Varzob. Data in the oversampled areas were collected for the sole purpose of providing baseline data for the World Bank Health Project in these areas. The sampling for these additional units was carried out separately after the main sampling procedure in order to allow for their exclusion in nationally representative analysis.] The two-stage procedure has the advantage that it provides a self-weighted sample. It also simplified the fieldwork operation, as one field team could be assigned to cover a number of PSU.
A critical problem in the sample selection in Tajikistan was the absence of an up-to-date national sample frame from which to select the PSU. As a result, lists of the towns, rayons and jamoats (villages) within rayons were prepared manually. Current data on population size according to village and town registers were then supplied to the regional offices of Goskomstat and conveyed to the center. This allowed the construction of a sample frame of enumeration units by sample size from which to draw the PSU.
This procedure worked well in establishing a sample frame for the rural population. However, administrative units in some of the larger towns and in the cities of Dushanbe, Khojand and Kurgan-Tubbe were too large and had to be subdivided into smaller enumeration units. Fortunately, the survey team was able to make use of information available from the mapping exercise carried out earlier in the year in preparation for the 2000 Census in order to subdivide these larger areas into enumeration units of roughly similar size.
The survey team was also able to use the household listings prepared for the Census for the second stage of the sampling in urban areas. In rural areas, the selection of households was made using the village registers, a complete listing of all households in the village which is (purportedly) regularly updated by the local administration. When selecting the target households, a few extra households (4 in addition to the 16) were also randomly selected to be used if replacements were needed. In actuality, non-response and refusals from households were very rare, and use of replacement households was low. The refusal rate was never so high that the reserve list was exhausted, and this enabled a full sample of 2000 randomly selected households to be interviewed.
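The two-stage design described above can be sketched as follows. The frame is synthetic and the PSU count is scaled down, so this illustrates only the selection logic, not the actual TLSS sample.

```python
import random

random.seed(42)

# Synthetic frame: 50 PSUs, each a list of household IDs of varying size.
# (The real survey used 125 PSUs drawn from manually prepared lists.)
frame = {f"psu_{i:02d}": list(range(random.randint(100, 500)))
         for i in range(50)}

# Stage 1: select PSUs with probability proportional to size (PPS).
# random.choices samples with replacement; real PPS designs typically
# use systematic selection without replacement, so this is a sketch.
sizes = [len(households) for households in frame.values()]
selected_psus = random.choices(list(frame), weights=sizes, k=5)

# Stage 2: equal-probability draw of a fixed 16 households per PSU,
# which is what makes the overall design (approximately) self-weighting.
sample = {psu: random.sample(frame[psu], 16) for psu in set(selected_psus)}
```

The self-weighting property follows because a large PSU is more likely to be chosen at stage 1, but each of its households is proportionally less likely to be chosen at stage 2, and the two effects cancel.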
Face-to-face [f2f]
The questionnaire was based on the standard LSMS for the CIS countries, and adapted and abridged for Tajikistan. In particular the health section was extended to allow for more in depth information to be collected and a section on food security was also added. The employment section was reduced and excludes information on searching for employment.
The questionnaires were translated into Tajik, Russian and Uzbek.
The TLSS consists of three parts: a household questionnaire, a community level questionnaire and a price questionnaire.
Household questionnaire: the Household questionnaire is comprised of 10 sections covering both household and individual aspects.
Community/Population point Questionnaire: the Community level or Population Point Questionnaire consists of 8 sections. The community level questionnaire provides information on differences in demographic and economic infrastructure. Open-ended questions in the questionnaire were not coded and hence information on the responses to these qualitative questions is not provided in the data sets.
Summary of Section contents
The brief descriptions below provide a summary of the information found in each section. The descriptions are by no means exhaustive of the information covered by the survey and users of the survey need to refer to each particular section of the questionnaire for a complete picture of the information gathered.
Household information/roster This includes individual level information on all individuals in the household. It establishes who belongs to the household at the time of the interview. Information on gender, age, relation to household head and marital status is included. In the question relating to family status, question 7, "Nekared" means married, where nekar is the Islamic (Arabic) term for the marriage contract. Under Islamic law a man may marry more than once (up to four wives at any one time). Although during the Soviet period it was illegal to be married to more than one woman, this practice did go on. There may be households where the household head is not present but the wife is married or nekared, or in the same household one respondent may answer married and another nekared to the household head.
Dwelling This section includes information covering the type of dwelling, availability of utilities and water supply as well as questions pertaining to dwelling expenses, rents, and the payment of utilities and other household expenses. Information is at the household level.
Education This section includes all individuals aged 7 years and older and looks at the educational attainment of individuals and reasons for not continuing education for those who are not currently studying. Questions related to educational expenditures at the household level are also covered. Schooling in Tajikistan is compulsory for grades (classes) 1-9. Primary level education refers to grades 1-4 for children aged 7 to 11 years old. General secondary level education refers to grades 5-9, corresponding to the age group of 12-16 year olds. Post-compulsory schooling can be divided into three types of school:
- Upper secondary education covers grades 10 and 11.
- Vocational and technical schools can start after grade 9 and last around 4 years. These schools can also start after grade 11 and then last only two years. Technical institutions provide medical and technical (e.g. engineering) education as well as education in the arts, while vocational schools provide training for employment in specialized occupations.
- Tertiary or university education can be entered after completing all 11 grades.
Kindergarten schools offer pre-compulsory education for children aged 3-6 years old; information on this type of schooling is not covered in this section.
Health This section examines individual health status and the nature of any illness over the recent months. Additional questions relate to more detailed information on the use of health care services and hospitals, including expenses incurred due to ill health. Section 4B includes a few terms, abbreviations and acronyms that need further clarification. A feldscher is an assistant to a physician. Mediniski dom or FAPs are clinics staffed by physician assistants and/or midwives, and a SUB is a local clinic. CRH is a local hospital, while an oblast hospital is a regional hospital based in the oblast administrative centre, and the Repub. Hospital is a national hospital based in the capital, Dushanbe. The latter two are both public hospitals.
Employment This section covers individuals aged 11 years and over. The first part of this section looks at the different activities in which individuals are involved in order to determine if a person is engaged in an income generating activity. Those who are engaged in such activities are required to answer questions in Part B. This part relates to the nature of the work and the organization the individual is attached to as well as questions relating to income, cash income and in-kind payments. There are also a few questions relating to additional income generating activities in addition to the main activity. Part C examines employment
In acquiring new conceptual knowledge, learners often engage in the generation of examples that illustrate the to-be-learned principles and concepts. Learners are, however, bad at judging the quality of self-generated examples, which can result in suboptimal regulation decisions. A promising means to foster judgment accuracy in this context is to provide external standards in the form of expert examples after learners have generated their own examples. Empirical evidence on this support measure, however, is scarce. Furthermore, it is unclear whether providing learners with poor examples that include typical wrong illustrations, as negative example standards, after they have generated their own examples would increase judgment accuracy as well. When they have generated poor examples themselves, learners might recognize similarities between their examples and the negative ones, which could result in more cautious and hence likely more accurate judgments concerning their own examples. Against this background, in a 2×2 factorial experiment we prompted N = 128 university students to generate examples that illustrate previously encountered concepts and to self-evaluate these examples afterwards. During self-evaluation, we varied whether learners were provided with expert example standards (with vs. without) and negative example standards (with vs. without).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standard Sample Description V2 is a specification aimed at harmonising the collection of analytical measurement data for the presence of harmful or beneficial chemical substances in food, feed and water. The specification is a list of standardised data elements (items describing characteristics of samples or analytical results such as country of origin, product, analytical method, limit of detection, result, etc.), linked to controlled terminologies. This specification uses EFSA FoodEx2 to describe sampled foods.
This file has been prepared to support the publication of data and interoperability. This file indicates which data elements from the specification will not be published to ensure full protection of confidential/sensitive information, for example personal data in accordance with Regulation (EC) No 45/2001 and to protect commercial interests, including intellectual property as specified in Article 4(2), first indent, of Regulation (EC) No 1049/2001.
The Excel table contains information about the structural metadata elements of the data collection and their fact tables.
The column name shows the name of the element (e.g. localOrg).
The column description describes how the content has to be interpreted.
The column code expresses the corresponding code of the structural metadata element.
The column optional indicates whether the structural metadata element is optional; if it is not optional, it is mandatory.
The column dataType contains the type that can be used to fill the structural metadata element and the maximum possible length of the field. The possible types are text and number.
The column catalogue contains the name of the catalogue where the content of the structural metadata element has to be picked from (e.g. COUNTRY).
The column data protection indicates whether the structural metadata element will be published (yes = will not be published, no = will be published).
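One row of the table described above might be represented as follows. All field values are illustrative assumptions (only the element name localOrg appears in the description), not values taken from the actual EFSA data collection.

```python
# Hypothetical structural metadata element, following the columns
# described above (name, description, code, optional, dataType,
# catalogue, data protection). All values are illustrative assumptions.
element = {
    "name": "localOrg",
    "description": "Organisation that performed the sampling",  # assumed
    "code": "localOrg",                  # assumed
    "optional": False,                   # not optional => mandatory
    "dataType": ("text", 250),           # type and maximal field length (assumed)
    "catalogue": None,                   # controlled terminology, if any
    "dataProtection": True,              # yes = element will NOT be published
}

def is_publishable(el):
    """Per the data protection column: True there means 'withheld'."""
    return not el["dataProtection"]
```

With these assumed values, is_publishable(element) returns False, i.e. the element would be withheld from publication.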
We present the basic forms of citation (formats and elements) developed for statistics, data, and maps products at Statistics Canada. From these models, 80 examples have been created to become the citation standards of the organization. We also discuss the relationship between these standards and the revision of ISO 690 and ISO 690-2 to include examples of statistics, data, and map citations in the new ISO bibliographic standard, and the opportunities for IASSIST and the data community to be part of this process.
This repository contains all submissions made to the Mosaic Standards Challenge from the period of the challenge opening through December 31st, 2019. The Mosaic Standards Challenge asked the microbiome research community to participate in determining the level of variation due to wet-lab protocols by sequencing a set of samples and providing the resulting files. Each participant ordered one or more kits, where each kit contained five fecal samples and two predetermined DNA mixtures. All samples were identical across all kits; in other words, the samples labeled "#1" provided to each lab were identical to each other. Participants in the challenge sequenced any number of the provided samples and provided both the raw sequencing result files and the details of their protocol. Protocol details were provided by answering a set of pre-specified questions in a metadata spreadsheet upon submission of each sample.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Magnetotellurics (MT) is an electromagnetic geophysical method that is sensitive to variations in subsurface electrical resistivity. Measurements of natural electric and magnetic fields are made in the time domain, where instruments can record from a couple of hours up to multiple months, resulting in data sets on the order of gigabytes. The principles of findability, accessibility, interoperability, and reuse of digital assets (FAIR) require standardized metadata. Unfortunately, the MT community has never had a metadata standard for time series data. In 2019, the Working Group for Magnetotelluric Data Handling and Software (https://www.iris.edu/hq/about_iris/governance/mt_soft) was assembled by the Incorporated Research Institutions for Seismology (IRIS) to develop a metadata standard for time series data. This product describes the metadata definitions. Metadata Hierarchy: Survey -> Station -> Run -> Channel. The hierarchy and structure of the MT metadata logically follow how MT time series data are collected. The highest level is "survey", which contains metadata for data collected over a certain time interval in a given geographic region. This may include multiple principal investigators or multiple data collection episodes but should be confined to a specific project. Next, a "station" contains metadata for a single location over a certain time interval. If the location changes during a run, then a new station should be created and subsequently a new run under the new station. If the sensors, cables, data logger, battery, etc. are replaced during a run but the station remains in the same location, then this can be recorded in the "run" metadata but does not require a new station entry. A "run" contains metadata for continuous data collected at a single sample rate. If channel parameters are changed between runs, this requires creating a new run. If the station is relocated, then a new station should be created.
If a run has channels that drop out, the start and end period will be the minimum time and maximum time for all channels recorded. Finally, a "channel" contains metadata for a single channel during a single run, where "electric", "magnetic", and "auxiliary" channels have some different metadata to uniquely describe the physical measurement.
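The Survey -> Station -> Run -> Channel hierarchy described above can be sketched with Python dataclasses. The attribute names below are a minimal illustrative subset; the actual standard defines many more fields per level.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of the four-level MT metadata hierarchy. Attribute
# names are illustrative assumptions, not the standard's field names.
@dataclass
class Channel:
    component: str               # e.g. "electric", "magnetic", "auxiliary"

@dataclass
class Run:
    run_id: str
    sample_rate: float           # one continuous sample rate per run
    channels: List[Channel] = field(default_factory=list)

@dataclass
class Station:
    station_id: str
    latitude: float
    longitude: float             # a relocation implies a new Station
    runs: List[Run] = field(default_factory=list)

@dataclass
class Survey:
    survey_id: str               # one project, one geographic region
    stations: List[Station] = field(default_factory=list)

survey = Survey("EXAMPLE", [
    Station("MT001", 40.0, -110.0, [
        Run("a", 256.0, [Channel("electric"), Channel("magnetic")]),
    ]),
])
```

The nesting mirrors the rules in the text: changing the sample rate or channel parameters would mean appending a new Run, while changing location would mean creating a new Station.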
Analytical Standards Market Size 2024-2028
The analytical standards market size is forecast to increase by USD 657.8 million at a CAGR of 6.78% between 2023 and 2028.
The market is experiencing significant growth, driven primarily by the burgeoning life sciences industry. This sector's increasing focus on research and development, coupled with the need for precise and accurate analytical data, is fueling the demand for high-quality analytical standards. Additionally, the adoption of customized analytical standards is on the rise, as organizations seek to meet specific regulatory requirements and improve the efficiency of their analytical processes. However, the market faces challenges, including the limited shelf life of analytical standards, which necessitates frequent replenishment and adds to operational costs. Furthermore, regulatory hurdles impact adoption, as stringent regulations governing the production and use of analytical standards can hinder market growth.
To capitalize on this market's opportunities and navigate these challenges effectively, companies must focus on developing robust supply chains, ensuring regulatory compliance, and investing in research and development to extend the shelf life of their analytical standards. By addressing these issues, market participants can differentiate themselves and capture a larger share of this dynamic and growing market.
What will be the Size of the Analytical Standards Market during the forecast period?
The market encompasses a diverse range of techniques and technologies used to ensure measurement traceability and maintain quality systems in various industries. Microscopy techniques and spectroscopic methods play a crucial role in elemental and organic analysis, while chromatographic techniques are essential for inorganic analysis. Method verification and validation are integral parts of the analytical workflow, ensuring the reliability and accuracy of automated analysis. Accreditation bodies and standard methods provide a framework for method development and instrument calibration, enabling data management and interpretation.
Uncertainty evaluation and statistical process control are essential components of quality control, with data reporting and uncertainty budgets ensuring transparency and accountability. Outlier detection and data management are vital for maintaining the integrity of analytical chemistry, from sample handling and preparation to mass spectrometry techniques and data interpretation.
How is this Analytical Standards Industry segmented?
The analytical standards industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Type
Chromatography
Spectroscopy
Titrimetry
Physical properties testing
Application
Food and beverages
Pharmaceuticals and life sciences
Environmental
Others
Methodology
Bioanalytical testing
Stability testing
Raw material testing
Dissolution testing
Others
Geography
North America
US
Europe
Germany
UK
APAC
China
India
Rest of World (ROW)
By Type Insights
The chromatography segment is estimated to witness significant growth during the forecast period.
The market is driven by the increasing demand for techniques ensuring data integrity and precision in various industries. Chromatography technology, known for its high performance in identifying and separating impurities, dominates the market. Liquid chromatography and gas chromatography, with their extensive range of applications in chemical analysis, pharmaceutical research, and food safety, are significant contributors. Advancements in technologies such as high-performance liquid chromatography, gas chromatography-mass spectrometry, and liquid chromatography-mass spectrometry, have boosted their adoption. Measurement uncertainty and validation studies are integral to the market, ensuring accurate and reliable results. Calibration standards and reference materials play a crucial role in maintaining measurement consistency, while laboratory accreditation and quality management systems ensure data integrity.
Techniques like nuclear magnetic resonance, infrared spectroscopy, Raman spectroscopy, and mass spectrometry offer complementary analysis, enhancing the overall analytical process. Environmental monitoring and materials science applications further expand the market's reach. Inorganic analysis and elemental analysis are essential for industries dealing with heavy metals and minerals. Quality control and quality assurance are integral to maintaining product consistency and safety. Good laboratory practices and standard operating procedures ensure consistent and reliable results, while interlaboratory c
Famatinite is a mineral with nominal chemical formula Cu3SbS4. This electron-excited X-ray data set was collected from a natural flat-polished sample and the surrounding silicate mineral.
Live time/pixel: 0.704.00.953600.0/(512512 # 0.95 hours on 4 detectors
Probe current: 1.0 nA
Beam energy: 20 keV
Energy scale: 10 eV/ch and 0.0 eV offset
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MIxS Human Gut Data Standard_SampleSequencingAssaySetup. Example CSV file for setting up sample registration and update events for the MIxS Human Gut Data Standard. (CSV 19 kb)
The primary data consist of allele or haplotype frequencies for N=1036 anonymized U.S. population samples. Additional files are supplements to the associated publications. Any changes to spreadsheets are listed in the "Change Log" tab within each spreadsheet. DOI numbers for associated publications are listed below, under "References".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and please cite it if using these data.
The code to standardize an example subject (for the ICICLE dataset) and to open the standardized MATLAB files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).
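The repository above provides its own loaders, but as a general sketch, standardized MATLAB (.mat) files can be read in Python with scipy. The variable name "data" and the file contents below are assumptions, not the Mobilise-D schema; note also that MATLAB v7.3 files are HDF5-based and would need h5py instead of scipy.io.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Round-trip illustration: write a small .mat file, then read it back.
# The variable name "data" is an assumption for demonstration only.
path = os.path.join(tempfile.gettempdir(), "example_standardized.mat")
savemat(path, {"data": np.arange(6.0).reshape(2, 3)})

mat = loadmat(path)      # dict mapping variable names to NumPy arrays
data = mat["data"]       # shape (2, 3)
```

loadmat also returns bookkeeping keys such as "__header__"; only the named variables hold the actual arrays.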
In 2001, the World Bank in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS).
The Living Standards Measurement Survey (LSMS), in addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, has three basic objectives, as follows:
To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.
To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.
To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.
The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further three years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS) – and more popularly known as Living in BiH (LiBiH). Birks Sinclair & Associates Ltd. in cooperation with the Independent Bureau for Humanitarian Issues (IBHI) were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK.
The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for three years following the LSMS, in the autumns of 2002 and 2003 and the winter of 2004. The LSMS constitutes Wave 1 of the panel survey so there are four years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey: - Wave 1 LSMS conducted in 2001 forms the baseline survey for the panel - Wave 2 Second interview of 50% of LSMS respondents in Autumn/Winter 2002 - Wave 3 Third interview with sub-sample respondents in Autumn/Winter 2003 - Wave 4 Fourth interview with sub-sample respondents in Winter 2004
The panel data allow the analysis of key transitions and events over this period, such as labour market or geographical mobility, and observations on the consequent outcomes for the well-being of individuals and households in the survey. The panel data provide information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty and movements in and out of poverty are experienced by different types of households and individuals over the four-year period. Most importantly, the covariates associated with moves into and out of poverty and the relative risks of poverty for different people can be assessed. As such, the panel aims to provide data which will inform the policy debates within BiH at a time of social reform and rapid change.
In order to develop baseline (2004) data on poverty, incomes and socio-economic conditions, and to begin to monitor and evaluate the implementation of the BiH MTDS, EPPU commissioned this modified fourth round of the LiBiH Panel Survey.
National coverage. Domains: Urban/rural/mixed; Federation; Republic
Households
Sample survey data [ssd]
The Wave 4 sample comprised the 2882 households interviewed at Wave 3 (1309 in the RS and 1573 in FBiH). As at previous waves, sample households could not be replaced with any other households.
Panel design
Eligibility for inclusion
The household and household membership definitions assume the same standard definitions used at Wave 3. The sample membership, status and eligibility for interview are as follows:
i) All members of households interviewed at Wave 3 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview, i.e. younger than 15 years.
ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).
iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).
iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.
Following rules
The panel design provides that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together, but in other cases an individual member may move away from their previous wave household and form a new "split-off" household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefits of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.
Definition of 'out-of-scope'
It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are:
i. Movers out of the country altogether, i.e. outside BiH. This category of mover is clear. Sample members moving to another country outside BiH will be out-of-scope for that year of the survey and ineligible for interview.
ii. Movers between entities. Respondents moving between entities are followed for interview. Personal details of "movers" are passed between the statistical institutes and an interviewer is assigned in that entity.
iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, individuals who have subsequently moved into some institutions are followed at Wave 4. The definitions of which institutions are included are found in the Supervisor Instructions.
iv. Movers into the district of Brcko. These respondents are followed for interview. When coding, Brcko is treated as the entity from which the household moved.
Feed-forward
Details of the address at which respondents were found at the previous wave, together with a listing of household members found in each household at the last wave, were fed forward as the starting point for Wave 4 fieldwork. This "feed-forward" data also includes the key variables required for correctly identifying individual sample members:
- For each household: household ID (IDD); full address details and phone number
- For each original sample member: name; person number (ID); unique personal identifier (LID); sex; date of birth
The sample details are held in an Access database and in order to ensure the confidentiality of respondents, personal details, names and addresses are held separately from the survey data collected during fieldwork. The IDD, LID and ID are the key linking variables between the two databases i.e. the name and address database and the survey database.
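The key-based linking described above can be sketched with pandas. This is an illustrative sketch only (the survey holds these tables in an Access database, and the table contents below are invented): personal details are kept apart from the survey data, and the two tables are joined only through the key variables.

```python
import pandas as pd

# Invented example tables. Names and addresses live in one table,
# survey responses in another; IDD (household), LID and ID are the
# only linking variables between them.
addresses = pd.DataFrame({
    "IDD": [1001, 1002],
    "address": ["address A", "address B"],
})
survey = pd.DataFrame({
    "IDD": [1001, 1001, 1002],
    "ID": [1, 2, 1],
    "LID": [100101, 100102, 100201],
    "employed": [1, 0, 1],
})

# Linking survey records back to household addresses via the IDD key:
linked = survey.merge(addresses, on="IDD", how="left")
```

A left join keeps every survey record even if a household's address record were missing, which mirrors keeping the survey database authoritative.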
Face-to-face [f2f]
Data entry
As at previous waves, CSPro was the chosen data entry software. The CSPro program consists of two main features intended to reduce the number of keying errors and the editing required following data entry:
- Data entry screens that include all skip patterns.
- Range checks for each question (allowing three exceptions for the inappropriate, don't know and missing codes).
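The range-check logic can be illustrated outside CSPro. This is a hypothetical sketch: the special code values (-1, -8, -9) and the example range are invented for illustration, since the actual codes are defined per questionnaire.

```python
# Assumed codes for the three allowed exceptions:
# inappropriate / don't know / missing.
SPECIAL_CODES = {-1, -8, -9}

def in_range(value, low, high):
    """Accept a value within [low, high] or one of the special codes."""
    return low <= value <= high or value in SPECIAL_CODES

# e.g. hours worked last week, assumed valid range 0-98:
ok = in_range(40, 0, 98)        # in range
missing_ok = in_range(-9, 0, 98)  # missing code accepted
keyed_error = in_range(120, 0, 98)  # out of range, flagged at entry
```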
The Wave 4 data entry program had similar checks to the Wave 3 program, and DE staff were instructed to clear all anomalies with SIG fieldwork members. The program was tested prior to the commencement of data entry. Twelve data entry staff were employed in each Field Office; as all had worked on previous waves, training was not undertaken.
Editing
Instructions for editing were provided in the Supervisor Instructions. At Wave 4, supervisors were asked to take more time to edit every questionnaire returned by their interviewers. The SIG Fieldwork Managers examined every Control Form.
The proportion of cases that could not be traced is extremely low, as are the whole-household refusal and non-contact rates. In total, 9128 individuals (including children) were enumerated within the sample households at Wave 4: 5019 individuals in the FBiH and 4109 in the RS. Within the 2875 eligible households, 7603 individuals aged 15 or over were eligible for interview, with 7116 (93.6%) being successfully interviewed. Within co-operating households (where there was at least one interview) the interview rate was higher (98.6%).
A very important measure in longitudinal surveys is the annual individual re-interview rate, as a high attrition rate, where large numbers of respondents drop out of the survey over time, can call into question the quality of the data collected. In BiH the individual re-interview rates have been high for the survey. The individual re-interview rate is the proportion of people who gave an interview at time t-1 who also give an interview at time t. Of those who gave a full interview at Wave 3, 6654 also gave a full interview at Wave 4.
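The re-interview rate defined above is a simple share and can be computed directly from respondent identifiers. The ID sets below are invented toy data, not survey figures:

```python
# Of those interviewed at wave t-1, the share also interviewed at wave t.
wave3_ids = {101, 102, 103, 104, 105}   # gave a full interview at t-1
wave4_ids = {101, 102, 103, 105, 106}   # gave a full interview at t

reinterviewed = wave3_ids & wave4_ids   # respondents present at both waves
rate = len(reinterviewed) / len(wave3_ids)  # 4 of 5 re-interviewed
```

Note that new entrants at wave t (here, ID 106) do not affect the rate; only the wave t-1 respondents form the denominator.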
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interoperability in systems-of-systems is a difficult problem due to the abundance of data standards and formats. Current approaches to interoperability rely on hand-made adapters or methods using ontological metadata. This dataset was created to facilitate research on data-driven interoperability solutions. The data comes from a simulation of a building heating system, and the messages sent within control systems-of-systems. For more information see attached data documentation.
The data comes in two semicolon-separated (;) CSV files, training.csv and test.csv. The train/test split is not random: the training data comes from the first 80% of simulated timesteps, and the test data is the last 20%. There is no specific validation dataset; validation data should instead be randomly selected from the training data. The simulation runs for as many time steps as there are outside temperature values available. The original SMHI data only samples once every hour, which we linearly interpolate to get one temperature sample every ten seconds. The data saved at each time step consists of 34 JSON messages (four per room and two temperature readings from the outside), 9 temperature values (one per room and one outside), 8 setpoint values, and 8 actuator outputs. The data associated with each of those 34 JSON messages is stored as a single row in the tables. This means that much data is duplicated, a choice made to make the data easier to use.
The simulation data is not meant to be opened and analyzed in spreadsheet software; it is meant for training machine learning models. It is recommended to open the data with the pandas library for Python, available at https://pypi.org/project/pandas/.
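Loading and splitting with pandas can be sketched as follows. In practice the paths are training.csv and test.csv; a small inline sample with assumed column names stands in here so the snippet is self-contained. The test split (the last 20% of timesteps) should be left untouched, with validation data drawn randomly from the training data as described above:

```python
import io
import pandas as pd

# Inline stand-in for training.csv; column names are assumptions.
sample = io.StringIO(
    "timestep;sender;temperature\n"
    "0;room_1;20.5\n"
    "10;room_1;20.6\n"
    "20;room_2;19.9\n"
    "30;room_2;20.0\n"
)
train = pd.read_csv(sample, sep=";")  # note the semicolon separator

# Carve a random validation set out of the training portion only.
val = train.sample(frac=0.25, random_state=0)  # fraction is an arbitrary choice
train = train.drop(val.index)
```

Fixing `random_state` makes the validation split reproducible across runs.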
The data file with temperatures (smhi-july-23-29-2018.csv) acts as input for the thermodynamic building simulation found on GitHub, where it is used to get the outside temperature and corresponding timestamps. Temperature data for Luleå in Summer 2018 were downloaded from SMHI.
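The interpolation step described above (hourly SMHI samples to one value every ten seconds) can be sketched with pandas. The temperature values below are invented; the real input is smhi-july-23-29-2018.csv:

```python
import pandas as pd

# Invented hourly readings standing in for the SMHI data.
hourly = pd.Series(
    [15.0, 16.2, 17.1],
    index=pd.date_range("2018-07-23 00:00", periods=3, freq="h"),
)

# Upsample to a 10-second grid and fill the gaps linearly,
# matching the one-sample-per-ten-seconds rate described above.
every_10s = hourly.resample("10s").interpolate(method="linear")
# two hours at one sample per 10 s, plus the final point: 721 values
```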