Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model performance results for random forest, gradient boosting, penalized logistic regression, XGBoost, SVM, neural network, and stacking, with APAT data as the training set and EMBARC data as the testing set, after multiple imputation performed 10 times.
https://www.kaggle.com/tpmeli/missing-data-exploration-mean-iterative-more
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Starter vs. non-starter contrasts from non-imputed data.
https://infrabel.opendatasoft.com/pages/license/
The historical method used to calculate the statistics presented in this dataset takes into account all the minutes of delay caused by 'major incidents' (internally known as 'relazen') on the rail network as reported to the Railway Accident and Incident Investigation Body (OEAIF/OOIS) and the Railway Safety and Interoperability Service (NSA Rail Belgium) under the Royal Decree of 16 January 2007 laying down certain rules relating to investigations into railway accidents and incidents. The criteria defining 'major incidents' (internally known as 'relations'**) are as follows:
1 passenger train delayed by an incident for 20 minutes or more
Several passenger trains delayed by an incident for at least 40 minutes
Incidents leading to the cancellation (partial or total) of trains
Incidents with an impact on operational safety
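The criteria above can be read as a simple "any of four conditions" rule. The sketch below is illustrative only: the `Incident` record and its field names are hypothetical, and the "several trains, at least 40 minutes" criterion is assumed to mean the combined delay of the affected trains, which the description does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative assumptions."""
    train_delays_min: List[float] = field(default_factory=list)  # one entry per delayed passenger train
    cancelled_trains: int = 0     # partially or totally cancelled trains
    safety_impact: bool = False   # incident affected operational safety


def is_major_incident(incident: Incident) -> bool:
    """Return True if at least one of the four listed criteria is met."""
    delays = incident.train_delays_min
    # Criterion 1: a passenger train delayed by 20 minutes or more.
    one_train_20 = any(d >= 20 for d in delays)
    # Criterion 2 (assumption): combined delay of several trains is at least 40 minutes.
    several_trains_40 = len(delays) > 1 and sum(delays) >= 40
    # Criteria 3 and 4: cancellations, or an impact on operational safety.
    return (
        one_train_20
        or several_trains_40
        or incident.cancelled_trains > 0
        or incident.safety_impact
    )
```

For instance, under these assumptions an incident that delays three trains by 15, 15 and 10 minutes would count as major, while one that delays a single train by 19 minutes would not.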
There is no unequivocal relationship between the minutes of delay in 'major incidents' and the punctuality rate because:
The minutes included in 'major incidents' do not necessarily have an actual impact on punctuality (a train can make up its delay as it goes along).
Some trains arrive at their terminus more than 6 minutes late (and therefore have an actual impact on punctuality), but are not included in the 'major incidents'.
In order to provide an exhaustive overview of the causes and responsibilities for delays, a new dataset has been made available: Monthly causes of loss of punctuality. The data presented in this new dataset is as follows: for each train delayed by 6 minutes or more on arrival at a tracking point*, an analysis is made of the cause of all the minutes of delay along the route, and a proportional score is awarded for each responsibility identified. More information is available in the new dataset's description.
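One possible reading of the proportional score is sketched below. The exact scoring rule is not given in this description, so the function name, the input layout, and the choice of "share of delay minutes" as the score are all assumptions made for illustration.

```python
from typing import Dict

PUNCTUALITY_THRESHOLD_MIN = 6  # from the description: delayed by 6 minutes or more


def responsibility_scores(arrival_delay_min: float,
                          minutes_by_responsibility: Dict[str, float]) -> Dict[str, float]:
    """Split one delayed train across responsibilities in proportion to delay minutes.

    Hypothetical helper: trains below the 6-minute threshold are not analysed
    and yield no scores.
    """
    if arrival_delay_min < PUNCTUALITY_THRESHOLD_MIN:
        return {}
    total = sum(minutes_by_responsibility.values())
    if total <= 0:
        return {}
    # Each responsibility receives its share of all delay minutes along the route.
    return {who: minutes / total for who, minutes in minutes_by_responsibility.items()}
```

With this reading, a train arriving 8 minutes late with 6 minutes attributed to one party and 2 minutes to another would be scored 0.75 and 0.25 respectively.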
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This observational study aimed to analyze external training load in highly trained female football players, comparing starters and non-starters across various cycle lengths and training days. Method: External training load data (duration, total distance [TD], high-speed running distance [HSRD], sprint distance [SpD], and acceleration and deceleration distance [AccDecdist]) from 100 female football players (22.3 ± 3.7 years of age) in the Norwegian premier division were collected over two seasons using STATSports APEX. This resulted in a final dataset totaling 10498 observations after multiple imputation of missing data. Microcycle length was categorized based on the number of days between matches (2 to 7 days apart), while training days were categorized relative to match day (MD, MD+1, MD+2, MD-5, MD-4, MD-3, MD-2, MD-1). Linear mixed modeling was used to assess differences between days and between starters and non-starters. Results: In longer cycle lengths (5–7 days between matches), the middle of the week (usually MD-4 or MD-3) consistently exhibited the highest external training load (~21–79% of MD TD, MD HSRD, MD SpD, and MD AccDecdist), although, with the exception of duration (~108–120% of MD duration), it remained lower than MD. External training load was lowest on MD+2 and MD-1 (~1–37% of MD TD, MD HSRD, MD SpD, MD AccDecdist, and ~73–88% of MD peak speed). Non-starters displayed higher loads (~137–400% of starter TD, HSRD, SpD, AccDecdist) on MD+2 in cycles with 3 to 7 days between matches, with non-significant differences (~76–116%) on other training days. Conclusion: Loading patterns resemble a pyramid or skewed pyramid during longer cycle lengths (5–7 days), with higher training loads towards the middle compared to the start and the end of the cycle. Non-starters displayed slightly higher loads on MD+2, with no significant load differentiation from MD-5 onwards.
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Latest edition information
For the fourth edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Abstract
Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.
Usage notes
phenotype: phenotype and experimental design (phen mask dryad.txt)
SVD genotype imputation: marker matrix for SVD imputation method (SVDimp dryad.txt)
Mean genotype imputation: marker matrix for mean imputation method (MeanImp dryad.txt)
KNN genotype imputation: marker matrix for KNN imputation method (KNNimp dryad.txt)
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Latest edition information
For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973 to 1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.
New reweighting policy
Following the new reweighting policy, ONS has reviewed the latest population estimates made available during 2019 and has decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. This will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published towards the end of 2020/early 2021.
Secure Access QLFS household data
Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. For some quarters, users should note that all missing values in the data are set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. From the 2013 household datasets, the standard -8 and -9 missing categories have been reinstated.
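When comparing quarters that use the combined '-10' category with quarters that use the separate '-8' and '-9' categories, one option is to collapse the codes before analysis. The sketch below assumes the values of one variable have already been read into a plain Python list; the function names are hypothetical and not part of any ONS or Archive tooling.

```python
from typing import List, Optional

SEPARATE_MISSING_CODES = {-8, -9}  # the separate missing categories named in the text
COMBINED_MISSING_CODE = -10        # the single combined category used in some quarters


def harmonise_missing(values: List[int]) -> List[int]:
    """Collapse -8 and -9 into -10 so all quarters share one missing category."""
    return [COMBINED_MISSING_CODE if v in SEPARATE_MISSING_CODES else v for v in values]


def to_missing(values: List[int]) -> List[Optional[int]]:
    """Replace every missing code (-8, -9, -10) with None for analysis."""
    all_codes = SEPARATE_MISSING_CODES | {COMBINED_MISSING_CODE}
    return [None if v in all_codes else v for v in values]
```

Collapsing to '-10' loses the distinction between the two original categories, so it is only appropriate where that distinction is not needed.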
Secure Access household datasets for the QLFS are available from 2002 onwards, and include additional, detailed variables not included in the standard 'End User Licence' (EUL) versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits.
Prospective users of a Secure Access version of the QLFS will need to fulfil additional requirements, commencing with the completion of an extra application form to demonstrate to the data owners exactly why they need access to the extra, more detailed variables, in order to obtain permission to use that version. Secure Access users must also complete face-to-face training and agree to Secure Access' User Agreement (see 'Access' section below). Therefore, users are encouraged to download and inspect the EUL version of the data prior to ordering the Secure Access version.
LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of each volume of the User Guide including the appropriate questionnaires for the years concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance pages before commencing analysis.
The study documentation presented in the Documentation section includes the most recent documentation for the LFS only, due to available space. Documentation for previous years is provided alongside the data for access and is also available upon request.
Review of imputation methods for LFS Household data - changes to missing values
A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily intended to ensure that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, they are advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
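The recommended filter can be sketched as follows, assuming each case has been loaded as a dictionary with an "ioutcome" key. The key name and casing, and the function name, are assumptions; the actual variable name should be checked against the variable catalogue for the quarter in use.

```python
from typing import Dict, Iterable, List


def drop_ioutcome_3(records: Iterable[Dict[str, int]]) -> List[Dict[str, int]]:
    """Keep only cases whose 'ioutcome' value is not 3.

    Apply this to every period in the time series, not only to 2015 onwards,
    so that non-responders are treated consistently throughout.
    """
    return [r for r in records if r.get("ioutcome") != 3]
```

Cases without an "ioutcome" value are kept by this sketch; whether that is appropriate depends on how the data were loaded.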
Variables DISEA and LNGLST
Dataset A08 (Labour market status of disabled people), which ONS suspended due to an apparent discontinuity between April to June 2017 and July to September 2017, is now available. As a result of this apparent discontinuity and the inconclusive investigations at this stage, comparisons should be made with caution between April to June 2017 and subsequent time periods. However, users should note that the estimates are not seasonally adjusted, so some of the change between quarters could be due to seasonality. Further recommendations on historical comparisons of the estimates will be given in November 2018 when ONS are due to publish estimates for July to September 2018.
An article explaining the quality assurance investigations that have been conducted so far is available on the ONS Methodology webpage. For any queries about Dataset A08 please email Labour.Market@ons.gov.uk.
Latest Edition Information
For the sixteenth edition (November 2023), one quarterly data file covering the time period April-June, 2023, along with a new Excel variable catalogue for 2023 and a documentation form, have been added to the study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated marginal means by MD, cycle, and squad status, from imputed data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of study participants’ clinical variables, demographic variables and baseline questionnaire scores by remission status in EMBARC study.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Values are presented as means with the standard deviations in parentheses.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cardiovascular mortality according to alcohol consumption frequency using multiple imputation (n = 245,336).