14 datasets found
  1. Model performance results based on random forest, gradient boosting,...

    • plos.figshare.com
    xls
    Updated Mar 28, 2024
    Cite
    Junying Wang; David D. Wu; Christine DeLorenzo; Jie Yang (2024). Model performance results based on random forest, gradient boosting, penalized logistic regression, XGBoost, SVM, neural network, and stacking for APAT data as training set and EMBARC data as testing set after multiple imputation for 10 times. [Dataset]. http://doi.org/10.1371/journal.pone.0299625.t005
    Available download formats: xls
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Junying Wang; David D. Wu; Christine DeLorenzo; Jie Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model performance results based on random forest, gradient boosting, penalized logistic regression, XGBoost, SVM, neural network, and stacking for APAT data as training set and EMBARC data as testing set after multiple imputation for 10 times.

  2. Iterative Imputation of Jane St train.csv

    • kaggle.com
    Updated Nov 29, 2020
    Cite
    tpmeli (2020). Iterative Imputation of Jane St train.csv [Dataset]. https://www.kaggle.com/tpmeli/iterative-imputation-of-jane-st-traincsv/code
    Dataset updated
    Nov 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    tpmeli
    Description

    I will be sharing all of my missing data exploration here:

    https://www.kaggle.com/tpmeli/missing-data-exploration-mean-iterative-more

  3. Starter vs. non-starter contrasts from non-imputed data.

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Mar 28, 2024
    + more versions
    Cite
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen (2024). Starter vs. non-starter contrasts from non-imputed data. [Dataset]. http://doi.org/10.1371/journal.pone.0299851.s007
    Available download formats: xlsx
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Starter vs. non-starter contrasts from non-imputed data.

  4. Monthly imputation of delays

    • infrabel.opendatasoft.com
    • opendata.infrabel.be
    • +2more
    csv, excel, json
    Updated Aug 6, 2024
    + more versions
    Cite
    (2024). Monthly imputation of delays [Dataset]. https://infrabel.opendatasoft.com/explore/dataset/toewijzingvertraging/table/
    Available download formats: json, csv, excel
    Dataset updated
    Aug 6, 2024
    License

    https://infrabel.opendatasoft.com/pages/license/

    Description

    The historical method used to calculate the statistics presented in this dataset takes into account all the minutes of delay caused by 'major incidents' (internally known as 'relazen') on the rail network as reported to the Railway Accident and Incident Investigation Body (OEAIF/OOIS) and the Railway Safety and Interoperability Service (NSA Rail Belgium) under the Royal Decree of 16 January 2007 laying down certain rules relating to investigations into railway accidents and incidents. The criteria defining 'major incidents' (internally known as 'relazen') are as follows:

    • 1 passenger train delayed by an incident for 20 minutes or more
    • Several passenger trains delayed by an incident for at least 40 minutes
    • Incidents leading to the cancellation (partial or total) of trains
    • Incidents with an impact on operational safety

    There is no unequivocal relationship between the minutes of delay in 'major incidents' and the punctuality rate because:

    • The minutes included in 'major incidents' do not necessarily have an actual impact on punctuality (a train can make up its delay as it goes along).
    • Some trains arrive at their terminus more than 6 minutes late (and therefore have an actual impact on punctuality), but are not included in the 'major incidents'.

    In order to provide an exhaustive overview of the causes and responsibilities for delays, a new dataset has been made available: Monthly causes of loss of punctuality. The data presented in this new dataset is as follows: for each train delayed by 6 minutes or more on arrival at a tracking point*, an analysis is made of the cause of all the minutes of delay along the route, and a proportional score is awarded for each responsibility identified. More info in the new dataset's description
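
    The proportional scoring described above might be sketched as follows. This is only an illustration: the description does not give the exact rule, so the assumption here is that each responsibility receives a share equal to its minutes of delay divided by the train's total minutes of delay, and that the total equals the delay on arrival. The function name and the cause labels are hypothetical.

```python
# From the description: a train counts once it is delayed by 6 minutes or more.
PUNCTUALITY_THRESHOLD_MIN = 6

def proportional_scores(delay_minutes_by_cause):
    """Split one delayed train into shares proportional to each cause's minutes.

    Assumed rule (not confirmed by the dataset description): share = minutes / total.
    """
    total = sum(delay_minutes_by_cause.values())
    if total < PUNCTUALITY_THRESHOLD_MIN:
        return {}  # below the threshold, so nothing is attributed
    return {cause: minutes / total
            for cause, minutes in delay_minutes_by_cause.items()}

# Hypothetical train with 12 minutes of delay spread over three causes.
scores = proportional_scores({"infrastructure": 6, "operator": 3, "third party": 3})
print(scores)
```

    Under this assumed rule, the shares for one train always sum to 1, so summing them over all delayed trains in a month would give a count of delayed trains per responsibility.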

  5. Between-day contrasts from non-imputed data.

    • figshare.com
    xlsx
    Updated Mar 28, 2024
    + more versions
    Cite
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen (2024). Between-day contrasts from non-imputed data. [Dataset]. http://doi.org/10.1371/journal.pone.0299851.s005
    Available download formats: xlsx
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This observational study aimed to analyze external training load in highly trained female football players, comparing starters and non-starters across various cycle lengths and training days.

    Method: External training load (duration, total distance [TD], high-speed running distance [HSRD], sprint distance [SpD], and acceleration- and deceleration distance [AccDecdist]) from 100 female football players (22.3 ± 3.7 years of age) in the Norwegian premier division was collected over two seasons using STATSports APEX. This resulted in a final dataset totaling 10498 observations after multiple imputation of missing data. Microcycle length was categorized based on the number of days between matches (2 to 7 days apart), while training days were categorized relative to match day (MD, MD+1, MD+2, MD-5, MD-4, MD-3, MD-2, MD-1). Linear mixed modeling was used to assess differences between days, and between starters and non-starters.

    Results: In longer cycle lengths (5–7 days between matches), the middle of the week (usually MD-4 or MD-3) consistently exhibited the highest external training load (~21–79% of MD TD, MD HSRD, MD SpD, and MD AccDecdist); though, with the exception of duration (~108–120% of MD duration), it remained lower than MD. External training load was lowest on MD+2 and MD-1 (~1–37% of MD TD, MD HSRD, MD SpD, MD AccDecdist, and ~73–88% of MD peak speed). Non-starters displayed higher loads (~137–400% of starter TD, HSRD, SpD, AccDecdist) on MD+2 in cycles with 3 to 7 days between matches, with non-significant differences (~76–116%) on other training days.

    Conclusion: Loading patterns resemble a pyramid or skewed pyramid during longer cycle lengths (5–7 days), with higher training loads towards the middle compared to the start and the end of the cycle. Non-starters displayed slightly higher loads on MD+2, with no significant load differentiation from MD-5 onwards.

  6. Quarterly Labour Force Survey Household Dataset, January - March, 2021

    • beta.ukdataservice.ac.uk
    • datacatalogue.cessda.eu
    Updated 2023
    + more versions
    Cite
    Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, January - March, 2021 [Dataset]. http://doi.org/10.5255/ukda-sn-8809-4
    Dataset updated
    2023
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Authors
    Office For National Statistics
    Description
    Background
    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    Household datasets
    Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset instead.

    Change to coding of missing values for household series
    From 1996-2013, all missing values in the household datasets were set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the -8 and -9 categories have been reinstated.
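
    For readers working across the affected years, one way to handle the coding change is to treat all three codes as missing before analysis. The sketch below is a minimal illustration, not an ONS procedure: only the codes '-8', '-9' and '-10' come from the description above, while the helper name and the sample values are hypothetical.

```python
# Codes named in the description: -8 and -9 (separate categories)
# and -10 (the combined category used for 1996-2013).
MISSING_CODES = {-8, -9, -10}

def to_missing(values):
    """Replace LFS-style missing codes with None; keep all other values."""
    return [None if v in MISSING_CODES else v for v in values]

# Hypothetical values of one household variable drawn from different years.
sample = [1, -10, 3, -8, 2, -9]
cleaned = to_missing(sample)
print(cleaned)
```

    Collapsing the codes this way loses the distinction between '-8' and '-9', so it suits only analyses that do not need to know why a value is missing.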

    LFS Documentation
    The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

    Additional data derived from the QLFS
    The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

    End User Licence and Secure Access QLFS Household datasets
    Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

    Changes to variables in QLFS Household EUL datasets
    In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.

    Review of imputation methods for LFS Household data - changes to missing values
    A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily intended to ensure that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
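
    The filtering advice above could be applied along these lines. This is a sketch under stated assumptions: records are represented as plain dictionaries, the lowercase field names 'ioutcome', 'hserialp' and 'ethnicity' are illustrative spellings, and the sample rows are invented.

```python
def drop_ioutcome_3(rows, key="ioutcome"):
    """Return only the rows whose `key` value is not 3."""
    return [row for row in rows if row.get(key) != 3]

# Invented records; field names are assumptions, not the official variable list.
rows = [
    {"hserialp": 101, "ioutcome": 1, "ethnicity": 2},
    {"hserialp": 101, "ioutcome": 3, "ethnicity": -9},
    {"hserialp": 102, "ioutcome": 2, "ethnicity": 1},
]
kept = drop_ioutcome_3(rows)
print(len(kept))
```

    Applying the same filter to every period, as the description advises, keeps the treatment of non-responders consistent before and after 2015.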

    Occupation data for 2021 and 2022 data files

    The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

    Latest edition information

    For the fourth edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

  7. Data from: A comparison of genomic selection models across time in interior...

    • borealisdata.ca
    • open.library.ubc.ca
    Updated May 19, 2021
    Cite
    Blaise Ratcliffe; Omnia Gamal El-Dien; Jaroslav Klápště; Ilga Porth; Charles Chen; Barry Jaquish; Yousry A. El-Kassaby (2021). Data from: A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods [Dataset]. http://doi.org/10.5683/SP2/I9BJI6
    Dataset updated
    May 19, 2021
    Dataset provided by
    Borealis
    Authors
    Blaise Ratcliffe; Omnia Gamal El-Dien; Jaroslav Klápště; Ilga Porth; Charles Chen; Barry Jaquish; Yousry A. El-Kassaby
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    British Columbia
    Description

    Abstract: Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.

    Usage notes
    • phenotype: phenotype and experimental design (phen mask dryad.txt)
    • SVD genotype imputation: marker matrix for SVD imputation method (SVDimp dryad.txt)
    • Mean genotype imputation: marker matrix for mean imputation method (MeanImp dryad.txt)
    • KNN genotype imputation: marker matrix for KNN imputation method (KNNimp dryad.txt)
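
    To make the difference between mean and k-nearest-neighbor imputation concrete, the toy sketch below fills missing entries in a small marker matrix both ways. It is not the authors' implementation: the 0/1/2 genotype coding, the distance measure (mean squared difference over markers observed in both rows), the fallback value for an entirely missing marker, and all function names are assumptions made for illustration. SVD imputation is not shown.

```python
def mean_impute(matrix):
    """Fill None entries in each column (marker) with that column's observed mean."""
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        # Assumed fallback: 0.0 when a marker is missing for every individual.
        col_mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled

def distance(a, b):
    """Mean squared difference over markers observed in both rows (inf if none)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def knn_impute(matrix, k=2):
    """Fill each missing entry with the mean of that marker in the k nearest rows."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have this marker observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

# Toy genotypes (rows = trees, columns = SNPs); None marks a missing call.
genotypes = [
    [0, 1, 2, None],
    [0, 1, 2, 2],
    [2, 0, 0, 0],
    [2, None, 0, 0],
]
by_mean = mean_impute(genotypes)
by_knn = knn_impute(genotypes, k=1)
print(by_mean[0][3], by_knn[0][3])
```

    In this toy matrix the first tree matches the second on every shared marker, so the nearest-neighbor fill copies that tree's genotype, whereas the mean fill pulls the value towards the column average regardless of similarity.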

  8. Quarterly Labour Force Survey Household Dataset, July - September, 2021

    • beta.ukdataservice.ac.uk
    • datacatalogue.cessda.eu
    Updated 2023
    Cite
    Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, July - September, 2021 [Dataset]. http://doi.org/10.5255/ukda-sn-8876-3
    Dataset updated
    2023
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Authors
    Office For National Statistics
    Description
    Background
    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    Household datasets
    Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset instead.

    Change to coding of missing values for household series
    From 1996-2013, all missing values in the household datasets were set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the -8 and -9 categories have been reinstated.

    LFS Documentation
    The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

    Additional data derived from the QLFS
    The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

    End User Licence and Secure Access QLFS Household datasets
    Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

    Changes to variables in QLFS Household EUL datasets
    In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.

    Review of imputation methods for LFS Household data - changes to missing values
    A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily intended to ensure that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.

    Occupation data for 2021 and 2022 data files

    The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

    Latest edition information

    For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN and SOC20M have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

  9. Data from: A comparison of genomic selection models across time in interior...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated May 27, 2022
    + more versions
    Cite
    Blaise Ratcliffe; Omnia Gamal El-Dien; Jaroslav Klápště; Ilga Porth; Charles Chen; Barry Jaquish; Yousry A. El-Kassaby (2022). Data from: A comparison of genomic selection models across time in interior spruce (Picea engelmannii × glauca) using unordered SNP imputation methods [Dataset]. http://doi.org/10.5061/dryad.m4vh4
    Dataset updated
    May 27, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Blaise Ratcliffe; Omnia Gamal El-Dien; Jaroslav Klápště; Ilga Porth; Charles Chen; Barry Jaquish; Yousry A. El-Kassaby
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Genomic selection (GS) potentially offers an unparalleled advantage over traditional pedigree-based selection (TS) methods by reducing the time commitment required to carry out a single cycle of tree improvement. This quality is particularly appealing to tree breeders, where lengthy improvement cycles are the norm. We explored the prospect of implementing GS for interior spruce (Picea engelmannii × glauca) utilizing a genotyped population of 769 trees belonging to 25 open-pollinated families. A series of repeated tree height measurements through ages 3–40 years permitted the testing of GS methods temporally. The genotyping-by-sequencing (GBS) platform was used for single nucleotide polymorphism (SNP) discovery in conjunction with three unordered imputation methods applied to a data set with 60% missing information. Further, three diverse GS models were evaluated based on predictive accuracy (PA), and their marker effects. Moderate levels of PA (0.31–0.55) were observed and were of sufficient capacity to deliver improved selection response over TS. Additionally, PA varied substantially through time accordingly with spatial competition among trees. As expected, temporal PA was well correlated with age-age genetic correlation (r=0.99), and decreased substantially with increasing difference in age between the training and validation populations (0.04–0.47). Moreover, our imputation comparisons indicate that k-nearest neighbor and singular value decomposition yielded a greater number of SNPs and gave higher predictive accuracies than imputing with the mean. Furthermore, the ridge regression (rrBLUP) and BayesCπ (BCπ) models both yielded equal, and better PA than the generalized ridge regression heteroscedastic effect model for the traits evaluated.

  10. Labour Force Survey Household Datasets, 2002-2023: Secure Access

    • beta.ukdataservice.ac.uk
    • datacatalogue.cessda.eu
    Updated 2024
    + more versions
    Cite
    Social Survey Division Office For National Statistics (2024). Labour Force Survey Household Datasets, 2002-2023: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-7674-16
    Dataset updated
    2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    datacite
    Authors
    Social Survey Division Office For National Statistics
    Description

    Background

    The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

    New reweighting policy
    Following the new reweighting policy, ONS has reviewed the latest population estimates made available during 2019 and has decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. It will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published towards the end of 2020/early 2021.

    Secure Access QLFS household data
    Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. For some quarters, users should note that all missing values in the data are set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. From the 2013 household datasets, the standard -8 and -9 missing categories have been reinstated.

    Secure Access household datasets for the QLFS are available from 2002 onwards, and include additional, detailed variables not included in the standard 'End User Licence' (EUL) versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits.

    Prospective users of a Secure Access version of the QLFS will need to fulfil additional requirements, commencing with the completion of an extra application form to demonstrate to the data owners exactly why they need access to the extra, more detailed variables, in order to obtain permission to use that version. Secure Access users must also complete face-to-face training and agree to Secure Access' User Agreement (see 'Access' section below). Therefore, users are encouraged to download and inspect the EUL version of the data prior to ordering the Secure Access version.

    LFS Documentation
    The documentation available from the Archive to accompany LFS datasets largely consists of each volume of the User Guide including the appropriate questionnaires for the years concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance pages before commencing analysis.

    The study documentation presented in the Documentation section includes the most recent documentation for the LFS only, due to available space. Documentation for previous years is provided alongside the data for access and is also available upon request.

    Review of imputation methods for LFS Household data - changes to missing values
    A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily intended to ensure the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger numbers of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, they are advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
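    The filtering advice above can be illustrated with a minimal sketch. It assumes the household data have already been loaded as a list of records; the column names `hserial` and `ethnicity`, the lower-case spelling of `ioutcome`, and the sample values are hypothetical and are not taken from the LFS documentation. Only the 'ioutcome=3' filter and the '-8', '-9' and '-10' missing-value codes come from the description.

```python
# Missing-value codes named in the description: -8 and -9, plus the
# combined -10 category used for some quarters.
MISSING_CODES = {-8, -9, -10}


def drop_ioutcome_3(rows):
    """Keep only the rows whose ioutcome value is not 3."""
    return [row for row in rows if row.get("ioutcome") != 3]


def recode_missing(rows, columns):
    """Return copies of the rows with missing-value codes replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left untouched
        for col in columns:
            if new_row.get(col) in MISSING_CODES:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


# Hypothetical records; real household files contain many more variables.
sample = [
    {"hserial": 1, "ioutcome": 1, "ethnicity": 2},
    {"hserial": 2, "ioutcome": 3, "ethnicity": -9},
    {"hserial": 3, "ioutcome": 2, "ethnicity": -8},
]

filtered = recode_missing(drop_ioutcome_3(sample), ["ethnicity"])
print(filtered)
```

    Applying the same filter to every period, as the description recommends, keeps the treatment of these cases consistent across a time series. Users should check the ONS LFS User Guidance for the actual variable names before adapting this sketch.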

    Variables DISEA and LNGLST
    Dataset A08 (Labour market status of disabled people), which ONS suspended due to an apparent discontinuity between April to June 2017 and July to September 2017, is now available. As a result of this apparent discontinuity and the inconclusive investigations at this stage, comparisons should be made with caution between April to June 2017 and subsequent time periods. However, users should note that the estimates are not seasonally adjusted, so some of the change between quarters could be due to seasonality. Further recommendations on historical comparisons of the estimates will be given in November 2018 when ONS are due to publish estimates for July to September 2018.

    An article explaining the quality assurance investigations that have been conducted so far is available on the ONS Methodology webpage. For any queries about Dataset A08 please email Labour.Market@ons.gov.uk.

    Latest Edition Information
    For the sixteenth edition (November 2023), one quarterly data file covering the time period April-June, 2023, along with a new Excel variable catalogue for 2023 and a documentation form, have been added to the study.

  11. Estimated marginal means by MD, cycle, and squad status, from imputed data.

    • figshare.com
    xlsx
    Updated Mar 28, 2024
    + more versions
    Cite
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen (2024). Estimated marginal means by MD, cycle, and squad status, from imputed data. [Dataset]. http://doi.org/10.1371/journal.pone.0299851.s003
    Available download formats: xlsx
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Andreas K. Winther; Ivan Baptista; Sigurd Pedersen; João Brito; Morten B. Randers; Dag Johansen; Svein Arne Pettersen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Estimated marginal means by MD, cycle, and squad status, from imputed data.

  12. Descriptive statistics of study participants’ clinical variables,...

    • plos.figshare.com
    xls
    Updated Mar 28, 2024
    Cite
    Junying Wang; David D. Wu; Christine DeLorenzo; Jie Yang (2024). Descriptive statistics of study participants’ clinical variables, demographic variables and baseline questionnaire scores by remission status in EMBARC study. [Dataset]. http://doi.org/10.1371/journal.pone.0299625.t003
    Available download formats: xls
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Junying Wang; David D. Wu; Christine DeLorenzo; Jie Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive statistics of study participants’ clinical variables, demographic variables and baseline questionnaire scores by remission status in EMBARC study.

  13. Detailed overview of cohort characteristics for train and test cohort.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Sep 4, 2024
    Cite
    Hexin Li; Negin Ashrafi; Chris Kang; Guanlan Zhao; Yubing Chen; Maryam Pishgar (2024). Detailed overview of cohort characteristics for train and test cohort. [Dataset]. http://doi.org/10.1371/journal.pone.0309383.t002
    Available download formats: xls
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Hexin Li; Negin Ashrafi; Chris Kang; Guanlan Zhao; Yubing Chen; Maryam Pishgar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Values are presented as means with the standard deviations in parentheses.

  14. Cardiovascular mortality according to alcohol consumption frequency using...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Eirik Degerud; Inger Ariansen; Eivind Ystrom; Sidsel Graff-Iversen; Gudrun Høiseth; Jørg Mørland; George Davey Smith; Øyvind Næss (2023). Cardiovascular mortality according to alcohol consumption frequency using multiple imputation (n = 245,336). [Dataset]. http://doi.org/10.1371/journal.pmed.1002476.t002
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS Medicine
    Authors
    Eirik Degerud; Inger Ariansen; Eivind Ystrom; Sidsel Graff-Iversen; Gudrun Høiseth; Jørg Mørland; George Davey Smith; Øyvind Næss
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cardiovascular mortality according to alcohol consumption frequency using multiple imputation (n = 245,336).

