39 datasets found

Data from: Accuracy of identifying incident stroke cases from linked...
zenodo.org
data.niaid.nih.gov
pdf
Updated Jul 19, 2024
Cite
Kristiina Rannikmae (2024). Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank [Dataset]. http://doi.org/10.5061/dryad.w9ghx3fk0
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.5061/dryad.w9ghx3fk0
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Kristiina Rannikmae
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these for identifying incident strokes.

Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true positives (i.e. positive predictive value, PPV) for all codes combined and by code source and type.

Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise.

Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
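
The PPV and confidence-interval arithmetic in the Results can be illustrated with a minimal sketch. The description does not say which interval method was used and does not report the underlying true-positive count, so the Wilson score interval and both counts below are assumptions chosen only for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts, not taken from the dataset: chosen so that the
# proportion lands near the reported 79% PPV for any stroke.
true_positives = 178
coded_cases_reviewed = 225

# PPV = confirmed cases / all coded cases that were reviewed
ppv = true_positives / coded_cases_reviewed
low, high = wilson_interval(true_positives, coded_cases_reviewed)
print(f"PPV = {ppv:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With a few hundred reviewed cases the interval spans roughly ten percentage points, which is why the PPVs for the much smaller death-record and hemorrhagic-stroke groups are described as imprecise.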
Table_2_Physician-Confirmed and Administrative Definitions of Stroke in UK...
figshare.com
xlsx
Updated Jun 8, 2023
Cite
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow (2023). Table_2_Physician-Confirmed and Administrative Definitions of Stroke in UK Biobank Reflect the Same Underlying Genetic Trait.xlsx [Dataset]. http://doi.org/10.3389/fneur.2021.787107.s003
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fneur.2021.787107.s003
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers
Authors
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Stroke in UK Biobank (UKB) is ascertained via linkages to coded administrative datasets and self-report. We studied the accuracy of these codes using genetic validation.

Methods: We compiled stroke-specific and broad cerebrovascular disease (CVD) code lists (Read V2/V3, ICD-9/-10) for medical settings (hospital, death record, primary care) and self-report. Among 408,210 UKB participants, we identified all with a relevant code, creating 12 stroke definitions based on the code type and source. We performed genome-wide association studies (GWASs) for each definition, comparing summary results against the largest published stroke GWAS (MEGASTROKE), assessing genetic correlations, and replicating 32 stroke-associated loci.

Results: The stroke case numbers identified varied widely from 3,976 (primary care stroke-specific codes) to 19,449 (all codes, all sources). All 12 UKB stroke definitions were significantly correlated with the MEGASTROKE summary GWAS results (rg 0.81-1) and each other (rg 0.4-1). However, Bonferroni-corrected confidence intervals were wide, suggesting limited precision of some results. Six previously reported stroke-associated loci were replicated using ≥1 UKB stroke definition.

Conclusions: Stroke case numbers in UKB depend on the code source and type used, with a 5-fold difference in the maximum case-sample size. All stroke definitions are significantly genetically correlated with the largest stroke GWAS to date.
Data from: Improving genome-wide association discovery and genomic...
zenodo.org
datadryad.org
application/gzip, bin
Updated Sep 4, 2022
Cite
Matthew R. Robinson; Etienne J. Orliac; Daniel Trejo Banos; Sven E. Ojavee; Kristi Läll; Reedik Mägi; Peter M. Visscher (2022). Improving genome-wide association discovery and genomic prediction accuracy in biobank data [Dataset]. http://doi.org/10.5061/dryad.gtht76hmz
Explore at:
Available download formats: bin, application/gzip
Unique identifier
https://doi.org/10.5061/dryad.gtht76hmz
Dataset updated
Sep 4, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Matthew R. Robinson; Etienne J. Orliac; Daniel Trejo Banos; Sven E. Ojavee; Kristi Läll; Reedik Mägi; Peter M. Visscher
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R² was 47% in a UK Biobank holdout sample, which was 76% of the estimated SNP heritability (h²_SNP). We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ² value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.
Table_1_Physician-Confirmed and Administrative Definitions of Stroke in UK...
frontiersin.figshare.com
xlsx
Updated Jun 15, 2023
Cite
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow (2023). Table_1_Physician-Confirmed and Administrative Definitions of Stroke in UK Biobank Reflect the Same Underlying Genetic Trait.XLSX [Dataset]. http://doi.org/10.3389/fneur.2021.787107.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fneur.2021.787107.s002
Dataset updated
Jun 15, 2023
Dataset provided by
Frontiers
Authors
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Stroke in UK Biobank (UKB) is ascertained via linkages to coded administrative datasets and self-report. We studied the accuracy of these codes using genetic validation.

Methods: We compiled stroke-specific and broad cerebrovascular disease (CVD) code lists (Read V2/V3, ICD-9/-10) for medical settings (hospital, death record, primary care) and self-report. Among 408,210 UKB participants, we identified all with a relevant code, creating 12 stroke definitions based on the code type and source. We performed genome-wide association studies (GWASs) for each definition, comparing summary results against the largest published stroke GWAS (MEGASTROKE), assessing genetic correlations, and replicating 32 stroke-associated loci.

Results: The stroke case numbers identified varied widely from 3,976 (primary care stroke-specific codes) to 19,449 (all codes, all sources). All 12 UKB stroke definitions were significantly correlated with the MEGASTROKE summary GWAS results (rg 0.81-1) and each other (rg 0.4-1). However, Bonferroni-corrected confidence intervals were wide, suggesting limited precision of some results. Six previously reported stroke-associated loci were replicated using ≥1 UKB stroke definition.

Conclusions: Stroke case numbers in UKB depend on the code source and type used, with a 5-fold difference in the maximum case-sample size. All stroke definitions are significantly genetically correlated with the largest stroke GWAS to date.
A scalable, accurate, and universal analysis framework to control for sample...
data.niaid.nih.gov
zenodo.org
Updated Dec 3, 2023
Cite
He, Xu (2023). A scalable, accurate, and universal analysis framework to control for sample relatedness in large-scale genome-wide association studies and its application to 79 longitudinal traits in UK Biobank [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10242061
Explore at:
Dataset updated
Dec 3, 2023
Dataset provided by
He, Xu
Wenjian, Bi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample relatedness is a major confounder in large-scale GWAS and can result in inflation if not appropriately controlled. Incorporating GRM-related random effects into conventional models is the most commonly used strategy. Although effective, this strategy is technically challenging to extend to other complex traits with complicated structure. In this work, we propose a scalable, accurate, and universal analysis framework, SPAGRM, in which sample relatedness is controlled via a precise approximation of the joint distribution of genotypes for related samples in families. SPAGRM can use GRM-free conventional models and is thus applicable to a wide variety of traits. A hybrid strategy including saddlepoint approximation (SPA) can greatly increase accuracy when analyzing low-frequency and rare genetic variants, especially if the phenotypic distribution is unbalanced. Extensive simulation studies and real data analyses validated that SPAGRM controls type I error rates accurately and can gain power for longitudinal trait analysis. Expanding upon previous studies, we implemented a refined and meticulous QC pipeline to extract 79 longitudinal traits from UK Biobank primary care data. The application of SPAGRM to the 79 longitudinal traits identified 7,463 genetic loci, which is a pioneering attempt to conduct GWAS for a majority of these traits as a longitudinal phenotype.
Data_Sheet_1_Physician-Confirmed and Administrative Definitions of Stroke in...
frontiersin.figshare.com
pdf
Updated Jun 15, 2023
Cite
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow (2023). Data_Sheet_1_Physician-Confirmed and Administrative Definitions of Stroke in UK Biobank Reflect the Same Underlying Genetic Trait.pdf [Dataset]. http://doi.org/10.3389/fneur.2021.787107.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fneur.2021.787107.s001
Dataset updated
Jun 15, 2023
Dataset provided by
Frontiers
Authors
Kristiina Rannikmäe; Konrad Rawlik; Amy C. Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Stroke in UK Biobank (UKB) is ascertained via linkages to coded administrative datasets and self-report. We studied the accuracy of these codes using genetic validation.

Methods: We compiled stroke-specific and broad cerebrovascular disease (CVD) code lists (Read V2/V3, ICD-9/-10) for medical settings (hospital, death record, primary care) and self-report. Among 408,210 UKB participants, we identified all with a relevant code, creating 12 stroke definitions based on the code type and source. We performed genome-wide association studies (GWASs) for each definition, comparing summary results against the largest published stroke GWAS (MEGASTROKE), assessing genetic correlations, and replicating 32 stroke-associated loci.

Results: The stroke case numbers identified varied widely from 3,976 (primary care stroke-specific codes) to 19,449 (all codes, all sources). All 12 UKB stroke definitions were significantly correlated with the MEGASTROKE summary GWAS results (rg 0.81-1) and each other (rg 0.4-1). However, Bonferroni-corrected confidence intervals were wide, suggesting limited precision of some results. Six previously reported stroke-associated loci were replicated using ≥1 UKB stroke definition.

Conclusions: Stroke case numbers in UKB depend on the code source and type used, with a 5-fold difference in the maximum case-sample size. All stroke definitions are significantly genetically correlated with the largest stroke GWAS to date.
Cornea resistance factor GWAS in the UK Biobank
find.data.gov.scot
dtechtive.com
gz, txt
Updated Nov 4, 2020
Cite
(2020). Cornea resistance factor GWAS in the UK Biobank [Dataset]. http://doi.org/10.7488/ds/2944
Explore at:
Available download formats: txt (0.0166 MB), gz (235.1 MB)
Unique identifier
https://doi.org/10.7488/ds/2944
Dataset updated
Nov 4, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
UNITED KINGDOM
Description
Summary statistics from the genome-wide association study (GWAS) of cornea resistance factor (CRF) in 76,029 UK Biobank samples of British ancestry, obtained through application number 19655. The CRF analysis was performed on the average of the left and right eye measurements. Any outliers (i.e. CRF greater than population mean difference + 3 standard deviations) were removed. Further phenotypic filtering removed samples linked to, or self-reporting, ocular conditions that could affect measurement accuracy, such as eye surgery, refractive laser surgery, cataract surgery, glaucoma high pressure surgery or laser treatment, corneal graft surgery, eye injury, keratoconus or cornea disorders. A total of 102,490 samples were kept. Using the genetic quality control of the UK Biobank, those among the 76,318 samples of British ancestry with imputed data that failed heterozygosity and/or missingness checks, had a mismatch between self-reported and genotype-derived gender, or showed putative sex chromosome aneuploidy, as well as individuals who had withdrawn from the study at the time of analysis, were removed. A total of N = 76,029 were included in the GWAS. The GWAS was performed on common and low-frequency (MAF > 0.5%), well-imputed (INFO > 0.6) variants using a linear mixed model accounting for population structure and (cryptic) relatedness, implemented in the software BOLT_LMM v1.3. Covariates fitted in the model were: age, sex, assessment centre, genotyping array, genotyping batch and the first 20 principal components of ancestry provided by the UK Biobank.
Data from: Variable prediction accuracy of polygenic scores within an...
data.niaid.nih.gov
datadryad.org
zip
Updated Feb 24, 2020
Cite
Hakhamanesh Mostafavi; Arbel Harpak; Ipsita Agarwal; Dalton Conley; Jonathan Pritchard; Molly Przeworski (2020). Variable prediction accuracy of polygenic scores within an ancestry group [Dataset]. http://doi.org/10.5061/dryad.66t1g1jxs
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.66t1g1jxs
Dataset updated
Feb 24, 2020
Dataset provided by
Stanford University
Princeton University
Columbia University
Authors
Hakhamanesh Mostafavi; Arbel Harpak; Ipsita Agarwal; Dalton Conley; Jonathan Pritchard; Molly Przeworski
License
https://spdx.org/licenses/CC0-1.0.html
Description
Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.
Accuracy of Electronic Health Record Data for Identifying Stroke Cases in...
plos.figshare.com
docx
Updated Jun 4, 2023
Cite
Rebecca Woodfield; Ian Grant; Cathie L. M. Sudlow (2023). Accuracy of Electronic Health Record Data for Identifying Stroke Cases in Large-Scale Epidemiological Studies: A Systematic Review from the UK Biobank Stroke Outcomes Group [Dataset]. http://doi.org/10.1371/journal.pone.0140533
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0140533
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Rebecca Woodfield; Ian Grant; Cathie L. M. Sudlow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objective: Long-term follow-up of population-based prospective studies is often achieved through linkages to coded regional or national health care data. Our knowledge of the accuracy of such data is incomplete. To inform methods for identifying stroke cases in UK Biobank (a prospective study of 503,000 UK adults recruited in middle-age), we systematically evaluated the accuracy of these data for stroke and its main pathological types (ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage), determining the optimum codes for case identification.

Methods: We sought studies published from 1990-November 2013, which compared coded data from death certificates, hospital admissions or primary care with a reference standard for stroke or its pathological types. We extracted information on a range of study characteristics and assessed study quality with the Quality Assessment of Diagnostic Studies tool (QUADAS-2). To assess accuracy, we extracted data on positive predictive values (PPV) and, where available, on sensitivity, specificity, and negative predictive values (NPV).

Results: 37 of 39 eligible studies assessed accuracy of International Classification of Diseases (ICD)-coded hospital or death certificate data. They varied widely in their settings, methods, reporting, quality, and in the choice and accuracy of codes. Although PPVs for stroke and its pathological types ranged from 6–97%, appropriately selected, stroke-specific codes (rather than broad cerebrovascular codes) consistently produced PPVs >70%, and in several studies >90%. The few studies with data on sensitivity, specificity and NPV showed higher sensitivity of hospital versus death certificate data for stroke, with specificity and NPV consistently >96%. Few studies assessed either primary care data or combinations of data sources.

Conclusions: Particular stroke-specific codes can yield high PPVs (>90%) for stroke/stroke types. Inclusion of primary care data and combining data sources should improve accuracy in large epidemiological studies, but there is limited published information about these strategies.
Aggregated UK Biobank clinical assessments and neuroimaging biomarkers.
figshare.com
xls
Updated Jun 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov (2023). Aggregated UK Biobank clinical assessments and neuroimaging biomarkers. [Dataset]. http://doi.org/10.1371/journal.pone.0228520.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228520.t003
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS ONE
Authors
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Aggregated UK Biobank clinical assessments and neuroimaging biomarkers.
Sharing Genomic Data Market Report | Global Forecast From 2025 To 2033
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Cite
Dataintelo (2024). Sharing Genomic Data Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-sharing-genomic-data-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Sharing Genomic Data Market Outlook

The global sharing genomic data market size was valued at $5.2 billion in 2023 and is projected to reach $15.7 billion by 2032, growing at a compound annual growth rate (CAGR) of 13.2% during the forecast period. The surge in market size is driven by advancements in genomic research, widespread adoption of precision medicine, and increasing governmental and private sector investments in genomics.
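
The relationship between the start value, end value, and CAGR quoted above can be checked with a short sketch. The report does not say how it counts compounding periods, so treating 2023 to 2032 as nine annual periods is an assumption, and the helper name `cagr` is hypothetical.

```python
def cagr(start_value: float, end_value: float, periods: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the report, in billions of USD.
start_2023 = 5.2
end_2032 = 15.7

# Assumption: nine annual compounding periods between 2023 and 2032.
rate = cagr(start_2023, end_2032, 2032 - 2023)
print(f"Implied CAGR: {rate:.2%}")
```

Under that assumption the implied rate comes out near, but not exactly at, the quoted 13.2%; small gaps like this usually come from rounding or from a different base year.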

One of the primary growth factors in the sharing genomic data market is the rapid advancement in genomic technologies. The cost of sequencing an entire genome has plummeted over the past decade, making it more accessible for researchers and healthcare providers. This democratization of genomic data has catalyzed numerous projects aimed at understanding genetic disorders, optimizing drug development, and personalizing medical treatments. Additionally, the development of robust bioinformatics tools for the analysis and interpretation of vast genomic datasets has further propelled the market forward.

Another significant growth factor is the increasing emphasis on precision medicine. Precision medicine aims to tailor medical treatment to the individual characteristics of each patient, and genomic data is a critical component in this approach. By understanding the genetic makeup of patients, healthcare providers can prescribe more effective treatments and interventions. Furthermore, governments and private institutions around the world are heavily investing in initiatives that support genomic research and data sharing, thereby boosting market growth. For instance, the National Institutes of Health (NIH) in the United States and the UK Biobank are exemplary projects that highlight the importance of genomic research.

The integration of artificial intelligence (AI) and machine learning (ML) with genomic data sharing platforms is another driving force. AI and ML algorithms are increasingly being used to identify patterns and correlations in genomic data that would be impossible for humans to discern. These technologies are enhancing the speed and accuracy of genomic data analysis, leading to quicker insights and more effective treatments. Furthermore, collaborations between tech companies and genomic research institutions are accelerating innovations in this field. These collaborations foster an ecosystem that supports rapid technological advancements and the efficient sharing of genomic data.

Regionally, North America holds the largest share in the sharing genomic data market, driven by the presence of leading genomic research institutions, substantial funding from government and private sectors, and favorable regulatory frameworks. Europe follows closely, with significant contributions from countries like the UK, Germany, and France. Meanwhile, the Asia Pacific region is expected to witness the highest growth rate due to increasing investments in genomic research, growing healthcare infrastructure, and the rise of biotech startups. Latin America and the Middle East & Africa are also emerging markets, showing potential for substantial growth driven by healthcare reforms and investments in genomic research initiatives.

Data Type Analysis

The data type segment of the sharing genomic data market is categorized into whole genome sequencing, exome sequencing, and targeted sequencing. Whole genome sequencing (WGS) is the most comprehensive form of sequencing, providing a complete picture of an individual's genetic makeup. WGS is increasingly being adopted in various research projects and clinical settings due to its thoroughness and the declining costs associated with the technology. This method encompasses all coding and non-coding regions of the genome, offering invaluable insights into complex genetic disorders, cancer genomics, and population genetics.

Exome sequencing, which focuses on sequencing only the coding regions of the genome (or exons), is another crucial component of this market segment. Exome sequencing is less costly compared to WGS and is highly effective in identifying mutations that cause diseases. This method is particularly popular in clinical diagnostics and personalized medicine, where quick and accurate detection of genetic anomalies is imperative. Exome sequencing is also widely used in research applications, where the focus is on understanding the functional aspects of genes.

Targeted sequencing involves sequencing specific regions of the genome that are of interest. This approach is highly efficient and cost-effective, making it an attractive option for both research and c
BayesW time-to-event analysis posterior outputs and summary statistics
search.dataone.org
datadryad.org
Updated Apr 27, 2025
Cite
Sven Erik Ojavee; Matthew Robinson (2025). BayesW time-to-event analysis posterior outputs and summary statistics [Dataset]. http://doi.org/10.5061/dryad.qbzkh18gp
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.qbzkh18gp
Dataset updated
Apr 27, 2025
Dataset provided by
Dryad Digital Repository
Authors
Sven Erik Ojavee; Matthew Robinson
Time period covered
Jan 1, 2021
Description
Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a hybrid-parallel sampling scheme that facilitates Bayesian time-to-event large-scale biobank analyses. We show in extensive simulation work that BayesW achieves a greater number of discoveries, better model performance and improved genomic prediction as compared to other approaches. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and for the genetic basis of onset reflecting the underlying genetic liability to disease. Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low-frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches.
Biobanking Market Analysis, Size, and Forecast 2024-2028: North America (US...
technavio.com
Cite
Technavio, Biobanking Market Analysis, Size, and Forecast 2024-2028: North America (US and Canada), Europe (France, Germany, Italy, and UK), Middle East and Africa (Egypt, KSA, Oman, and UAE), APAC (China, India, and Japan), South America (Argentina and Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/biobanking-market-industry-analysis
Explore at:
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Canada, Germany, Saudi Arabia, United States, Global
Description
Biobanking Market Size 2024-2028

The biobanking market size is forecast to increase by USD 1.67 billion, at a CAGR of 9.04% between 2023 and 2028.

The market is experiencing significant growth, driven by the increasing demand for regenerative medicine. This trend is fueled by advancements in genetic research and the potential for customized treatment plans based on individual genetic profiles. Another key driver is the emergence of stem cell storage in biobanks and biopreservation, offering new opportunities for medical research and therapeutic applications. However, this market also faces challenges. Ethical issues surrounding the collection, storage, and use of biological samples remain a significant obstacle. Ensuring informed consent, privacy protection, and adherence to regulatory guidelines are essential for maintaining public trust and avoiding potential legal disputes. Companies seeking to capitalize on market opportunities must navigate these challenges effectively, while also staying abreast of technological advancements and evolving customer needs. Success in the market requires a strong commitment to ethical practices, innovative solutions, and strategic partnerships.

What will be the Size of the Biobanking Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.

The market continues to evolve, driven by advancements in data management, sample collection, and research applications. Biobanks are increasingly integrating LIMS systems for efficient sample accessibility and inventory management. Forensic samples and microbial samples join the ranks of clinical and research specimens in biobanking, expanding its scope. Data analytics plays a crucial role in drug discovery and precision medicine, necessitating robust data security and access control. Ethical considerations, informed consent, and biobanking ethics remain paramount, shaping the industry's growth. Cell lines and audit trails are essential components of biobanking, ensuring transparency and traceability. Biobanking software facilitates sample availability and public health research, while temperature monitoring, humidity control, and predictive modeling optimize sample storage and processing.

Biobank networks collaborate to share resources and expertise, fostering advancements in therapeutic development, biomarker discovery, and disease research. Intellectual property rights and metadata standards ensure data integrity and enable data sharing. Short-term and long-term storage solutions, including dry ice, liquid nitrogen, and cryogenic freezers, cater to various sample preservation requirements. Automated liquid handling and temperature monitoring systems streamline sample processing and enhance quality control. Biobanking's continuous dynamism is reflected in its applications across sectors, from clinical trials to public health, and its role in advancing research and therapeutic development.

How is this Biobanking Industry segmented?

The biobanking industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Type: Physical, Virtual
Product: Equipment, Consumables
End-User: Pharmaceutical & Biotechnology Companies, Academic & Research Institutions, Hospitals, Contract Research Organizations (CROs)
Application: Regenerative Medicine, Life Science Research, Clinical Research, Drug Discovery & Development, Personalized Medicine
Sample Type: Blood Products, Human Tissues, Cell Lines, Nucleic Acids, Biological Fluids, Human Waste Products
Biobank Type: Population-Based Biobanks, Disease-Based Biobanks, Virtual Biobanks, Tissue Biobanks, Genetic Biobanks
Geography:
  North America: US, Canada
  Europe: France, Germany, Italy, UK
  Middle East and Africa: Egypt, KSA, Oman, UAE
  APAC: China, India, Japan
  South America: Argentina, Brazil
  Rest of World (ROW)

By Type Insights

The physical segment is estimated to witness significant growth during the forecast period.

Biobanks, as repositories for biological samples including human tissues, cells, blood, DNA, and other biomolecules, play a crucial role in research and medical applications. The physical segment of the market encompasses various types of biobanks, categorized by the nature of the samples. These include tissue biobanks, cell biobanks, and blood biobanks. The increasing emphasis on personalized medicine, which customizes treatments based on individual patients' genetic makeup and biomarkers, drives the demand for high-quality biological samples. Data management is
SNP Genotyping and Analysis Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Sep 23, 2024
Dataintelo (2024). SNP Genotyping and Analysis Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-snp-genotyping-and-analysis-market
Explore at:
Available download formats: pptx, csv, pdf
Dataset updated
Sep 23, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
SNP Genotyping and Analysis Market Outlook

The global SNP genotyping and analysis market size was valued at approximately USD 4.5 billion in 2023 and is projected to reach around USD 12.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.8% during the forecast period. This significant growth is primarily driven by the increasing application of SNP genotyping in personalized medicine, advancements in genomic technologies, and the rise in government funding and research initiatives.
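The growth figures quoted above can be sanity-checked with the standard compound annual growth rate formula. The following is a minimal sketch, not the report's own method: the function name `cagr` is hypothetical, and treating 2023 to 2032 as nine compounding periods is an assumption.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the description: USD 4.5 billion (2023), USD 12.3 billion (2032).
# Assumption of this sketch: nine annual compounding periods between the two years.
implied = cagr(4.5, 12.3, 2032 - 2023)
print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption the implied rate comes out close to the quoted 11.8%, so the two endpoint values and the stated CAGR are at least mutually consistent.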

One of the primary growth factors for the SNP genotyping and analysis market is the burgeoning field of personalized medicine. Personalized medicine relies heavily on identifying genetic markers, such as Single Nucleotide Polymorphisms (SNPs), to tailor treatments to individual patients' genetic profiles. This approach not only enhances the efficacy of treatments but also minimizes adverse effects, making it a critical area of medical research and application. The increasing prevalence of chronic diseases such as cancer, diabetes, and cardiovascular conditions further underscores the need for personalized medicine, thus driving the demand for SNP genotyping and analysis.

Technological advancements in genomic sequencing and analysis are also playing a significant role in propelling the market. Innovations in next-generation sequencing (NGS), polymerase chain reaction (PCR), and microarray technologies have made SNP genotyping more accurate, faster, and cost-effective. These advancements have broadened the scope of SNP applications, enabling large-scale genomic studies, enhancing diagnostic capabilities, and facilitating drug development processes. The integration of artificial intelligence (AI) and machine learning (ML) in genomic data analysis is further expected to improve the accuracy and efficiency of SNP genotyping, fueling market growth.

Government funding and research initiatives are another critical factor contributing to market growth. Governments worldwide are increasingly investing in genomics research to understand the genetic basis of diseases better and develop targeted therapies. Various public and private sector collaborations, along with substantial funding, are being directed towards genomic research projects. For instance, initiatives such as the UK Biobank and the All of Us Research Program in the United States are aimed at collecting and analyzing genetic data from diverse populations to advance our understanding of genetic influences on health and disease, thereby boosting the demand for SNP genotyping and analysis.

Regionally, North America holds a dominant position in the SNP genotyping and analysis market, followed by Europe and the Asia Pacific. The significant presence of key market players, high adoption of advanced genomic technologies, and substantial government funding for genomics research are some of the factors driving the market in North America. The Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, attributed to increasing healthcare investments, growing research activities in genomics, and rising awareness about personalized medicine in countries such as China and India.

Technology Analysis

The technology segment of the SNP genotyping and analysis market includes microarray, polymerase chain reaction (PCR), sequencing, and others. Microarray technology is one of the most established and widely used methods for SNP genotyping. It allows for the simultaneous analysis of thousands of SNPs, making it highly suitable for large-scale genetic studies. The technology has seen significant improvements over the years, leading to enhanced data accuracy and lower costs. These advancements have broadened the application of microarray technology in various fields such as pharmacogenomics, diagnostic research, and agricultural biotechnology, driving its demand in the market.

Polymerase chain reaction (PCR) technology continues to be a cornerstone in the SNP genotyping sector. PCR's ability to amplify specific DNA sequences makes it indispensable for identifying and analyzing SNPs. The development of real-time PCR and digital PCR has further augmented its application, offering higher precision and quantitative capabilities. This technology is extensively used in clinical diagnostics, drug development, and biological research, contributing to its sustained market presence. The increasing focus on precision medicine and the rising need for rapid and accurate genetic testing are expected to drive the growth of the PCR segment.

Sequencing technologies, par
Subsampling specifications for different Big Data sizes.
plos.figshare.com
xls
Updated Jun 11, 2023
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov (2023). Subsampling specifications for different Big Data sizes. [Dataset]. http://doi.org/10.1371/journal.pone.0228520.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228520.t001
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The total number of subsamples M = 5,000.
Whole Exome Sequencing market size was $1.40 Billion in 2022!
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jan 1, 2023
Cognitive Market Research (2023). Whole Exome Sequencing market size was $1.40 Billion in 2022! [Dataset]. https://www.cognitivemarketresearch.com/whole-exome-sequencing-market-report
Explore at:
Available download formats: pdf, excel, csv, ppt
Dataset updated
Jan 1, 2023
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research's latest published report, the global Whole Exome Sequencing market size was $1.40 billion in 2022 and is forecast to reach $4.71 billion by 2030. The industry's compound annual growth rate is projected to be 18.74% from 2023 to 2030.

Market Dynamics of Whole Exome Sequencing Market

Key Drivers for Whole Exome Sequencing Market

The adoption of precision medicine is on the rise: Whole Exome Sequencing (WES) is increasingly recognized as the gold standard for diagnosing rare genetic disorders, boasting a diagnostic yield that is 30% higher than that of targeted panels. More than 50% of top hospitals now provide clinical WES, propelled by its capability to analyze over 20,000 genes in a single test.

Cost reductions have been significant: Advances in technology have decreased the cost of WES from $5,000 in 2010 to less than $500 today. Automation and AI-enhanced variant interpretation have shortened turnaround times from weeks to days, thereby making it feasible for routine diagnostic use.

In the realm of research and drug development: Pharmaceutical companies are utilizing WES to pinpoint biomarkers and therapeutic targets, with 40% of oncology trials now integrating WES data. Large-scale population studies, such as the UK Biobank, depend on WES for genetic insights.

Key Restraints for Whole Exome Sequencing Market

Challenges in data interpretation: Between 30% and 40% of WES results necessitate manual review due to ambiguous variants. The absence of standardized guidelines contributes to a 20% variability in clinical reporting among laboratories.

Ethical and privacy issues are also a concern: The management of terabytes of sensitive genetic information poses risks related to GDPR and HIPAA compliance. Approximately 15% of patients opt out of WES due to apprehensions regarding genetic discrimination or the misuse of data.

Limited reimbursement policies present another obstacle: Half of insurers continue to categorize WES as "investigational" for numerous conditions. Reimbursement delays ranging from 6 to 12 months impede the broader clinical adoption of this technology.

Key Trends for Whole Exome Sequencing Market

AI-driven variant calling is enhancing accuracy: Machine learning models have improved accuracy by 25%, thereby reducing the incidence of false positives and negatives. Cloud-based platforms facilitate real-time collaboration among experts worldwide.

The integration of long-read WES is also noteworthy: The combination of short- and long-read technologies allows for the resolution of 15% more structural variants, thereby improving the detection of complex mutations.

The expansion of direct-to-consumer (DTC) services is notable: Companies are now providing physician-mediated WES for $299, focusing on wellness and ancestry markets. Emerging blockchain solutions are being developed to ensure secure ownership of genomic data.

What is Whole Exome Sequencing?

Whole-exome sequencing is a common next-generation sequencing (NGS) technique that sequences the protein-coding regions of the genome. It offers a less expensive alternative to whole-genome sequencing because the human exome, which makes up less than 2% of the genome, contains 85% of known disease-related variants. Exome sequencing with exome enrichment can efficiently find coding changes in a wide range of applications, including genetic disease, population genetics, and cancer research.
Library of the 55 different classification and regression machine-learning...
plos.figshare.com
xls
Updated Jun 1, 2023
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov (2023). Library of the 55 different classification and regression machine-learning algorithms used by the ensemble predictor SuperLearner (SL.library) in the CBDA 2.0 implementation. [Dataset]. http://doi.org/10.1371/journal.pone.0228520.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228520.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Simeone Marino; Yi Zhao; Nina Zhou; Yiwang Zhou; Arthur W. Toga; Lu Zhao; Yingsi Jian; Yichen Yang; Yehu Chen; Qiucheng Wu; Jessica Wild; Brandon Cummings; Ivo D. Dinov
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Library of the 55 different classification and regression machine-learning algorithms used by the ensemble predictor SuperLearner (SL.library) in the CBDA 2.0 implementation.
Automated localization and quality control of the aorta in cine CMR can...
figshare.com
tiff
Updated Jun 1, 2023
Luca Biasiolli; Evan Hann; Elena Lukaschuk; Valentina Carapella; Jose M. Paiva; Nay Aung; Jennifer J. Rayner; Konrad Werys; Kenneth Fung; Henrike Puchta; Mihir M. Sanghvi; Niall O. Moon; Ross J. Thomson; Katharine E. Thomas; Matthew D. Robson; Vicente Grau; Steffen E. Petersen; Stefan Neubauer; Stefan K. Piechnik (2023). Automated localization and quality control of the aorta in cine CMR can significantly accelerate processing of the UK Biobank population data [Dataset]. http://doi.org/10.1371/journal.pone.0212272
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0212272
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Luca Biasiolli; Evan Hann; Elena Lukaschuk; Valentina Carapella; Jose M. Paiva; Nay Aung; Jennifer J. Rayner; Konrad Werys; Kenneth Fung; Henrike Puchta; Mihir M. Sanghvi; Niall O. Moon; Ross J. Thomson; Katharine E. Thomas; Matthew D. Robson; Vicente Grau; Steffen E. Petersen; Stefan Neubauer; Stefan K. Piechnik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Aortic distensibility can be calculated using semi-automated methods to segment the aortic lumen on cine CMR (Cardiovascular Magnetic Resonance) images. However, these methods require visual quality control and manual localization of the region of interest (ROI) of ascending (AA) and proximal descending (PDA) aorta, which limit the analysis in large-scale population-based studies. Using 5100 scans from UK Biobank, this study sought to develop and validate a fully automated method to 1) detect and locate the ROIs of AA and PDA, and 2) provide a quality control mechanism. Methods: The automated AA and PDA detection-localization algorithm followed these steps: 1) foreground segmentation; 2) detection of candidate ROIs by Circular Hough Transform (CHT); 3) spatial, histogram and shape feature extraction for candidate ROIs; 4) AA and PDA detection using Random Forest (RF); 5) quality control based on RF detection probability. To provide the ground truth, overall image quality (IQ = 0–3 from poor to good) and aortic locations were visually assessed by 13 observers. The automated algorithm was trained on 1200 scans and Dice Similarity Coefficient (DSC) was used to calculate the agreement between ground truth and automatically detected ROIs. Results: The automated algorithm was tested on 3900 scans. Detection accuracy was 99.4% for AA and 99.8% for PDA. Aorta localization showed excellent agreement with the ground truth, with DSC ≥ 0.9 in 94.8% of AA (DSC = 0.97 ± 0.04) and 99.5% of PDA cases (DSC = 0.98 ± 0.03). AA×PDA detection probabilities could discriminate scans with IQ ≥ 1 from those severely corrupted by artefacts (AUC = 90.6%). If scans with detection probability < 0.75 were excluded (350 scans), the algorithm was able to correctly detect and localize AA and PDA in all the remaining 3550 scans (100% accuracy). Conclusion: The proposed method for automated AA and PDA localization was extremely accurate and the automatically derived detection probabilities provided a robust mechanism to detect low quality scans for further human review. Applying the proposed localization and quality control techniques promises at least a ten-fold reduction in human involvement without sacrificing any accuracy.
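The two quantities this abstract leans on, the Dice Similarity Coefficient and the probability cut-off used for quality control, can be illustrated with a short sketch. This is not the authors' implementation: regions are represented here as sets of pixel coordinates, the function names are hypothetical, and the toy regions are made up. Only the 0.75 threshold comes from the description.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def dice_coefficient(region_a: Set[Pixel], region_b: Set[Pixel]) -> float:
    """DSC = 2 * |A intersect B| / (|A| + |B|), ranging from 0 to 1."""
    total = len(region_a) + len(region_b)
    if total == 0:
        # Convention chosen for this sketch: two empty regions agree perfectly.
        return 1.0
    return 2.0 * len(region_a & region_b) / total


def needs_human_review(detection_probability: float, threshold: float = 0.75) -> bool:
    """Flag a scan whose detection probability falls below the cut-off."""
    return detection_probability < threshold


# Toy example: a 4x4 ground-truth square and a detection shifted down by one row.
ground_truth = {(r, c) for r in range(2, 6) for c in range(2, 6)}
detected = {(r, c) for r in range(3, 7) for c in range(2, 6)}

print(dice_coefficient(ground_truth, detected))  # 12 shared pixels: 2 * 12 / 32 = 0.75
print(needs_human_review(0.60))                  # below 0.75, so flagged
```

A set-based representation keeps the example dependency-free; for real image masks an array-based version would be the more natural choice.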
Supplementary-Data-3 (Article title: Rapid and accurate multi-phenotype...
figshare.com
zip
Updated Oct 4, 2024
Share
Linlin Gu (2024). Supplementary-Data-3 (Article title: Rapid and accurate multi-phenotype imputation for millions of individuals) [Dataset]. http://doi.org/10.6084/m9.figshare.27134196.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27134196.v3
Dataset updated
Oct 4, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Linlin Gu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep phenotyping can enhance the power of genetic analyses such as genome-wide association studies (GWAS), but the recurrence of missing phenotypes compromises the potential of such resources. Although many phenotypic imputation methods have been developed, accurate imputation for millions of individuals remains extremely challenging. In the present study, leveraging efficient machine learning (ML)-based algorithms, we developed a novel multi-phenotype imputation method based on mixed fast random forest (PIXANT), which is several orders of magnitude more efficient in runtime and computer memory usage than the state-of-the-art methods when applied to the UK Biobank (UKB) data, and is scalable to cohorts with millions of individuals. Our simulations with hundreds of individuals showed that PIXANT was superior or comparable to the most advanced methods available in terms of accuracy. We also applied PIXANT to impute 425 phenotypes for the UKB data of 277,301 unrelated white British citizens, performed GWAS on the imputed phenotypes, and identified 15.6% more GWAS loci than before imputation (8,710 vs 7,355). Due to the increased statistical power of GWAS, a certain proportion of novel genes were rediscovered, such as RNF220, SCN10A and RGS6 that affect heart rate, demonstrating the use of imputed phenotype data in a large cohort to discover novel genes for complex traits.
Supplementary Material for: Association Between Sleep Factors and...
karger.figshare.com
docx
Updated Jun 26, 2023
Share
Paul C. Stoy; Tristan Quaife; Kai Ueltzhöffer; Diana J. N. Armbruster-Genç; Peter C. Dumoulin; Stefanie A. Trop; Matthew A. Sherman (2023). Supplementary Material for: Association Between Sleep Factors and Parkinson’s Disease: A Prospective Study Based On 409,923 UK Biobank Participants [Dataset]. http://doi.org/10.6084/m9.figshare.22820597.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.22820597.v1
Dataset updated
Jun 26, 2023
Dataset provided by
Karger Publishers
Authors
Paul C. Stoy; Tristan Quaife; Kai Ueltzhöffer; Diana J. N. Armbruster-Genç; Peter C. Dumoulin; Stefanie A. Trop; Matthew A. Sherman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: Limited evidence indicates an association between sleep factors and the risk of Parkinson’s disease (PD). However, large prospective cohort studies including both sexes are needed to verify the association between daytime sleepiness, sleep duration, and PD risk. Furthermore, other sleep factors like chronotype and snoring and their impact on increased PD risk should be explored by simultaneously considering daytime sleepiness and snoring. Methods: This study included 409,923 participants from the UK Biobank. Data on five sleep factors (chronotype, sleep duration, sleeplessness/insomnia, snoring, and daytime sleepiness) were collected using a standard self-administered questionnaire. PD occurrence was identified using linkages with primary care, hospital admission, death register, or self-report. Cox proportional hazard models were used to investigate the association between sleep factors and PD risk. Subgroup (age and sex) and sensitivity analyses were performed. Results: During a median follow-up of 11.89 years, 2158 incident PD cases were documented. The main association analysis showed that prolonged sleep duration (hazard ratio [HR]: 1.20, 95% confidence interval [CI]: 1.05, 1.37) and occasional daytime sleepiness (HR: 1.15, 95%CI: 1.04, 1.26) increased the PD risk. Compared to those who self-reported never or rarely having sleeplessness/insomnia, participants who reported usually having sleeplessness/insomnia had a decreased risk of PD (HR: 0.85, 95%CI: 0.75, 0.96). Subgroup analysis revealed that women who self-reported no snoring had a decreased PD risk (HR: 0.84; 95%CI: 0.72, 0.99). Sensitivity analyses indicated that the robustness of the results was affected by potential reverse causation and data completeness. Conclusion: Long sleep duration increased the PD risk, especially among men and participants ≥60 years, while snoring increased the risk of PD in women. 
Additional studies are needed to i) further consider other sleep traits (e.g., rapid eye movement sleep behaviour disorder and sleep apnoea) that might be related to PD, ii) objectively measure sleep-related exposure, and iii) confirm the effects of snoring on PD risk by considering the impact of obstructive sleep apnoea and investigating its underlying mechanisms.


Kristiina Rannikmae; Kristiina Rannikmae (2024). Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank [Dataset]. http://doi.org/10.5061/dryad.w9ghx3fk0

Data from: Accuracy of identifying incident stroke cases from linked healthcare data in UK Biobank

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: pdf

Unique identifier

https://doi.org/10.5061/dryad.w9ghx3fk0

Dataset updated

Jul 19, 2024

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Kristiina Rannikmae; Kristiina Rannikmae

License

CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Objective: In UK Biobank (UKB), a large population-based prospective study, cases of many diseases are ascertained through linkage to routinely collected, coded national health datasets. We assessed the accuracy of these for identifying incident strokes.

Methods: In a regional UKB sub-population (n=17,249), we identified all participants with ≥1 code signifying a first stroke after recruitment (incident stroke-coded cases) in linked hospital admission, primary care or death record data. Stroke physicians reviewed their full electronic patient records (EPRs) and generated reference standard diagnoses. We evaluated the number and proportion of cases that were true positives (i.e. positive predictive value, PPV) for all codes combined and by code source and type.

Results: Of 232 incident stroke-coded cases, 97% had EPR information available. Data sources were: 30% hospital admission only; 39% primary care only; 28% hospital and primary care; 3% death records only. While 42% of cases were coded as unspecified stroke type, review of EPRs enabled a pathological type to be assigned in >99%. PPVs (95% confidence intervals) were: 79% (73%-84%) for any stroke (89% for hospital admission codes, 80% for primary care codes) and 83% (74%-90%) for ischemic stroke. PPVs for small numbers of death record and hemorrhagic stroke codes were low but imprecise.

Conclusions: Stroke and ischemic stroke cases in UKB can be ascertained through linked health datasets with sufficient accuracy for many research studies. Further work is needed to understand the accuracy of death record and hemorrhagic stroke codes and to develop scalable approaches for better identifying stroke types.
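
The positive predictive values above are proportions with 95% confidence intervals, and the calculation can be sketched as follows. The description does not say which interval method was used, so the Wilson score interval here is an assumption, the function name is hypothetical, and the counts in the example are illustrative only (the description reports percentages, not these counts).

```python
import math
from typing import Tuple


def ppv_with_wilson_ci(true_positives: int, coded_cases: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Return (PPV, lower, upper) using the Wilson score interval."""
    if coded_cases <= 0 or not 0 <= true_positives <= coded_cases:
        raise ValueError("need 0 <= true_positives <= coded_cases and coded_cases > 0")
    p = true_positives / coded_cases
    z2 = z * z
    denom = 1.0 + z2 / coded_cases
    centre = (p + z2 / (2 * coded_cases)) / denom
    half_width = z * math.sqrt(
        p * (1.0 - p) / coded_cases + z2 / (4 * coded_cases ** 2)
    ) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical counts for illustration only: 178 confirmed strokes
# among 225 stroke-coded cases with records available.
ppv, low, high = ppv_with_wilson_ci(true_positives=178, coded_cases=225)
print(f"PPV {ppv:.0%} (95% CI {low:.0%}-{high:.0%})")
```

With these illustrative counts the output should land near the reported 79% (73%-84%) for any stroke, which shows the arithmetic but does not establish the study's actual counts or interval method.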
