Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the mQTLs of smoking-related methylation and is used to perform the G-E interaction analysis (for CD).
The objective of UK Biobank is to create a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants, which will contribute to the advancement of modern medicine, treatment and scientific discoveries that improve human health.
Lifestyle and environmental information, medical history, physical measurements, and biological samples are being collected from about 500,000 people aged 40-69 at presentation and then, with consent, their health will be followed for many years through medical and other health related records. The biological samples are stored so that they can be used for a wide range of biochemical and genetic analyses in the future.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains results of two genome-wide association studies for age-related hearing impairment (ARHI)-related traits, as described in the following publication:
Wells HRR, Freidin MB, Zainul Abidin FN, Payton A, Dawes P, Munro KJ, Morton CC, Moore DR, Dawson SJ, Williams FMK. GWAS Identifies 44 Independent Associated Genomic Loci for Self-Reported Adult Hearing Difficulty in UK Biobank. Am J Hum Genet. 2019 Oct 3;105(4):788-802. doi: 10.1016/j.ajhg.2019.09.008. Epub 2019 Sep 26.
Please cite the article if using this dataset.
Two files provide summary statistics for discovery analysis of Hearing difficulty (HD) and Hearing aid use (HAID) phenotypes for individuals of European descent from UK Biobank.
Acknowledgements
The research was carried out using the UK Biobank Resource under application number 11516. H.R.R.W. is funded by a PhD Studentship Grant, S44, from Action on Hearing Loss. The study was also supported by funding from NIHR UCLH BRC Deafness and Hearing Problems Theme, a grant from MED_EL, and the NIHR Manchester Biomedical Research Centre. The English Longitudinal Study of Aging is jointly run by University College London, Institute for Fiscal Studies, University of Manchester, and National Centre for Social Research. Genetic analyses have been carried out by UCL Genomics and funded by the Economic and Social Research Council and the National Institute on Aging. Data governance was provided by the METADAC data access committee, funded by ESRC, Wellcome, and MRC (2015-2018: Grant Number MR/N01104X/1 2018-2020: Grant Number ES/S008349/1). TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility, and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London. We would like to thank all the participants of UK Biobank, English Longitudinal Study of Aging, and TwinsUK.
Column headers:
SNP, SNP rsID
CHR, chromosome
BP, genomic position (GRCh37 build)
ALLELE1, effect allele (coded as "1")
ALLELE0, reference allele (coded as "0")
A1FREQ, effect allele frequency
INFO, imputation quality
BETA, effect size of effect allele
SE, standard error of effect size
P, P-value of association (without GC correction)
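As a minimal sketch of how the columns above might be used, the following Python snippet parses a few rows laid out with these headers and keeps the variants that pass a genome-wide significance cut-off. The sample rows (including the rsIDs), the space delimiter, and the 5e-8 threshold are illustrative assumptions, not values taken from the dataset files.

```python
import csv
import io

# Hypothetical rows using the column headers listed above; all values are made up.
SAMPLE = """SNP CHR BP ALLELE1 ALLELE0 A1FREQ INFO BETA SE P
rs0000001 1 100000 A G 0.31 0.99 0.012 0.002 2.0e-09
rs0000002 2 200000 T C 0.05 0.85 -0.004 0.005 4.2e-01
rs0000003 3 300000 C T 0.47 0.97 0.009 0.0015 1.9e-09
"""

def read_sumstats(text, delimiter=" "):
    """Parse summary statistics into a list of dicts, converting numeric fields."""
    numeric = ("A1FREQ", "INFO", "BETA", "SE", "P")
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row["BP"] = int(row["BP"])
        for key in numeric:
            row[key] = float(row[key])
        rows.append(row)
    return rows

def significant(rows, p_threshold=5e-8):
    """Keep rows whose P-value is below the (assumed) significance threshold."""
    return [r for r in rows if r["P"] < p_threshold]

rows = read_sumstats(SAMPLE)
hits = significant(rows)
print([r["SNP"] for r in hits])
```

For the real files, the delimiter and any compression would need to be checked first; the parsing logic itself only relies on the header names listed above.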
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository stores synthetic datasets derived from the database of the UK Biobank (UKB) cohort.
The datasets were generated for illustrative purposes, in particular for reproducing specific analyses on the health risks associated with long-term exposure to air pollution using the UKB cohort. The code used to create the synthetic datasets is available and documented in a related GitHub repo, with details provided in the section below. These datasets can be freely used for code testing and for illustrating other examples of analyses on the UKB cohort.
Note: while the synthetic versions of the datasets resemble the real ones in several aspects, the users should be aware that these data are fake and must not be used for testing and making inferences on specific research hypotheses. Even more importantly, these data cannot be considered a reliable description of the original UKB data, and they must not be presented as such.
The original datasets are described in the article by Vanoli et al in Epidemiology (2024) (DOI: 10.1097/EDE.0000000000001796) [freely available here], which also provides information about the data sources.
The work was supported by the Medical Research Council-UK (Grant ID: MR/Y003330/1).
The series of synthetic datasets (stored in two versions with csv and RDS formats) are the following:
In addition, this repository provides these additional files:
The datasets resemble the real data used in the analysis, and they were generated using the R package synthpop (www.synthpop.org.uk). The generation process involves two steps, namely the synthesis of the main data (cohort info, baseline variables, annual PM2.5 exposure) and then the sampling of death events. The R scripts for performing the data synthesis are provided in the GitHub repo (subfolder Rcode/synthcode).
The first part merges all the data including the annual PM2.5 levels in a single wide-format dataset (with a row for each subject), generates a synthetic version, adds fake IDs, and then extracts (and reshapes) the single datasets. In the second part, a Cox proportional hazard model is fitted on the original data to estimate risks associated with various predictors (including the main exposure represented by PM2.5), and then these relationships are used to simulate death events in each year. Details on the modelling aspects are provided in the article.
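The second step can be illustrated with a small Python sketch. It is not the authors' R code (which is in the GitHub repo): the coefficient, the baseline annual hazard, the exposure values, and the function names are hypothetical, and the discrete yearly simulation is a simplifying assumption. It only shows how a log hazard ratio from a Cox-type model could be turned into yearly death probabilities and sampled.

```python
import math
import random

def annual_death_prob(baseline_hazard, log_hr_per_unit, exposure):
    """Probability of dying within one year under a proportional-hazards assumption.

    The hazard is baseline_hazard * exp(log_hr_per_unit * exposure), treated as
    constant within the year, so P(death in year) = 1 - exp(-hazard).
    """
    hazard = baseline_hazard * math.exp(log_hr_per_unit * exposure)
    return 1.0 - math.exp(-hazard)

def simulate_death_year(pm25_by_year, baseline_hazard, log_hr_per_unit, rng):
    """Return the first year with a simulated death, or None if the subject survives."""
    for year, pm25 in sorted(pm25_by_year.items()):
        if rng.random() < annual_death_prob(baseline_hazard, log_hr_per_unit, pm25):
            return year
    return None

rng = random.Random(42)  # fixed seed so the sketch is reproducible
subject_pm25 = {2010: 10.2, 2011: 9.8, 2012: 9.5}  # invented annual PM2.5 levels
death_year = simulate_death_year(
    subject_pm25, baseline_hazard=0.005, log_hr_per_unit=0.01, rng=rng
)
print(death_year)
```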
This process guarantees that the synthetic data do not hold specific information about the original records, thus preserving confidentiality. At the same time, the multivariate distribution and correlation across variables as well as the mortality risks resemble those of the original data, so the results of descriptive and inferential analyses are similar to those in the original assessments. However, as noted above, the data are used only for illustrative purposes, and they must not be used to test other research hypotheses.
https://twinsuk.ac.uk/resources-for-researchers/access-our-data/
The TwinsUK cohort (https://twinsuk.ac.uk/), set up in 1992, is a major volunteer-based genomic epidemiology resource with longitudinal deep genomic and phenomics data from over 15,000 adult twins (18+) from across the UK who are highly engaged and recallable. The cohort is predominantly female (80%) for historical reasons. It is one of the most deeply characterised adult twin cohorts in the world, providing a rich platform for scientists to research health and ageing longitudinally. There are over 700,000 biological samples stored and data collected on twins with repeat measures at multiple timepoints. Extremely large datasets (billions of data points) have been generated for each TwinsUK participant over 30 years, including phenotypes from questionnaires, multiple clinical visits, and record linkage, and genetic and ‘omic data from biological samples. TwinsUK ensures derived datasets from raw data are returned by collaborators to enhance the resource. TwinsUK also holds a wide range of laboratory samples, including plasma, serum, DNA, faecal microbiome and tissue (skin, fat, colonic biopsies) within HTA-regulated facilities at King's College London.
More recently, postal and at-home collection strategies have allowed sample collections from frail twins, from our whole cohort for COVID-19 studies, and from new twin recruits. The cohort is recallable either on a four-year longitudinal sweep visit or based on diagnosis or genotype.
More than 1,000 data access collaborations and 250,000 samples have been shared with external researchers, resulting in over 800 publications since 2012.
TwinsUK is now working to link to twins’ official health, education and environmental records for health research purposes, which will further enhance the resource.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Brain ageing is a highly variable, spatially and temporally heterogeneous process, marked by numerous structural and functional changes. These can cause discrepancies between individuals’ chronological age and the apparent age of their brain, as inferred from neuroimaging data. Machine learning models, and particularly Convolutional Neural Networks (CNNs), have proven adept in capturing patterns relating to ageing induced changes in the brain. The differences between the predicted and chronological ages, referred to as brain age deltas, have emerged as useful biomarkers for exploring those factors which promote accelerated ageing or resilience, such as pathologies or lifestyle factors. However, previous studies rely only on structural neuroimaging for predictions, overlooking potentially informative functional and microstructural changes. Here we show that multiple contrasts derived from different MRI modalities can predict brain age, each encoding bespoke brain ageing information. By using 3D CNNs and UK Biobank data, we found that 57 contrasts derived from structural, susceptibility-weighted, diffusion, and functional MRI can successfully predict brain age. For each contrast, different patterns of association with non-imaging phenotypes were found, resulting in a total of 191 unique, statistically significant associations. Furthermore, we found that ensembling data from multiple contrasts results in both higher prediction accuracies and stronger correlations to non-imaging measurements. Our results demonstrate that other 3D contrasts and modalities, which have not been considered so far for the task of brain age prediction, encode different information about the ageing brain. We envision our work as being the starting point for future investigations into the causal links underpinning the observed brain age deltas and non-imaging measurement associations. 
For instance, drug effects can be monitored, given that certain medications correlated with accelerated brain ageing. Furthermore, continued development of brain age models could facilitate their deployment in clinical trials for recruitment and monitoring, and hospitals for diagnostic and screening tasks.
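To make the notion of a brain age delta concrete, here is a minimal Python sketch. The sign convention (predicted minus chronological age), the contrast names, the ages, and the use of a plain average as the ensemble are all assumptions for illustration; the paper's actual ensembling procedure may differ.

```python
from statistics import mean

def brain_age_delta(predicted_age, chronological_age):
    """Delta = predicted age minus chronological age (assumed sign convention).

    A positive delta suggests a brain that appears older than the subject's age.
    """
    return predicted_age - chronological_age

def ensemble_prediction(predictions_by_contrast):
    """Average the per-contrast predictions (a simple ensembling choice assumed here)."""
    return mean(predictions_by_contrast.values())

# Invented predictions for one subject from three hypothetical contrasts.
predictions = {"T1": 63.0, "SWI": 61.0, "FA": 65.0}
chronological = 60.0

deltas = {name: brain_age_delta(p, chronological) for name, p in predictions.items()}
ensemble_delta = brain_age_delta(ensemble_prediction(predictions), chronological)
print(deltas, ensemble_delta)
```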
Data Description
This dataset contains the full correlation results with all nIDPs in the UK Biobank. These are presented in datasets split by sex into Female and Male subjects. For easier data manipulation, two smaller datasets have also been made available, containing just those correlations which pass the False Discovery Rate (FDR) threshold.
As experiments were also conducted for ensembles using multiple contrasts, similar datasets are provided for those.
Finally, global datasets are also provided. These are the concatenation of the associations contained in the Male and Female datasets.
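The FDR filtering mentioned above can be sketched as follows. The description does not state which FDR procedure or level was used, so the Benjamini-Hochberg method and the 0.05 level below are assumptions, and the p-values are invented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which p-values pass the Benjamini-Hochberg FDR threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every p-value ranked at or below max_rank is declared significant.
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            passed[idx] = True
    return passed

p_vals = [0.001, 0.045, 0.02, 0.2, 0.008]  # invented correlation p-values
print(benjamini_hochberg(p_vals))
```

Keeping the result as a boolean mask in the original order makes it straightforward to subset a table of associations without re-sorting it.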
Paper & Code
The original paper for this article can be accessed here:
To access the codes relevant for this project, please access the project GitHub Repos:
If using this work, please cite it based on the above paper, or using the following BibTeX:
@inproceedings{roibu2023brain,
title={Brain Ages Derived from Different MRI Modalities are Associated with Distinct Biological Phenotypes},
author={Roibu, Andrei-Claudiu and Adaszewski, Stanislaw and Schindler, Torsten and Smith, Stephen M and Namburete, Ana IL and Lange, Frederik J},
booktitle={2023 10th IEEE Swiss Conference on Data Science (SDS)},
pages={17--25},
year={2023},
organization={IEEE},
doi={10.1109/SDS57534.2023.00010}
}
Data Access
The data for this project is freely available upon application at the UK Biobank. For more information regarding the individual nIDPs, please access the UK Biobank Showcase website at: https://biobank.ctsu.ox.ac.uk/showcase/search.cgi
Funding
ACR is supported by EPSRC Grant EP/S024093/1, F. Hoffmann-La Roche AG and a 2021 Industrial Fellowship offered by the Royal Commission for the Exhibition of 1851. SMS is supported by a Wellcome Trust Collaborative Award 215573/Z/19/Z. AILN is grateful for support from the Academy of Medical Sciences under the Springboard Awards scheme (SBF005/1136), and the Bill and Melinda Gates Foundation. FJL is supported by a Wellcome Trust Collaborative Award (215573/Z/19/Z). The WIN is supported by core funding from the Wellcome Trust (203139/Z/16/Z). The computational aspects were supported by the Wellcome Trust (203141/Z/16/Z) and the NIHR Oxford BRC. Corresponding authors: ACR (andreiroibu@icloud.com), SA (stanislaw.adaszewski@roche.com) and AILN (ana.namburete@cs.ox.ac.uk).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Levels of sociability are continuously distributed in the general population, and decreased sociability represents an early manifestation of several brain disorders. Here, we investigated the genetic underpinnings of sociability in the population.
Main questions of our research:
1. Are there common genetic variants that are associated with sociability in the general population?
2. Are genetic variants that are associated with sociability also associated with neuropsychiatric disorders?
Type of data uploaded in this repository
The UK Biobank project (see https://www.ukbiobank.ac.uk/) is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. The raw data that this project is based on comes from the publicly available UK Biobank set, which is very large and is therefore not provided here. Here we only provide the results from our analysis, which is also described here: https://www.biorxiv.org/content/10.1101/781195v2 and is currently in revision in a scientific journal. In the dataset you will find the association of 9,327,396 genetic variants with the phenotype sociability. This dataset is not suitable for opening in Excel and is best opened on a cluster computer or using specific software.
Subjects
The UK Biobank (UKBB) is a major population-based cohort from the United Kingdom that includes individuals aged between 37 and 73 years.
We constructed a sociability measure based on the aggregation of scores per participant on four questions from the UKBB database that link to sociability, including (1) a question about the frequency of friend/family visits, (2) a question on the number and type of social venues that are visited, (3) a question about worrying after social embarrassment and (4) a question about feeling lonely, leading to a sociability score ranging from 0-4. Participants were excluded if they had somatic problems that could be related to social withdrawal (BMI < 15 or BMI > 40, narcolepsy (all the time), stroke, severe tinnitus, deafness or brain-related cancers) or if they answered that they had “No friends/family outside household” or “Do not know” or “Prefer not to answer” to any of the questions.
SNP genotyping and quality control
Details about the available genome-wide genotyping data for UKBB participants have been reported previously (PMID: 30305743). We used third-release genotyping data (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Briefly, 49,950 participants were genotyped using the UK BiLEVE Axiom Array and 438,427 participants were genotyped using the UK Biobank Axiom Array. Genotypes were imputed into the dataset using the Haplotype Reference Consortium (HRC) and the UK10K haplotype resource. To account for ethnicity, we included only those individuals that identified themselves as "white" by self-report and plotted the Principal Components (PCs) provided by the UKBB, excluding individuals considered to be outliers according to PCs 1 and 2. Genetic relatedness calculated with KING kinship and provided by the UKBB (https://kenhanscombe.github.io/ukbtools/articles/explore-ukb-data.html ; http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/UKBiobank_genotyping_QC_documentation-web.pdf) was used to identify first and second-degree relatives. Subsequently, ‘families’ (i.e. clusters of related individuals above an IBD>0.125 threshold) were created and only one individual from each of these created ‘families’ was included in the analysis. If self-reported sex and SNP-based sex differed, individuals were excluded from further analysis. Single nucleotide polymorphisms (SNPs) with minor allele frequency <0.005, Hardy-Weinberg equilibrium test P value <1e-6, missing genotype rate >0.05, and imputation quality of INFO <0.8 were excluded. In the current study, all analyses are based on 342,461 participants of European ancestry for whom both genotype data and sociability scores were available.
Genome-wide association analysis
Genome-wide association analysis with the imputed marker dosages was performed in PLINK1.9, using a linear regression model with the sociability measure as the dependent variable and including sex, age, the first 10 PCs, assessment center, and genotype batch as covariates. SNPs were considered significantly associated if they had a p-value < 5e-8. Associated loci were considered independent of each other at r2 0.6 and lead SNPs were classified as the SNP with the smallest association p-value and at r2 0.1, using a 250kb window.
The summary statistics come from the plink2 linear regression analysis.
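The SNP exclusion thresholds quoted above can be expressed as a small filter. This is a sketch in Python rather than the PLINK commands actually used; the `SnpStats` record and the example values are hypothetical, while the four thresholds come from the description.

```python
from dataclasses import dataclass

@dataclass
class SnpStats:
    snp_id: str
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test P value
    missing_rate: float  # missing genotype rate
    info: float          # imputation quality (INFO)

def passes_qc(snp):
    """Keep a SNP unless it meets any exclusion criterion quoted in the text.

    Excluded: MAF < 0.005, HWE P < 1e-6, missing rate > 0.05, INFO < 0.8.
    """
    return (
        snp.maf >= 0.005
        and snp.hwe_p >= 1e-6
        and snp.missing_rate <= 0.05
        and snp.info >= 0.8
    )

# Invented variants: one clean, one too rare, one poorly imputed.
snps = [
    SnpStats("snpA", maf=0.20, hwe_p=0.5, missing_rate=0.01, info=0.98),
    SnpStats("snpB", maf=0.001, hwe_p=0.5, missing_rate=0.01, info=0.98),
    SnpStats("snpC", maf=0.20, hwe_p=0.5, missing_rate=0.01, info=0.60),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```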
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is used to conduct a cohort study evaluating the association between smoking and the risk of inflammatory bowel disease.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Drug treatment for nociceptive musculoskeletal pain (NMP) follows a three-step analgesic ladder, starting from non-steroidal anti-inflammatory drugs (NSAIDs), followed by weak or strong opioids until the pain is under control. Here, we conducted a genome-wide association study (GWAS) of a binary phenotype comparing NSAID users and opioid users as a proxy of treatment response to NSAIDs using data from the UK Biobank. We aim to find the common genetic variants associated with pain treatment response in the general population.
Type of data uploaded in this repository
UK Biobank is a large-scale biomedical database and research resource containing in-depth genetic and health information from half a million UK participants (https://www.ukbiobank.ac.uk/). The database is globally accessible to approved researchers undertaking vital research into the most common and life-threatening diseases. As the raw data is quite large and only available upon application to UKB, we only provide the results from our analysis, which is also described here: medRxiv and is currently in revision in a scientific journal. In the dataset, you will find the association of 9,435,994 genetic variants (SNPs) with the pain treatment response (PTR) phenotype. This dataset is not suitable for opening in Excel and is best opened on a cluster computer or using specific software.
Subjects
The UK Biobank is a general population cohort with over 0.5 million participants aged 40–69 recruited across the United Kingdom (UK). We derived a phenotype as a proxy for the pain treatment response to NSAIDs by using recently released primary care (general practitioners', GPs') data, which contains longitudinal structured diagnosis and prescription data. To define the PTR phenotype, we first extracted all nociceptive musculoskeletal pain (NMP) treatments and diagnoses from the GP data.
NMP diagnoses were primarily selected from the chapters on musculoskeletal and connective tissue diseases and relevant symptoms or signs from other chapters in the Read codes (versions 2 and 3). See Supplementary data 1 on medRxiv for the diagnosis codes included in this study. Secondly, pain prescriptions (NSAID and opioid) were extracted from the GP data using the British National Formulary (BNF), the dictionary of medicines and devices (dm+d), and Read codes (version 2). An overview of the extracted medication codes is provided in Supplementary data 2 on medRxiv. Only participants with an NMP diagnosis record and a pain prescription record occurring on the same date were included for analysis to ensure that we would only include pain treatment for NMP.
Phenotype
Based on the information on NMP and pain prescriptions from the UK Biobank, a dichotomous score was used for the binary (case/control) PTR phenotype: NSAID users were defined as controls and opioid users as cases. Two additional quality control (QC) steps were applied. First, participants with only one treatment event were removed to safeguard the inclusion of only participants with relatively long-term treatment. Second, a chronological check was applied for the first prescription of each ladder to ensure that the treatment ladder was correctly followed, i.e., initial NSAID use was followed by weak or strong opioids. Participants that were not treated according to this order were removed.
SNP genotyping and quality control
Genotyping procedures have been described in detail elsewhere [PMID: 30305743]. The third-release genotyping data were used for analysis (see https://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100319). Participants passing quality control were included for analysis.
QC steps for the samples included removal of participants with (1) inconsistent self-reported and genetically determined sex, (2) missing individual genetic data with a frequency of more than 0.1, (3) putative sex-chromosome aneuploidy. Participants were also excluded from the analysis if they were considered outliers due to missing heterozygosity, were not of white British ancestry based on the genotype, or had missing covariate data. Note that when we fitted the linear mixed model in GCTA, the software reported that the number of closely related participants was low. Therefore, we did not further remove related individuals from the sample.
Routine QC steps for genetic markers on autosomes included removal of single nucleotide polymorphisms (SNPs) with (1) an imputation quality score less than 0.8, (2) a minor allele frequency (MAF) less than 0.005, (3) a Hardy-Weinberg equilibrium (HWE) test P-value less than 1 × 10^-6, and (4) a genotyping call rate less than 0.95.
Genome-wide association analysis
A GWAS for the binary PTR phenotype was conducted using a linear function in GCTA [38] for markers on the autosomal chromosomes, adjusting for age, sex, BMI, depression history, smoking status, drinking frequency, assessment center, genotyping array, and the first ten principal components (PCs). The following variables from the UK Biobank data set...
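The PTR phenotype definition and its two additional QC steps described above can be sketched as a small classification function. This is a simplified Python illustration, not the authors' pipeline: the `(date, drug_class)` event format, the function name, the lack of a weak/strong opioid distinction, and the treatment of same-day NSAID and opioid first prescriptions as a failed chronological check are all assumptions.

```python
from datetime import date

def classify_ptr(events):
    """Classify one participant from (date, drug_class) prescription events.

    drug_class is "NSAID" or "opioid". Returns "control", "case", or None
    when the participant would be removed by one of the QC steps.
    """
    if len(events) < 2:
        return None  # QC step 1: only one treatment event
    nsaid_dates = [d for d, cls in events if cls == "NSAID"]
    opioid_dates = [d for d, cls in events if cls == "opioid"]
    if not opioid_dates:
        return "control" if nsaid_dates else None  # NSAID users are controls
    # QC step 2: the first NSAID must precede the first opioid (ties are an assumption).
    if not nsaid_dates or min(nsaid_dates) >= min(opioid_dates):
        return None
    return "case"  # opioid users who followed the ladder are cases

# Invented example: NSAID first, opioid later.
example = [(date(2015, 1, 1), "NSAID"), (date(2016, 1, 1), "opioid")]
print(classify_ptr(example))
```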
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The UK Biobank is a large-scale biomedical database and research resource, containing genetic and health information from half a million individuals aged 40 to 69 years in the United Kingdom. The genotyping methods and quality control steps were previously reported (PMID: 30305743). The MGUS cases were defined using cancer registry data (Data-Field 40011 and Data-Field 40006) with ICD-10 code D47.2 and ICD-O-3 code 9765. The control group was created by removing participants who had any cancer-related record in another cancer registry, a hospital record, or a self-reported history of cancer. From the whole cohort, participants were removed for a non-white British ethnic background, sex chromosome aneuploidy, genetic relatedness exclusions, recommended genomic analysis exclusions, or a mismatch between genetic and reported sex. After the exclusion steps, a total of 107 MGUS cases and 277,496 controls were used for GWAS analysis. The association models in both steps also included the following covariates: age (cases: age at diagnosis, controls: age at recruitment), sex, genotyping array, and the first 10 genetic principal components (PCs).
Column Names in the summary statistics:
CHROM: Chromosome
GENPOS: GRCh37 position
ID: rsID
ALLELE0: Non-effect allele
ALLELE1: Effect allele
A1FREQ: Frequency of effect allele
A1FREQ_CASES
A1FREQ_CONTROLS
INFO
N
N_CASES
N_CONTROLS
TEST
BETA
SE
CHISQ
LOG10P
EXTRA
SNP: SNP names in Chr:Pos:A0:A1
P
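As a small illustration of how two of the columns above relate to the others, the Python sketch below builds an identifier in the Chr:Pos:A0:A1 layout given for the SNP column and converts a LOG10P value to a P-value. The column descriptions do not define LOG10P, so treating it as -log10(P) is an assumption, as are the function names and the example values.

```python
def make_snp_id(chrom, genpos, allele0, allele1):
    """Build a variant name in the Chr:Pos:A0:A1 format described for the SNP column."""
    return f"{chrom}:{genpos}:{allele0}:{allele1}"

def p_from_log10p(log10p):
    """Convert LOG10P to a P-value, assuming LOG10P stores -log10(P)."""
    return 10.0 ** (-log10p)

# Invented example row values.
print(make_snp_id(1, 12345, "A", "G"))
print(p_from_log10p(8.0))
```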
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapping the phenotype model.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We analyze the same J = 394,174 SNPs and G = 18,364 SNP-sets used in the Framingham Heart Study analyses. Here, SNP-set annotations are based on gene boundaries defined by the NCBI’s RefSeq database in the UCSC Genome Browser [50]. Unannotated SNPs located within the same genomic region were labeled as being within the “intergenic region” between two genes. This file gives the posterior inclusion probabilities (PIPs) for the input and hidden layer neural network weights after fitting the BANNs model on the individual-level data. We assess significance for both SNPs and SNP-sets according to the “median probability model” threshold [57]. Page #1 provides the variant-level association mapping results with columns corresponding to: (1) chromosome; (2) SNP ID; (3) chromosomal position in base-pair (bp) coordinates; (4) SNP PIP; and (5) SuSiE PIP, which corresponds to SNP-level posterior inclusion probabilities computed by SuSiE [46]. Page #2 provides the SNP-set level enrichment results with columns corresponding to: (1) chromosome; (2) SNP-set ID; (3-4) the starting and ending position of the SNP-set chromosomal boundaries; (5) SNP-set PIP; (6) RSS PIP, which corresponds to the posterior inclusion probabilities computed by RSS [26]; (7) the number of SNPs that have been annotated within each SNP-set; (8) the “top” associated SNP within each SNP-set; (9) the PIP of each top SNP. Pages #3 and #4 provide similar results based on analyses where each SNP-set annotation has been augmented with a ±500 kilobase (kb) buffer to account for possible regulatory elements. (ZIP)
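The median probability model selects the variables whose posterior inclusion probability is at least one half. A minimal Python sketch of applying that rule to a table of PIPs follows; the identifiers and PIP values are invented, and whether a PIP of exactly 0.5 counts as significant in the original analysis is an assumption.

```python
def median_probability_model(pips, threshold=0.5):
    """Return the sorted names whose PIP meets the median probability model threshold."""
    return sorted(name for name, pip in pips.items() if pip >= threshold)

# Invented SNP-level PIPs keyed by hypothetical identifiers.
snp_pips = {"snp_1": 0.93, "snp_2": 0.12, "snp_3": 0.51, "snp_4": 0.49}
print(median_probability_model(snp_pips))
```

The same function could be applied to the SNP-set PIP column, since the description uses one threshold for both levels.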
Biobanking Market Size 2024-2028
The biobanking market size is forecast to increase by USD 1.67 billion, at a CAGR of 9.04% between 2023 and 2028.
The market is experiencing significant growth, driven by the increasing demand for regenerative medicine. This trend is fueled by advancements in genetic research and the potential for customized treatment plans based on individual genetic profiles. Another key driver is the emergence of stem cell storage in biobanks and biopreservation, offering new opportunities for medical research and therapeutic applications. However, this market also faces challenges. Ethical issues surrounding the collection, storage, and use of biological samples remain a significant obstacle. Ensuring informed consent, privacy protection, and adherence to regulatory guidelines are essential for maintaining public trust and avoiding potential legal disputes.
Companies seeking to capitalize on market opportunities must navigate these challenges effectively, while also staying abreast of technological advancements and evolving customer needs. Success in the market requires a strong commitment to ethical practices, innovative solutions, and strategic partnerships.
What will be the Size of the Biobanking Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
The market continues to evolve, driven by advancements in data management, sample collection, and research applications. Biobanks are increasingly integrating LIMS systems for efficient sample accessibility and inventory management. Forensic samples and microbial samples join the ranks of clinical and research specimens in biobanking, expanding its scope. Data analytics plays a crucial role in drug discovery and precision medicine, necessitating robust data security and access control. Ethical considerations, informed consent, and biobanking ethics remain paramount, shaping the industry's growth. Cell lines and audit trails are essential components of biobanking, ensuring transparency and traceability. Biobanking software facilitates sample availability and public health research, while temperature monitoring, humidity control, and predictive modeling optimize sample storage and processing.
Biobank networks collaborate to share resources and expertise, fostering advancements in therapeutic development, biomarker discovery, and disease research. Intellectual property rights and metadata standards ensure data integrity and enable data sharing. Short-term and long-term storage solutions, including dry ice, liquid nitrogen, and cryogenic freezers, cater to various sample preservation requirements. Automated liquid handling and temperature monitoring systems streamline sample processing and enhance quality control. Biobanking's continuous dynamism is reflected in its applications across sectors, from clinical trials to public health, and its role in advancing research and therapeutic development.
How is this Biobanking Industry segmented?
The biobanking industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Type
  Physical
  Virtual
Product
  Equipment
  Consumables
End-User
  Pharmaceutical & Biotechnology Companies
  Academic & Research Institutions
  Hospitals
  Contract Research Organizations (CROs)
Application
  Regenerative Medicine
  Life Science Research
  Clinical Research
  Drug Discovery & Development
  Personalized Medicine
Sample Type
  Blood Products
  Human Tissues
  Cell Lines
  Nucleic Acids
  Biological Fluids
  Human Waste Products
Biobank Type
  Population-Based Biobanks
  Disease-Based Biobanks
  Virtual Biobanks
  Tissue Biobanks
  Genetic Biobanks
Geography
  North America
    US
    Canada
  Europe
    France
    Germany
    Italy
    UK
  Middle East and Africa
    Egypt
    KSA
    Oman
    UAE
  APAC
    China
    India
    Japan
  South America
    Argentina
    Brazil
  Rest of World (ROW)
By Type Insights
The physical segment is estimated to witness significant growth during the forecast period.
Biobanks, as repositories for biological samples including human tissues, cells, blood, DNA, and other biomolecules, play a crucial role in research and medical applications. The physical segment of the market encompasses various types of biobanks, categorized by the nature of the samples. These include tissue biobanks, cell biobanks, and blood biobanks. The increasing emphasis on personalized medicine, which customizes treatments based on individual patients' genetic makeup and biomarkers, drives the demand for high-quality biological samples. Data management is
Database of associations between traits and variants using the UK Biobank cohort. Searchable atlas of genetic associations. Assists researchers in querying UK Biobank. Provides an unbiased view of phenotype and genotype associations across traits.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains results of a genome-wide association study of back pain. Two files contain association summary statistics: a discovery GWAS based on the analysis of 350,000 white British individuals from the UK Biobank, and a meta-analysis GWAS combining the same 350,000 individuals with an additional 103,862 individuals of European ancestry from the UK Biobank (total N = 453,862). The phenotype of back pain was defined by the answer provided by the UK Biobank participants to the following question: "Pain type(s) experienced in last month". Those who reported "Back pain" were considered cases; all the rest were considered controls. Individuals who did not reply or replied "Prefer not to answer" or "Pain all over the body" were excluded. This dataset is also available for graphical exploration in the genomic context at http://gwasarchive.org.
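The case/control definition above can be written as a short function. This Python sketch assumes each participant's reply is available as a list of answer strings (empty or None when there was no reply) and that the exclusion answers take precedence over "Back pain" if both were somehow present; the data layout and function name are hypothetical.

```python
EXCLUDED_ANSWERS = {"Prefer not to answer", "Pain all over the body"}

def back_pain_status(answers):
    """Return 1 for a case, 0 for a control, or None if the participant is excluded.

    answers: list of strings chosen for "Pain type(s) experienced in last month",
    or None / an empty list when the participant did not reply.
    """
    if not answers:
        return None  # did not reply
    if EXCLUDED_ANSWERS.intersection(answers):
        return None  # excluded answers
    return 1 if "Back pain" in answers else 0

print(back_pain_status(["Back pain", "Knee pain"]))
print(back_pain_status(["Headache"]))
print(back_pain_status(["Prefer not to answer"]))
```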
The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilisation of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource and the use of the data is guided by the principles formulated by the UK Biobank.
When using downloaded data, please cite the corresponding paper and this repository:
Insight into the genetic architecture of back pain and its risk factors from a study of 509,000 individuals. Freidin, Maxim; Tsepilov, Yakov; Palmer, Melody; Karssen, Lennart; Suri, Pradeep; Aulchenko, Yurii; Williams, Frances MK; CHARGE Musculoskeletal Working Group. PAIN: February 06, 2019, Articles in Press. doi: 10.1097/j.pain.0000000000001514
Maxim B Freidin, Yakov A Tsepilov, Melody Palmer, Lennart Karssen, CHARGE Musculoskeletal Working Group, Pradeep Suri, … Frances MK Williams. (2018). Genome-wide association summary statistics for back pain (Version 1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1319332
Funding:
This study was supported by the European Community’s Seventh Framework Programme funded project PainOmics (Grant agreement # 602736). The research has been conducted using the UK Biobank Resource (project # 18219).
The development of software implementing the SMR/HEIDI test and of the database for GWAS results was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Dr. Suri’s time for this work was supported by VA Career Development Award # 1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development Service. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.
Dr. Tsepilov’s time for this work was supported in part by the Russian Ministry of Science and Education under the 5-100 Excellence Program.
Column headers - discovery (350K)
CHR: chromosome
POS: position (GRCh37 build)
ID: SNP rsID
REF: reference allele (coded as "0")
ALT: effect allele (coded as "1")
CASE_ALLELE_CT: allele observation count in cases
CTRL_ALLELE_CT: allele observation count in controls
ALT_FREQ: effect allele frequency
MACH_R2: imputation quality
TEST: model of association test (additive)
OBS_CT: sample size
BETA: effect size of effect allele
SE: standard error of effect size
T_STAT: Z-value of effect allele
P: P-value of association (without GC correction)
MAF: minor allele frequency
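As a sanity check on these columns, the Z-value and a two-sided P-value can be recomputed from BETA and SE, and a genomic-control inflation factor can be estimated from the Z-values (the P column is given without GC correction). The sketch below is a minimal illustration using only the standard library. It assumes a tab-delimited file and a normal approximation for the test statistic; the sample row is synthetic and not taken from the dataset.

```python
import csv
import io
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def z_and_p(beta: float, se: float):
    """Z = BETA / SE and the two-sided normal P-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def lambda_gc(z_values):
    """Genomic-control inflation factor: median(Z^2) / median(chi2, 1 df)."""
    return statistics.median([z * z for z in z_values]) / CHI2_1DF_MEDIAN


# Synthetic example row (hypothetical values; delimiter is an assumption).
SAMPLE = "CHR\tPOS\tID\tBETA\tSE\n1\t1000\trs_hypothetical\t0.02\t0.005\n"

for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"):
    z, p = z_and_p(float(row["BETA"]), float(row["SE"]))
    print(row["ID"], round(z, 3), p)
```

A GC-corrected standard error would then be SE multiplied by the square root of the estimated lambda, if such a correction is wanted.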
Column headers - meta-analysis (450K)
MarkerName: SNP rsID
Allele1: effect allele (coded as "1")
Allele2: reference allele (coded as "0")
Freq1: effect allele frequency
FreqSE: standard error of effect allele frequency
Effect: effect size of effect allele
StdErr: standard error of effect size
P-value: P-value of association (without GC correction)
Direction: sign of effect in discovery and replication samples
n_total: Total sample size
CHR: chromosome
POS: position (GRCh37 build)
MACH_R2_discovery: imputation quality in discovery sample
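The Direction column can be used to flag variants whose effect sign differs between the discovery and replication samples. The sketch below assumes a METAL-style encoding, one character per sample with "+" and "-" for the sign and "?" for a missing sample; the source does not state the exact encoding, so treat this as an assumption, and the function name is hypothetical.

```python
from typing import Optional


def direction_concordant(direction: str) -> Optional[bool]:
    """True if all available signs agree, False if they differ,
    None if fewer than two samples have a sign (assumed "?" = missing)."""
    signs = [c for c in direction if c in "+-"]
    if len(signs) < 2:
        return None
    return len(set(signs)) == 1
```

With this encoding, `"++"` and `"--"` are concordant, `"+-"` is discordant, and `"?+"` cannot be assessed.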
http://www.donorhealth-btru.nihr.ac.uk/wp-content/uploads/2020/04/Data-Access-Policy-v1.0-14Apr2020.pdf
In over 100 years of blood donation practice, INTERVAL is the first randomised controlled trial to assess the impact of varying the frequency of blood donation on donor health and the blood supply. It provided policy-makers with evidence that collecting blood more frequently than current intervals can be implemented over two years without impacting on donor health, allowing better management of the supply to the NHS of units of blood with in-demand blood groups. INTERVAL was designed to deliver a multi-purpose strategy: an initial purpose related to blood donation research aiming to improve NHS Blood and Transplant’s core services and a longer-term purpose related to the creation of a comprehensive resource that will enable detailed studies of health-related questions.
Approximately 50,000 generally healthy blood donors were recruited between June 2012 and June 2014 from 25 NHS Blood Donation centres across England. The cohort has approximately equal numbers of men and women, aged 18-80, and is ~93% of white ancestry. All participants completed brief online questionnaires at baseline and gave blood samples for research purposes. Participants were randomised to giving blood every 8/10/12 weeks (for men) or 12/14/16 weeks (for women) over a 2-year period. ~30,000 participants returned after 2 years, completed a brief online questionnaire, and gave further blood samples for research purposes.
The baseline questionnaire includes brief lifestyle information (smoking, alcohol consumption, etc), iron-related questions (e.g., red meat consumption), self-reported height and weight, etc. The SF-36 questionnaire was completed online at baseline and 2-years, with a 6-monthly SF-12 questionnaire between baseline and 2-years.
All participants were genotyped on the Affymetrix Axiom UK Biobank genotyping array and then imputed to the 1000G+UK10K combined reference panel (80M variants in total). 4,000 participants have 50X whole-exome sequencing and 12,000 participants have 15X whole-genome sequencing. Whole-blood RNA sequencing has commenced in ~5,000 participants.
The dataset also contains data on clinical chemistry biomarkers, blood cell traits, >200 lipoproteins, metabolomics (Metabolon HD4), lipidomics, and proteomics (SomaLogic, Olink), either cohort-wide or in large subsets of the cohort.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains pre-processed LD files (Sigma matrix, S matrix, etc.) computed on the EUR cohort of the Pan-UKB LD data. It is intended to be used as input to the GhostKnockoffGWAS pipeline.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Linkage Disequilibrium (LD) matrices for six ancestry groups from the UK Biobank.
LD matrices record the SNP-by-SNP correlations in a given sample of individuals from the general population. In this case, we threshold the matrices so that we only record the correlations between variants in the same LD block (defined by LDetect). The continental ancestry groups are defined by the Pan-UKB initiative as:
EUR = European ancestry (N=362446)
CSA = Central/South Asian ancestry (N=8284)
AFR = African ancestry (N=6255)
EAS = East Asian ancestry (N=2700)
MID = Middle Eastern ancestry (N=1567)
AMR = Admixed American ancestry (N=987)
The sample sizes here are restricted to unrelated individuals in the UK Biobank. The matrices were computed using magenpy and quantized to the int8 data type for better compressibility. The standard matrices (EUR.tar.gz, AFR.tar.gz, ...) contain pairwise correlations for 1.4 million HapMap3+ variants. For European samples, we also provide LD matrices that record pairwise correlations for up to 18 million variants (EUR_18m_variants.tar.gz).
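To make the two storage ideas concrete, the sketch below shows one simple way a correlation in [-1, 1] could be quantized to int8 and how a pair of variants could be tested for membership in the same LD block. It is an illustration only: the scale factor of 127, the block start positions, and all function names are assumptions, and the scheme magenpy actually uses may differ.

```python
from bisect import bisect_right

INT8_SCALE = 127  # assumed scale: maps [-1, 1] onto [-127, 127]


def quantize_int8(r: float) -> int:
    """Map a correlation in [-1, 1] to an integer in [-127, 127]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return int(round(r * INT8_SCALE))


def dequantize_int8(q: int) -> float:
    """Recover an approximate correlation from its int8 code."""
    return q / INT8_SCALE


def block_index(pos: int, block_starts) -> int:
    """Index of the LD block containing `pos`, given sorted block starts."""
    return bisect_right(block_starts, pos) - 1


def same_block(pos_i: int, pos_j: int, block_starts) -> bool:
    """Only pairs in the same block would have their correlation stored."""
    return block_index(pos_i, block_starts) == block_index(pos_j, block_starts)


# Hypothetical block boundaries on one chromosome (not real LDetect output).
starts = [100, 200, 300]
print(same_block(150, 180, starts), same_block(150, 250, starts))
print(dequantize_int8(quantize_int8(0.3)))
```

Under this scheme the round-trip error of a stored correlation is at most 1/254, which is the trade-off accepted for one byte per entry.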
For more details on how these matrices were computed, please consult our manuscript:
Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms
Shadi Zabad, Chirayu Anant Haryan, Simon Gravel, Sanchit Misra, Yue Li
To access these matrices, consult the codebase of magenpy, our custom python package with special data structures for processing these LD matrices.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variable mapped to malnutrition, frailty and sarcopenia.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the mQTLs of smoking-related methylation and is used to perform the G-E interaction analysis (for CD).