100+ datasets found

[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
+ more versions
Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Explore at:
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets, conda environments and software for the course "Population Genomics" of Prof. Kasper Munch. This course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.

Data.tar.gz contains the datasets and executable files for some of the software.
You can unpack it with
tar -zxf Data.tar.gz -C ./
This creates a folder called Data with the uncompressed material inside.
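
For readers who prefer to unpack from a script, the tar command above can be approximated with Python's standard tarfile module. This is a minimal sketch, not part of the course material; the helper name unpack_archive is hypothetical, and the extraction filter is only applied on Python versions that provide it.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: str, dest: str = ".") -> list:
    """Extract a .tar.gz archive into dest and return the member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Newer Python versions offer extraction filters that reject unsafe paths.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return names


if __name__ == "__main__":
    # Rough equivalent of: tar -zxf Data.tar.gz -C ./
    if Path("Data.tar.gz").exists():
        print(unpack_archive("Data.tar.gz", "."))
```

The function returns the list of extracted member names, which makes it easy to confirm that the Data folder was created.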

Course_Env.packed.tar.gz contains the conda environment used for the course. It needs to be unpacked to adjust all the prefixes (note that this environment was created on Ubuntu 22.10). Do this on the command line as follows:

Create the folder Course_Env: mkdir Course_Env

Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env

Activate the environment: conda activate ./Course_Env

Run the unpacking script (this can take quite some time): conda-unpack
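
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, mkdir is given -p so that reruns do not fail, and the conda-unpack script is assumed to sit in the environment's bin directory (as conda-pack archives normally arrange it), so it is called by path instead of after conda activate, which is a shell function and cannot be run through subprocess.

```python
import subprocess
from pathlib import Path


def conda_unpack_steps(archive: str = "Course_Env.packed.tar.gz",
                       env_dir: str = "Course_Env") -> list:
    """Return the unpacking steps as argument lists, in execution order."""
    env = Path(env_dir)
    return [
        ["mkdir", "-p", str(env)],
        ["tar", "-zxf", archive, "-C", str(env)],
        # Assumption: the packed environment ships conda-unpack in its bin
        # directory, so no prior activation is needed to call it.
        [str(env / "bin" / "conda-unpack")],
    ]


def run_steps(steps: list) -> None:
    """Run each step and stop at the first one that fails."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

Calling run_steps(conda_unpack_steps()) would execute the three commands; printing the list first is a cheap way to review them before running anything.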

Course_Env.unpacked.tar.gz contains the same environment as above, but works only if untarred into the folder /usr/Material, so use the packed version above if you are working in another folder. This file is mainly for running the course in our own cloud environment.

environment_with_args.yml is the file needed to generate the conda environment. Create and activate the environment with the following commands:

conda env create -f environment_with_args.yml -p ./Course_Env

conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

At the end of the course, participants must be able to:

Identify an experimental platform relevant to a population genomic analysis.

Apply commonly used population genomic methods.

Explain the theory behind common population genomic methods.

Reflect on strengths and limitations of population genomic methods.

Interpret and analyze results of population genomic inference.

Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics from generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Here topics include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:

Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:

Coop chapter 4; Paper: Platypus

Exercise: Read mapping and base calling

Recombination:

Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation

Exercise: Phasing and recombination rate

Population structure and incomplete lineage sorting:

Lecture: Coop chapter 6, Review: Incomplete lineage sorting

Exercise: Working with VCF files

Hidden Markov models:

Lecture: Durbin chapter 3, Paper: population structure

Exercise: Inference of population structure and admixture

Ancestral recombination graphs:

Lecture: Paper: Approximating the ARG, Paper: Tree inference

Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:

Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference

Exercise: Inferring historical populations

Direct and linked selection:

Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:

Lecture: Review: Admixture, Paper: Admixture inference

Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):

Lecture: Coop lecture notes 99-120

Exercise: GWAS quality control

Heritability:

Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)

Exercise: Association testing

Evolution and disease:

Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)

Exercise: Estimating heritability
The 23andMe GWAS summary statistics for top 10,000 genetic markers...
dataverse.harvard.edu
search.dataone.org
Updated Sep 6, 2023
Cite
Haoyu Zhang (2023). The 23andMe GWAS summary statistics for top 10,000 genetic markers associated with three traits [Dataset]. http://doi.org/10.7910/DVN/3NBNCV
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/3NBNCV
Dataset updated
Sep 6, 2023
Dataset provided by
Harvard Dataverse
Authors
Haoyu Zhang
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset includes GWAS (Genome-Wide Association Studies) summary statistics for top 10,000 genetic markers associated with three traits across five diverse ancestries. These traits and ancestries form part of the study outlined in the manuscript: "A new method for multiancestry polygenic prediction improves performance across diverse populations". The research manuscript can be accessed via this link: https://www.biorxiv.org/content/10.1101/2022.03.24.485519v5.abstract. The three traits explored in this dataset include height, sing back musical note (the ability to replicate a musical note), and morning person. These traits were examined across five ancestral backgrounds: African American (AFR), Native American (AMR), European (EUR), East Asian (EAS), and South Asian (SAS).
Population genetic data for MT and autosomal genes.
datasetcatalog.nlm.nih.gov
Updated Aug 29, 2013
Cite
De Hoff, Peter L.; Ferris, Patrick; Miyagi, Ayano; Olson, Bradley J. S. C.; Umen, James G.; Geng, (2013). Population genetic data for MT and autosomal genes. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001630528
Explore at:
Dataset updated
Aug 29, 2013
Authors
De Hoff, Peter L.; Ferris, Patrick; Miyagi, Ayano; Olson, Bradley J. S. C.; Umen, James G.; Geng,
Description
Notes: na: not applicable.
1. Number of MT+ and MT− sequences analyzed for each gene.
2. Polymorphism rate for silent sites (non-coding and synonymous) × 1000. Standard deviation in parentheses. Values are given for all sequences (total) and for the MT+ and MT− isolates separately. MT+ and MT− values that differ from the total value by >1 standard deviation are shown in bold.
3. Population differentiation between MT+ and MT− isolates. Values near 0 correspond to no differentiation and values near 1 correspond to complete differentiation. Bold values correspond to those genes showing significant differentiation between MT+ and MT− isolates.
Summary Statistics.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Jason M. Fletcher (2023). Summary Statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0050576.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0050576.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Jason M. Fletcher
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NHANES 1991–1994 Genetic Sample (N = 6,178). Notes: Author’s calculations from NHANES Data. Sample weights used.
Number of nominally significant genes before and after filtering.
plos.figshare.com
xls
Updated Jun 4, 2023
Cite
Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder (2023). Number of nominally significant genes before and after filtering. [Dataset]. http://doi.org/10.1371/journal.pgen.1003443.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgen.1003443.t004
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: Significance level is 0.01, not corrected for multiple testing. The analyses of the first two rows are for all genes that have at least one MAC in the Baylor and Broad datasets. The last rows are restricted to the genes that have more than 15 minor alleles after combining the Baylor and Broad datasets.
Genomic control and for all tests before and after PC adjustment.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder (2023). Genomic control and for all tests before and after PC adjustment. [Dataset]. http://doi.org/10.1371/journal.pgen.1003443.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgen.1003443.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: These analyses are restricted to the genes that have more than 4 minor alleles in the samples used in each study. and are calculated based on the median and the 1st quantile of the p-value distribution, respectively. PC adjustment is based on the common variants (CVs) eigen-vectors.
Population genetic diversity statistics for the invariant genes tested for...
datasetcatalog.nlm.nih.gov
Updated Jun 10, 2014
Cite
Garcia-Navarro, Elena; Burke, John M.; McAssey, Edward V.; Nambeesan, Savithri; Mandel, Jennifer R. (2014). Population genetic diversity statistics for the invariant genes tested for evidence of positive selection. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001190595
Explore at:
Dataset updated
Jun 10, 2014
Authors
Garcia-Navarro, Elena; Burke, John M.; McAssey, Edward V.; Nambeesan, Savithri; Mandel, Jennifer R.
Description
Panel, W = wild, P = primitive, I = improved; L = alignment length in basepairs; l = number of synonymous sites; S = number of segregating synonymous sites; π = nucleotide diversity for synonymous sites; θ = Watterson's theta for synonymous sites; Sig. = ML-HKA significance: ns = not significant, P<0.001 = ***, P<0.01 = **, P<0.05 = *. Bold genes are those that showed significant evidence of selection. Note: we were unable to successfully sequence the IPT5 gene in P.
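
As an illustration of two of the statistics in this legend, the sketch below computes Watterson's theta from S and the sample size, and nucleotide diversity as the mean number of pairwise differences per site. It uses the standard textbook definitions, not the authors' own pipeline; the function names are hypothetical, and the input sequences are assumed to be aligned, of equal length, and already restricted to synonymous sites.

```python
from itertools import combinations


def watterson_theta(num_segregating: int, num_sequences: int, num_sites: int = 1) -> float:
    """Watterson's theta per site: S / a_n / sites, with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating / a_n / num_sites


def nucleotide_diversity(sequences: list) -> float:
    """Mean number of pairwise differences per site across all sequence pairs."""
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_diffs / len(pairs) / len(sequences[0])
```

For example, three sequences with three segregating sites give a_n = 1.5 and therefore a total (not per-site) theta of 2.0.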
Data from: Plant Expression Database
agdatacommons.nal.usda.gov
bin
Updated Feb 9, 2024
+ more versions
Cite
Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson (2024). Plant Expression Database [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Plant_Expression_Database/24661179
Explore at:
Available download formats: bin
Dataset updated
Feb 9, 2024
Dataset provided by
PLEXdb
Authors
Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
[NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions or checking any gene’s suitability as a steady-state control.
Resources in this dataset:
Resource Title: Website Pointer for Plant Expression Database, Iowa State University.
File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database [NOTE: PLEXdb is no longer available online. Oct 2019.] Project description for the Plant Expression Database (PLEXdb) and integrated tools.
Data from: Replicated analysis of the genetic architecture of quantitative...
datadryad.org
zip
Updated Nov 4, 2015
Cite
Anna W. Santure; Jocelyn Poissant; Isabelle De Cauwer; Kees van Oers; Matthew R. Robinson; John L. Quinn; Martien A. M. Groenen; Marcel E. Visser; Ben C. Sheldon; Jon Slate (2015). Replicated analysis of the genetic architecture of quantitative traits in two wild great tit populations [Dataset]. http://doi.org/10.5061/dryad.5t32v
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.5t32v
Dataset updated
Nov 4, 2015
Dataset provided by
Dryad
Authors
Anna W. Santure; Jocelyn Poissant; Isabelle De Cauwer; Kees van Oers; Matthew R. Robinson; John L. Quinn; Martien A. M. Groenen; Marcel E. Visser; Ben C. Sheldon; Jon Slate
Time period covered
Oct 25, 2015
Area covered
5°50'E, United Kingdom, 1°20’W, 52°02’N, 5°51’E, 51°46’N, Westerheide, 52°01'N, Wytham Woods, De Hoge Veluwe National Park
Description
data_readme: A readme file explaining the information loaded to Dryad.

NL map file: plink-style map file with marker locations for the NL dataset. The map file is in an amended plink format with chromosome in column 1, SNP name in column 2, cM position in column 3 (note this is different from default plink map files, which have distances in Morgans) and genome order in column 4 (again this is different from the plink format, which would usually have bp position in this column).

Chromosome codings in the map file (column 1) are as follows:
Chromosomes 1-15 and 17-28 are the corresponding chromosomes 1-15 and 17-28 in the great tit genome
Chromosome 29 = chromosome 1A
Chromosome 30 = chromosome 4A
Chromosome 31 = Z chromosome
Chromosome 32 = Linkage group LGE22

NL_numeric_ids.map

NL genotype file: Genotypes for the 1,407 NL individuals. These are standard plink files (see http://pngu.mgh.harvard.edu/~purcell/plink/) where the first column gives a family id, the second column an individual id (in this ...
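
The amended map-file layout and chromosome codings described above can be decoded with a few lines of Python. This is a minimal sketch that assumes whitespace-separated columns with no header line; the function name and the SNP name in the usage line are hypothetical.

```python
# Numeric codes from the description; all other codes keep their own number.
SPECIAL_CHROMOSOMES = {"29": "1A", "30": "4A", "31": "Z", "32": "LGE22"}


def parse_map_line(line: str) -> dict:
    """Parse one line of the amended plink map file into named fields."""
    chrom, snp, centimorgans, order = line.split()
    return {
        "chromosome": SPECIAL_CHROMOSOMES.get(chrom, chrom),
        "snp": snp,
        "cM": float(centimorgans),   # column 3 is in cM, not Morgans
        "genome_order": int(order),  # column 4 is marker order, not bp position
    }


# Hypothetical example line: code 31 is translated to the Z chromosome.
record = parse_map_line("31 snp_example 12.5 42")
```

Keeping the translation in one dictionary makes it straightforward to convert back to the numeric codes when writing files for plink.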
Data from: The search for loci under selection: trends, biases and progress
datasetcatalog.nlm.nih.gov
Updated Jun 10, 2022
+ more versions
Cite
Umbers, Kate D. L.; Rymer, Paul D.; Ahrens, Collin W.; Stow, Adam; Dillon, Shannon; Dudaniec, Rachael Y.; Bragg, Jason (2022). Data from: The search for loci under selection: trends, biases and progress [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000266863
Explore at:
Dataset updated
Jun 10, 2022
Authors
Umbers, Kate D. L.; Rymer, Paul D.; Ahrens, Collin W.; Stow, Adam; Dillon, Shannon; Dudaniec, Rachael Y.; Bragg, Jason
Description
Detecting genetic variants under selection using FST outlier analysis (OA) and environmental association analyses (EAA) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analyzed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographic coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics.

Usage Notes

Table S1 - Full data set. Data was collected by reading papers associated with environmental association analyses. Data includes location, species, methods used, genetic parameters of data sets reviewed, and analytical parameters of the analyses.
Table S1_data.xlsx

R code for mixed-effects linear models. The R code used to create the figures and estimate regressions of the data set.
Ahrens et al 2018_MolEcol_review.R
Main model fits and substitution rate predictions for: A quantitative...
search.dataone.org
Updated Jul 26, 2025
Cite
Vince Buffalo; Andrew Kern (2025). Main model fits and substitution rate predictions for: A quantitative genetic model of background selection in humans [Dataset]. http://doi.org/10.5061/dryad.qnk98sfnv
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.qnk98sfnv
Dataset updated
Jul 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
Vince Buffalo; Andrew Kern
Time period covered
Jan 1, 2023
Description
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This can be thought of as a "linked selection signal" that reflects the impact of selection varying according to the placement of functional regions and recombination rates along the genome. Previous work has shown that negative selection against the steady influx of new deleterious mutations into conserved regions is the predominant mode of selection in humans. However, the theoretic model that underpins these results, classic Background Selection theory, is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of the linked selection, which models the effects of weak draft created according to how polygenic additive fitness variance is distributed along the genome. We use a recent model that jointly predicts the equilibrium fitness variance and su...

These Python pickle files contain the model outputs from bgspy (http://github.com/vsbuffalo/bprime/) for the CADD 6%, CADD 8%, PhastCons Priority, and Feature Priority Models.

Main model fits and substitution rate predictions for: A quantitative genetic model of background selection in humans

Usage Notes

All files are in standard Python file formats. To load the pickle files, install the accompanying bprime software available on GitHub.

Note that all TSV files here were written by analyses in Jupyter notebooks that are available on the bprime GitHub page.

Files

Model Fits

There are pickle files of model results, generated by bgspy collect.

cadd6_decode_altgrid.pkl: CADD 6%

cadd8_decode_altgrid.pkl: CADD 8%

CDS_genes_phastcons_decode_altgrid.pkl: Feature Priority

phastcons_CDS_genes_decode_altgrid.pkl: PhastCons Priority

Files Produced by Sims

empiricalB_chr10_expansion_false_h_0.5_results.npz: simulation B "empirical" B maps for fixed demography

empiricalB_chr10_expansion_1.004_9.3_h_0.5_results.npz: ...
Genetic diversity statistics assayed per Portuguese P. nigra population and...
datasetcatalog.nlm.nih.gov
Updated Dec 11, 2019
Cite
Lima-Brito, josé; Dias, Alexandra; Fady, Bruno; Gaspar, Maria João; Bagnoli, Francesca; Spanu, Ilaria; Vendramin, Giovanni; Giovanelli, Guia; Carvalho, Ana; Lousada, José; silva, maria emilia (2019). Genetic diversity statistics assayed per Portuguese P. nigra population and SSR locus. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000163598
Explore at:
Dataset updated
Dec 11, 2019
Authors
Lima-Brito, josé; Dias, Alexandra; Fady, Bruno; Gaspar, Maria João; Bagnoli, Francesca; Spanu, Ilaria; Vendramin, Giovanni; Giovanelli, Guia; Carvalho, Ana; Lousada, José; silva, maria emilia
Description
Genetic diversity statistics assayed per Portuguese P. nigra population and SSR locus. Notes: na – observed number of alleles; ne – effective number of alleles (Kimura and Crow 1964); I – Shannon’s Information Index (Lewontin 1972); h – Nei’s gene diversity index (Nei 1973); Ho – observed heterozygosity; He – expected heterozygosity (Levene 1949); F – fixation index; and s.d. – standard deviation.
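
To make the legend concrete, the sketch below computes several of these indices for a single locus from allele counts, using their common textbook forms: ne = 1/Σp², I = −Σ p ln p, h = 1 − Σp², and F = 1 − Ho/He. It is an illustration only, not the authors' software. The function name is hypothetical, and h is used in place of He when computing F, so the sample-size correction attributed to Levene (1949) is not reproduced.

```python
import math


def diversity_stats(allele_counts: list, observed_het: float) -> dict:
    """Single-locus diversity indices from allele counts and observed heterozygosity."""
    total = sum(allele_counts)
    freqs = [count / total for count in allele_counts if count > 0]
    homozygosity = sum(p * p for p in freqs)
    gene_diversity = 1.0 - homozygosity
    return {
        "na": len(freqs),                              # observed number of alleles
        "ne": 1.0 / homozygosity,                      # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),     # Shannon's information index
        "h": gene_diversity,                           # Nei's gene diversity
        # Assumption: h stands in for He here (no sample-size correction).
        "F": 1.0 - observed_het / gene_diversity if gene_diversity > 0 else float("nan"),
    }
```

With two equally frequent alleles and an observed heterozygosity of 0.25, this gives ne = 2, h = 0.5 and F = 0.5, a deficit of heterozygotes relative to expectation.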
The meta-analyzed GWAS summary statistics for 35 lab biomarkers described in...
nih.figshare.com
txt
Updated May 30, 2023
Cite
Yosuke Tanigawa; Nasa Sinnott-Armstrong; Manuel Rivas (2023). The meta-analyzed GWAS summary statistics for 35 lab biomarkers described in 'Genetics of 35 blood and urine biomarkers in the UK Biobank' [Dataset]. http://doi.org/10.35092/yhjc.12355382.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.35092/yhjc.12355382.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yosuke Tanigawa; Nasa Sinnott-Armstrong; Manuel Rivas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains meta-analyzed GWAS summary statistics for 35 biomarker traits described in the following preprint:

N. Sinnott-Armstrong*, Y. Tanigawa*, et al, Genetics of 38 blood and urine biomarkers in the UK Biobank. bioRxiv, 660506 (2019). doi:10.1101/660506

Note that we are preparing a revised version of the manuscript and this dataset contains 35 (instead of 38) biomarker phenotypes.

We provide the list of 35 biomarkers in "list_of_35_biomarkers.tsv". We used the "Phenotype_name" column in this table for the file names. For each phenotype, we provide two compressed tab-delimited files, named "[Phenotype_name].array.gz" and "[Phenotype_name].imp.gz", which contain the summary statistics for genetic variants on the genotyping array and the imputed dataset, respectively.

We used METAL for the meta-analysis for 4 populations (White British, non-British White, African, and South Asian) within UK Biobank. The files have the following columns:

CHROM: the chromosome
POS: the position
MarkerName: the variant identifier
REF: the reference allele
ALT: the alternate allele
Effect: the effect size (BETA) estimate
StdErr: the standard error of effect size estimate
P-value: the p-value of the association
Direction: the direction of effect size
HetISq, HetChiSq, HetDf, HetPVal: heterogeneity statistics from METAL

Note that we used GRCh37/hg19 genome reference in the analysis and the BETA is always reported for the alternate allele.

Please also check the METAL documentation (https://genome.sph.umich.edu/wiki/METAL_Documentation).

The summary statistic files are compressed with bgzip and indexed with tabix (the .tbi files). One should be able to read those files with the standard gzip/zcat.
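
Because bgzip output is gzip-compatible, a file with the columns listed above can be scanned with Python's standard gzip and csv modules. The sketch below is hedged: it assumes the header row uses exactly the column names given in the description, the function name is hypothetical, and the default threshold of 5e-8 is the conventional genome-wide significance level, not a value taken from this dataset.

```python
import csv
import gzip


def significant_variants(path: str, threshold: float = 5e-8) -> list:
    """Return rows whose P-value column is below the threshold."""
    hits = []
    with gzip.open(path, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                p_value = float(row["P-value"])
            except ValueError:
                continue  # skip rows with a missing or non-numeric p-value
            if p_value < threshold:
                hits.append({**row, "P-value": p_value})
    return hits
```

The file is streamed row by row, so memory use stays small even for the imputed-variant files; only the rows that pass the threshold are kept.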
Genome-wide association summary statistics for varicose veins of lower...
zenodo.org
zip
Updated Jan 24, 2020
+ more versions
Cite
Alexandra S. Shadrina; Sodbo Zh. Sharapov; Tatiana I. Shashkova; Yakov A. Tsepilov (2020). Genome-wide association summary statistics for varicose veins of lower extremities [Dataset]. http://doi.org/10.5281/zenodo.1323484
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1323484
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alexandra S. Shadrina; Sodbo Zh. Sharapov; Tatiana I. Shashkova; Yakov A. Tsepilov
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains summary statistics for the discovery and the replication stages of the large-scale genome-wide association study for varicose veins of lower extremities. The discovery stage was based on genetic association data provided by the Neale Lab (http://www.nealelab.is/) for 337,199 UK Biobank individuals. Phenotype “varicose veins of lower extremities” was defined based on International Classification of Disease (ICD-10) billing code “I83” present in the electronic patient record. Data were adjusted for two potential confounders – body mass index and deep venous thrombosis. A replication cohort (N=71,256) was generated by means of reverse meta-analysis of two overlapping datasets: genetic association data for 408,455 UK Biobank participants provided by the Gene ATLAS database (http://geneatlas.roslin.ed.ac.uk/), and the above mentioned data provided by the Neale Lab.

Please note that in Shadrina et al. (PLOS Genetics 2019) we used only the "discovery" dataset, while in the bioRxiv preprint (https://doi.org/10.1101/368365) both discovery and replication datasets were used.

The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilisation of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant.

When using downloaded data, please cite corresponding paper and this repository:

Shadrina, A. S., Sharapov, S. Z., Shashkova, T. I. & Tsepilov, Y. A. Varicose veins of lower extremities: Insights from the first large-scale genetic study. PLOS Genet. 15, e1008110 (2019).

Alexandra S. Shadrina, Sodbo Zh. Sharapov, Tatiana I. Shashkova, & Yakov A. Tsepilov. (2018). Genome-wide association summary statistics for varicose veins of lower extremities (Version 1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1323484

Funding:

The work of ASS was supported by the Russian Science Foundation [Project No 17-75-20223].
The work of YAT was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
The work of SZS was supported by the Institute of Cytology and Genetics [Project No 0324-2018-0017].

Column headers - discovery

SNP: SNP rsID

b: effect size of effect allele

se: standard error of effect size

chi2: T^2 value of effect allele

Pval: P-value of association (without GC correction)

N: sample size

Chr: chromosome

Pos: position (GRCh37 build)

A1: effect allele (coded as "1")

A2: reference allele (coded as "0")

Column headers - replication

SNP: SNP rsID

A1: effect allele (coded as "1")

A2: reference allele (coded as "0")

N: Total sample size

Z: Z-value of effect allele

P: P-value of association (without GC correction)
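
The discovery columns can be cross-checked against each other after loading the file. The sketch below assumes (this is not stated in the description) that chi2 is the square of b/se and that Pval is the matching two-sided normal-theory P-value. The function names and example numbers are illustrative only.

```python
import math

def z_score(b: float, se: float) -> float:
    """Z-value implied by an effect size and its standard error."""
    return b / se

def two_sided_p(z: float) -> float:
    """Two-sided P-value for a standard normal Z.

    This equals the upper tail of a 1-df chi-square distribution at z**2.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

def check_row(b: float, se: float, chi2: float, rel_tol: float = 1e-2) -> bool:
    """Return True if chi2 matches (b/se)**2 within a relative tolerance."""
    return math.isclose(z_score(b, se) ** 2, chi2, rel_tol=rel_tol)

# Made-up values for illustration, not taken from the dataset.
z = z_score(0.10, 0.02)
print(z, two_sided_p(z), check_row(0.10, 0.02, 25.0))
```

If many rows fail such a check, the assumed relationship between the columns is probably wrong and the original publication should be consulted.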

Gget data

kaggle.com

zip

Updated Jul 1, 2023

NG NM　WT (2023). Gget data [Dataset]. https://www.kaggle.com/datasets/ngnmwt/gget-data

Explore at:

Available download formats: zip (452328 bytes)

Dataset updated

Jul 1, 2023

Authors

NG NM　WT

License

http://www.gnu.org/licenses/lgpl-3.0.html

Description

pip install gget

import gget

gget.ref(species=None, list_species=True)[:10]


['acanthochromis_polyacanthus', 'accipiter_nisus', 'ailuropoda_melanoleuca', 'amazona_collaria', 'amphilophus_citrinellus', 'amphiprion_ocellaris', 'amphiprion_percula', 'anabas_testudineus', 'anas_platyrhynchos', 'anas_platyrhynchos_platyrhynchos']

gget.ref(species='mus_musculus')

{'mus_musculus': {'transcriptome_cdna': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/cdna/Mus_musculus.GRCm39.cdna.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '49M'}, 'genome_dna': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '18:37', 'bytes': '769M'}, 'annotation_gtf': {'ftp': 'http://ftp.ensembl.org/pub/release-108/gtf/mus_musculus/Mus_musculus.GRCm39.108.gtf.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:16', 'bytes': '31M'}, 'coding_seq_cds': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/cds/Mus_musculus.GRCm39.cds.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '16M'}, 'non-coding_seq_ncRNA': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/ncrna/Mus_musculus.GRCm39.ncrna.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:45', 'bytes': '7.6M'}, 'protein_translation_pep': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/pep/Mus_musculus.GRCm39.pep.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '11M'}}}
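
The nested dictionary printed above can be flattened into a list of download links. This is a minimal sketch that relies only on the structure visible in that output (species, then file type, then an 'ftp' key). The helper name `list_download_links` is hypothetical and not part of gget.

```python
def list_download_links(ref_info: dict) -> list[tuple[str, str, str]]:
    """Flatten {species: {file_type: {'ftp': url, ...}}} into (species, file_type, url) tuples."""
    links = []
    for species, files in ref_info.items():
        for file_type, meta in files.items():
            links.append((species, file_type, meta["ftp"]))
    return links

# Abbreviated copy of the output shown above (two of the six entries).
example = {
    "mus_musculus": {
        "transcriptome_cdna": {
            "ftp": "http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/cdna/Mus_musculus.GRCm39.cdna.all.fa.gz",
            "ensembl_release": 108,
        },
        "annotation_gtf": {
            "ftp": "http://ftp.ensembl.org/pub/release-108/gtf/mus_musculus/Mus_musculus.GRCm39.108.gtf.gz",
            "ensembl_release": 108,
        },
    }
}

for species, file_type, url in list_download_links(example):
    print(species, file_type, url)
```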



import urllib.request

dl_links = gget.ref(species='mus_musculus', which=['ncrna'], ftp=True)
urllib.request.urlretrieve(dl_links[0], './GRCm39_rna.fa.gz')


('./GRCm39_rna.fa.gz',

from pyGeno.Genome import *

# load a genome
ref = Genome(name = 'GRCh37.75')
# load a gene
gene = ref.get(Gene, name = 'TPST2')[0]
# print the sequences of all the isoforms
for prot in gene.get(Protein):
    print(prot.sequence)

pers = Genome(name = 'GRCh37.75', SNPs = ["RNA_S1"], SNPFilter = myFilter())

Data from: On the genetic architecture of rapidly adapting and convergent...
datadryad.org
search.dataone.org
zip
Updated Mar 4, 2022
James Whiting; Josephine Paris; Paul Parsons; Sophie Matthews; Yuridia Reynoso; Kimberly Hughes; David Reznick; Bonnie Fraser (2022). On the genetic architecture of rapidly adapting and convergent life history traits in guppies [Dataset]. http://doi.org/10.5061/dryad.w3r2280sk
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.w3r2280sk
Dataset updated
Mar 4, 2022
Dataset provided by
Dryad
Authors
James Whiting; Josephine Paris; Paul Parsons; Sophie Matthews; Yuridia Reynoso; Kimberly Hughes; David Reznick; Bonnie Fraser
Time period covered
Feb 16, 2022
Description
Sequencing data was derived through RAD-sequencing of four F2 cross families (F0s and F2s sequenced). Phenotype data was derived by phenotyping lab-reared individuals according to the methods in Whiting et al. 2022. The linkage map was made using LepMap3.
Data from: Male mouse recombination maps for each autosome identified by...
dataone.org
Updated Jul 26, 2025
Lutz Froenicke; Lorinda Anderson; Johannes Wienberg; Terry Ashley (2025). Male mouse recombination maps for each autosome identified by chromosome painting [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwx5
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.gb5mkkwx5
Dataset updated
Jul 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
Lutz Froenicke; Lorinda Anderson; Johannes Wienberg; Terry Ashley
Time period covered
Jan 1, 2023
Description
Linkage maps constructed from genetic analysis of gene order and crossover frequency provide few clues to the basis of the genomewide distribution of meiotic recombination, such as chromosome structure, that influences meiotic recombination. To bridge this gap, we have generated the first cytological recombination map that identifies individual autosomes in the male mouse. We prepared meiotic chromosome (synaptonemal complex [SC]) spreads from 110 mouse spermatocytes, identified each autosome by multicolor fluorescence in situ hybridization of chromosome-specific DNA libraries, and mapped 12,000 sites of recombination along individual autosomes, using immunolocalization of MLH1, a mismatch repair protein that marks crossover sites. We show that SC length is strongly correlated with crossover frequency and distribution. Although the length of most SCs corresponds to that predicted from their mitotic chromosome length rank, several SCs are longer or shorter than expected, with correspond...

SC Spreads and Immunostaining
Three juvenile (20–21 d old) C57BL/6J mice (the same line analyzed by the Mouse Genome Sequencing Project) were used to prepare and immunolabel the SC spreads, as described elsewhere (Anderson et al. 1999). Complete sets of SCs in which the SCs were well separated but not obviously stretched or broken and that had ≥ 19 MLH1 foci were selected for analysis. Three fluorescent images (4′,6-diamidino-2-phenylindole [DAPI], SCP3, and MLH1) were captured for each SC set.

mFISH
After image acquisition of the immunofluorescence signals, the spermatocyte preparations were subjected to two or three rounds of denaturation and FISH. To identify each autosome, chromosome-specific painting probes (Rabbitts et al. 1995) were combinatorially labeled with fluorescein isothiocyanate (FITC)–2′-deoxyuridine 5′-triphosphate (dUTP), Cy5-dUTP (both from Amersham), or 6-carboxytetramethylrhodamine (TAMRA)-dUTP (Applied Biosystems) and were combined to form two different probe pools ...

Male mouse recombination maps for each autosome identified by chromosome painting

Description of the data and file structure

The data are presented in an Excel spreadsheet with 22 sheets. Sheet 1 (karyotype-absolute positi calc) defines the average length of each mouse SC, after identification using chromosome-specific DNA probes. Sheet 2 (Notes) contains definitions of the headings used for Sheet 3 (raw data sorted by SC) and Sheets 4 through 22 ("SC1 abs" through "SC19 abs"). Sheet 2 also contains explanations for how the karyotype was derived, and two references in which this data was used for publication are also presented. Sheet 3 contains the positions of all MLH1 foci observed on all of the SCs with each MLH1 focus position expressed as a fraction of SC length from the centromere. Sheets 4–22 (labeled as "SC1 abs", "SC2 abs", "SC3 abs", "SC4 abs", "SC5 abs", "SC6 abs", "SC7 abs", "SC8 abs", "SC9 abs", "SC10 abs", "SC11 abs", "SC12 abs", "SC13 abs", "SC14 abs", "SC15 ...
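
Fractional focus positions of the kind described for Sheet 3 can be converted to absolute positions by multiplying by an average SC length, and then binned along the SC. The sketch below is only an illustration of that arithmetic: the function names, the length value and its units are assumptions, and it does not claim to reproduce how the "abs" sheets were actually computed.

```python
def to_absolute(fractions, sc_length):
    """Scale fractional positions (0 = centromere, 1 = distal end) by an SC length."""
    return [f * sc_length for f in fractions]

def bin_counts(fractions, n_bins=10):
    """Count foci in equal-width bins along the SC."""
    counts = [0] * n_bins
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"fraction out of range: {f}")
        # A focus exactly at the distal end (1.0) goes into the last bin.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

# Made-up positions for illustration, not values from the spreadsheet.
foci = [0.05, 0.5, 0.95, 1.0]
print(to_absolute(foci, 10.0))
print(bin_counts(foci, n_bins=10))
```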
Data from: SweGen
swefreq-dev.nbis.se
Updated Apr 15, 2019
(2019). SweGen [Dataset]. https://swefreq-dev.nbis.se/dataset/SweGen/browser/gene/ENSG00000186951
Explore at:
Dataset updated
Apr 15, 2019
Description
This dataset contains whole-genome variant frequencies for 1000 Swedish individuals generated within the SweGen project. The frequency data is intended to be used as a resource for the research community and clinical genetics laboratories.

Please note that the 1000 individuals included in the SweGen project represent a cross-section of the Swedish population and that no disease information has been used for the selection. The frequency data may therefore include genetic variants that are associated with, or causative of, disease.

We request that any use of data from the SweGen project cite this article in the European Journal of Human Genetics.

Individual positions in the genome can be viewed using the Beacon or Graphical Browser. To download the variant frequency file you need to register.

A high confidence set of HLA allele frequencies is available for download under Dataset Access. For a detailed description of the SweGen HLA analysis, please see this bioRxiv preprint.
Data from: Reference Gene Validation for RT-qPCR, a Note on Different...
datasetcatalog.nlm.nih.gov
Updated Mar 31, 2015
Dern-Wieloch, Jutta; Brehm, Ralph; Weigel, Roswitha; Nettersheim, Daniel; Bergmann, Martin; Schumacher, Valérie; Vandekerckhove, Linos; Schorle, Hubert; De Spiegelaere, Ward; Kliesch, Sabine; Fink, Cornelia (2015). Reference Gene Validation for RT-qPCR, a Note on Different Available Software Packages [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001902746
Explore at:
Dataset updated
Mar 31, 2015
Authors
Dern-Wieloch, Jutta; Brehm, Ralph; Weigel, Roswitha; Nettersheim, Daniel; Bergmann, Martin; Schumacher, Valérie; Vandekerckhove, Linos; Schorle, Hubert; De Spiegelaere, Ward; Kliesch, Sabine; Fink, Cornelia
Description
Background: An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Using human cells, we here compare the three main methods to the RefFinder tool available online, which integrates these algorithms, along with R-based software packages which include the NormFinder and GeNorm algorithms.

Results: Fourteen candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma-like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account.

Conclusions: This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use.
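
The effect of PCR efficiency on the input to stability algorithms can be illustrated with a commonly used conversion from Cq values to relative quantities, Q = E^(min Cq − Cq), where E is the amplification factor per cycle (2.0 at 100% efficiency). The sketch below is an assumption-laden illustration: the abstract does not state the exact form used by each package, and the Cq values are made up.

```python
def relative_quantities(cq_values, efficiency=2.0):
    """Convert Cq values to quantities relative to the sample with the lowest Cq.

    efficiency is the amplification factor per cycle (2.0 means 100% efficiency).
    """
    min_cq = min(cq_values)
    return [efficiency ** (min_cq - cq) for cq in cq_values]

cq = [20.0, 21.0, 22.0]  # made-up Cq values for one gene across three samples

# The same Cq differences imply different fold changes under different efficiencies.
print(relative_quantities(cq, efficiency=2.0))  # assuming 100% efficiency
print(relative_quantities(cq, efficiency=1.8))  # assuming 80% efficiency
```

A tool that works from raw Cq values implicitly treats every gene as if it had the same efficiency, which is consistent with the discrepancy the abstract reports.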
Data from: Consequences of the Last Glacial Period on the Genetic Diversity...
zenodo.org
bin, txt, zip
Updated Sep 20, 2021
Catarina Branco; Marina Kanellou; Antonio González-Martín; Miguel Arenas (2021). Consequences of the Last Glacial Period on the Genetic Diversity of Southeast Asians [Dataset]. http://doi.org/10.5281/zenodo.5515856
Explore at:
Available download formats: zip, txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.5515856
Dataset updated
Sep 20, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Catarina Branco; Marina Kanellou; Antonio González-Martín; Miguel Arenas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
********* Observed data *********
The file ObsData.arp contains the sequences of the mtDNA hypervariable I region from 720 individuals belonging to 25 Southeast Asian populations. It is used as the input file to compute the summary statistics with Arlequin. For further details on the format and the available summary statistics, see the Arlequin manual.

********* Input files for simulations *********
For each evolutionary scenario (NONE, LGP, LDD and LGP&LDD) there is a folder (named after the scenario) containing the input files to perform 100 simulations. To run the simulations, open a command line in that folder and execute:
./ABCsampler abc_sensitivity.input
Input files for SPLATCHE3, Arlequin and ABCtoolbox are included (for further details on them, see the manuals of these programs).
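
Running the same command in every scenario folder can be scripted. The sketch below is a hedged illustration: the folder names in `SCENARIOS` are assumptions (the description only says the folders are named after the scenarios), and `run_scenarios` is a hypothetical helper. By default it only lists the commands; with `dry_run=False` it would call `./ABCsampler abc_sensitivity.input` inside each folder.

```python
import subprocess
from pathlib import Path

# Assumed folder names; adjust to match the unpacked archive.
SCENARIOS = ["NONE", "LGP", "LDD", "LGP_LDD"]

def run_scenarios(base_dir, scenarios=SCENARIOS, dry_run=True):
    """Build (and optionally run) the ABCsampler command for each scenario folder."""
    commands = []
    for name in scenarios:
        folder = Path(base_dir) / name
        cmd = ["./ABCsampler", "abc_sensitivity.input"]
        commands.append((folder, cmd))
        if not dry_run:
            # Run inside the scenario folder so relative paths in the input file resolve.
            subprocess.run(cmd, cwd=folder, check=True)
    return commands

for folder, cmd in run_scenarios("."):
    print(folder, " ".join(cmd))
```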

********* Selection of the best-fitting evolutionary scenario *********
The R script (ModelSelection.R) can be used to select the evolutionary scenario that best fits the observed data, using the multinomial logistic regression method and the neural network based method.
First, one will need the summary statistics obtained from the observed data (the file entitled ObsSS.txt). Then, one will need the output files of the simulations under each scenario, i.e., the genetic parameters used in each simulation and the computed summary statistics. Please note that the output of ABCtoolbox is a single file containing all this information, but we prefer to use one file with the summary statistics and another with the parameters. Here, we provide example files obtained from 100 simulations of each scenario:
- ssNONE.txt, the summary statistics computed from 100 simulations under the scenario NONE
- parNONE.txt, the genetic and demographic parameters per simulation under the scenario NONE
- ssLGP.txt, the summary statistics computed from 100 simulations under the scenario LGP
- parLGP.txt, the genetic and demographic parameters per simulation under the scenario LGP
- ssLDD.txt, the summary statistics computed from 100 simulations under the scenario LDD
- parLDD.txt, the genetic and demographic parameters per simulation under the scenario LDD
- ssLGP_LDD.txt, the summary statistics computed from 100 simulations under the scenario LGP&LDD
- parLGP_LDD.txt, the genetic and demographic parameters per simulation under the scenario LGP&LDD
To run the script the directory containing these files has to be specified in the script.

For details see Csilléry, et al. (2012): "Approximate Bayesian computation (ABC) in R: a Vignette."
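
The idea behind ABC model choice can be shown with a plain rejection step: keep the simulations whose summary statistics are closest to the observed ones and count how many come from each scenario. The sketch below is a simplified stand-in, not the multinomial logistic regression or neural network methods used by ModelSelection.R. The function name and the toy numbers are hypothetical.

```python
import math

def rejection_model_choice(observed, simulated, tolerance=0.1):
    """Approximate posterior scenario probabilities by rejection sampling.

    observed:  list of observed summary statistics
    simulated: dict mapping scenario name -> list of summary-statistic vectors
    tolerance: fraction of all simulations to accept
    """
    rows = [(name, stats) for name, sims in simulated.items() for stats in sims]
    # Scale each statistic by its standard deviation across all simulations.
    scales = []
    for j in range(len(observed)):
        col = [stats[j] for _, stats in rows]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scales.append(sd if sd > 0 else 1.0)
    # Scaled Euclidean distance of every simulation to the observed data.
    dists = []
    for name, stats in rows:
        d = math.sqrt(sum(((s - o) / sc) ** 2
                          for s, o, sc in zip(stats, observed, scales)))
        dists.append((d, name))
    dists.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(tolerance * len(dists)))
    kept = [name for _, name in dists[:n_keep]]
    return {name: kept.count(name) / n_keep for name in simulated}

# Toy data for illustration only.
sims = {
    "A": [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],
    "B": [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]],
}
print(rejection_model_choice([0.0, 0.0], sims, tolerance=0.5))
```

With real data, the rows of the ss*.txt files would play the role of the simulated vectors and ObsSS.txt the observed one; regression-based methods then refine this rejection estimate.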

********* Parameters estimation *********
The folder named ParametersEstimation contains all the input files to estimate the genetic and demographic parameters under the selected evolutionary scenario (LGP&LDD). Within the folder, one will find the summary statistics obtained under the selected scenario and the corresponding parameters (completeEstimator_LGP-LDD.txt), the summary statistics from observed data (obs11SS.txt) and all the remaining input files to run ABCestimator (for further detail on these files see the manual of ABCtoolbox).

Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839

[Dataset] Data for the course "Population Genomics" at Aarhus University

Explore at:

Available download formats: application/gzip, bin

Unique identifier

https://doi.org/10.5281/zenodo.7670839

Dataset updated

Jan 8, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Samuele Soraggi; Kasper Munch

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.

Data.tar.gz Contains the datasets and executable files for some of the software
You can unpack by simply doing
tar -zxf Data.tar.gz -C ./
This will create a folder called Data with the uncompressed material inside
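
The same unpacking step can be done from Python with the standard library, which may be convenient inside a notebook. This is a minimal sketch, assuming Data.tar.gz sits in the working directory; the helper name `unpack` is hypothetical. The `filter="data"` argument is only available in recent Python versions, so the code falls back to a plain extraction when it is not supported.

```python
import tarfile
from pathlib import Path

def unpack(archive, dest="."):
    """Extract a .tar.gz archive into dest, roughly like `tar -zxf archive -C dest`."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        try:
            # Safer extraction on Python versions that support extraction filters.
            tar.extractall(dest_path, filter="data")
        except TypeError:
            # Older Python versions do not accept the filter argument.
            tar.extractall(dest_path)
    return dest_path

# Only runs if the archive is actually present in the working directory.
if Path("Data.tar.gz").exists():
    unpack("Data.tar.gz", ".")
```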
Course_Env.packed.tar.gz Contains the conda environment used for the course. This needs to be unpacked to adjust all the prefixes (Note this environment is created on Ubuntu 22.10). You do this in the command line by
1. creating the folder Course_Env: mkdir Course_Env
2. untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env
3. Activate the environment: conda activate ./Course_Env
4. Run the unpacking script (it can take quite some time to get it done): conda-unpack
Course_Env.unpacked.tar.gz The same environment as above, but will work only if untarred into the folder /usr/Material - so use the version above if you are using it in another folder. This file is mostly to execute the course in our own cloud environment.
environment_with_args.yml The file needed to generate the conda environment. Create and activate the environment with the following commands:
1. conda env create -f environment_with_args.yml -p ./Course_Env
2. conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

At the end of the course, participants must be able to:

Identify an experimental platform relevant to a population genomic analysis.
Apply commonly used population genomic methods.
Explain the theory behind common population genomic methods.
Reflect on strengths and limitations of population genomic methods.
Interpret and analyze results of population genomic inference.
Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:
- Coop chapters 1, 2, 3, Paper: Genome Diversity Project
Drift and the coalescent:
- Coop chapter 4; Paper: Platypus
- Exercise: Read mapping and base calling
Recombination:
- Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation
- Exercise: Phasing and recombination rate
Population structure and incomplete lineage sorting:
- Lecture: Coop chapter 6, Review: Incomplete lineage sorting
- Exercise: Working with VCF files
Hidden Markov models:
- Lecture: Durbin chapter 3, Paper: population structure
- Exercise: Inference of population structure and admixture
Ancestral recombination graphs:
- Lecture: Paper: Approximating the ARG, Paper: Tree inference
- Exercise: ARG dashboard exercises + Inference of trees along sequence
Past population demography:
- Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference
- Exercise: Inferring historical populations
Direct and linked selection:
- Lecture: Coop chapters 12, 13, revisit Paper: Tree inference
Admixture:
- Lecture: Review: Admixture, Paper: Admixture inference
- Exercise: Detecting archaic ancestry in modern humans
Genome-wide association study (GWAS):
- Lecture: Coop lecture notes 99-120
- Exercise: GWAS quality control
Heritability:
- Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
- Exercise: Association testing
Evolution and disease:
- Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)
- Exercise: Estimating heritability
