100+ datasets found
  1. [Dataset] Data for the course "Population Genomics" at Aarhus University

    • zenodo.org
    application/gzip, bin
    Updated Jan 8, 2025
    Cite
    Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
    Explore at:
    Available download formats: application/gzip, bin
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuele Soraggi; Kasper Munch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets, conda environments and software for the course "Population Genomics" taught by Prof. Kasper Munch. This course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.

    1. Data.tar.gz contains the datasets and the executable files for some of the software.
      Unpack it with
      tar -zxf Data.tar.gz -C ./
      This creates a folder called Data with the uncompressed material inside.
    2. Course_Env.packed.tar.gz contains the conda environment used for the course. It needs to be unpacked so that all the prefixes are adjusted (note that this environment was created on Ubuntu 22.10). On the command line:
      1. Create the folder Course_Env: mkdir Course_Env
      2. Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env
      3. Activate the environment: conda activate ./Course_Env
      4. Run the unpacking script (this can take quite some time): conda-unpack
    3. Course_Env.unpacked.tar.gz is the same environment as above, but it works only if untarred into the folder /usr/Material, so use the packed version above if you are working in another folder. This file is mainly for running the course in our own cloud environment.
    4. environment_with_args.yml is the file needed to generate the conda environment. Create and activate the environment with the following commands:
      1. conda env create -f environment_with_args.yml -p ./Course_Env
      2. conda activate ./Course_Env
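For readers who prefer to script the first unpacking step, the following is a minimal Python sketch of what `tar -zxf Data.tar.gz -C ./` does, using only the standard library. The function name `unpack_archive` is hypothetical and not part of the course material; the sketch does not replace the `conda-unpack` step needed for the packed environment.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive, dest="."):
    """Extract a gzip-compressed tar archive into dest; return the member names."""
    dest_path = Path(dest)
    # Create the target folder if it does not exist yet (like mkdir -p).
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Only extract archives you trust: members are written as stored.
        tar.extractall(path=dest_path)
    return names


# Example call, assuming Data.tar.gz has been downloaded to the working directory:
# unpack_archive("Data.tar.gz", "./")
```

The same helper can be pointed at Course_Env.packed.tar.gz with `dest="Course_Env"`, after which the environment still has to be activated and `conda-unpack` run as described above.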

    The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

    Description

    After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

    At the end of the course, participants must be able to:

    • Identify an experimental platform relevant to a population genomic analysis.
    • Apply commonly used population genomic methods.
    • Explain the theory behind common population genomic methods.
    • Reflect on strengths and limitations of population genomic methods.
    • Interpret and analyze results of population genomic inference.
    • Formulate population genetics hypotheses based on data.

    The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include the analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

    Curriculum

    The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

    Course plan

    1. Course intro and overview:
    2. Drift and the coalescent:
    3. Recombination:
    4. Population structure and incomplete lineage sorting:
    5. Hidden Markov models:
    6. Ancestral recombination graphs:
    7. Past population demography:
    8. Direct and linked selection:
    9. Admixture:
    10. Genome-wide association study (GWAS):
    11. Heritability:
      • Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
      • Exercise: Association testing
    12. Evolution and disease:
      • Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)
      • Exercise: Estimating heritability
  2. The 23andMe GWAS summary statistics for top 10,000 genetic markers...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 6, 2023
    Cite
    Haoyu Zhang (2023). The 23andMe GWAS summary statistics for top 10,000 genetic markers associated with three traits [Dataset]. http://doi.org/10.7910/DVN/3NBNCV
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Haoyu Zhang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes GWAS (Genome-Wide Association Studies) summary statistics for top 10,000 genetic markers associated with three traits across five diverse ancestries. These traits and ancestries form part of the study outlined in the manuscript: "A new method for multiancestry polygenic prediction improves performance across diverse populations". The research manuscript can be accessed via this link: https://www.biorxiv.org/content/10.1101/2022.03.24.485519v5.abstract. The three traits explored in this dataset include height, sing back musical note (the ability to replicate a musical note), and morning person. These traits were examined across five ancestral backgrounds: African American (AFR), Native American (AMR), European (EUR), East Asian (EAS), and South Asian (SAS).

  3. Population genetic data for MT and autosomal genes.

    • datasetcatalog.nlm.nih.gov
    Updated Aug 29, 2013
    Cite
    De Hoff, Peter L.; Ferris, Patrick; Miyagi, Ayano; Olson, Bradley J. S. C.; Umen, James G.; Geng, (2013). Population genetic data for MT and autosomal genes. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001630528
    Explore at:
    Dataset updated
    Aug 29, 2013
    Authors
    De Hoff, Peter L.; Ferris, Patrick; Miyagi, Ayano; Olson, Bradley J. S. C.; Umen, James G.; Geng,
    Description

    Notes: na, not applicable.
    1. Number of MT+ and MT− sequences analyzed for each gene.
    2. Polymorphism rate for silent sites (non-coding and synonymous) ×1000. Standard deviation in parentheses. Values are given for all sequences (total) and for the MT+ and MT− isolates separately. MT+ and MT− values that differ from the total value by >1 standard deviation are shown in bold.
    3. Population differentiation between MT+ and MT− isolates. Values near 0 correspond to no differentiation and values near 1 correspond to complete differentiation. Bold values correspond to those genes showing significant differentiation between MT+ and MT− isolates.

  4. Summary Statistics.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Jason M. Fletcher (2023). Summary Statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0050576.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jason M. Fletcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NHANES 1991–1994 Genetic Sample (N = 6,178). Notes: Author’s calculations from NHANES data. Sample weights used.

  5. Number of nominally significant genes before and after filtering.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder (2023). Number of nominally significant genes before and after filtering. [Dataset]. http://doi.org/10.1371/journal.pgen.1003443.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: Significance level is 0.01, not corrected for multiple testing. The analyses in the first two rows are for all genes that have at least one MAC in the Baylor and Broad datasets. The last rows are restricted to the genes that have more than 15 minor alleles after combining the Baylor and Broad datasets.

  6. Genomic control and for all tests before and after PC adjustment.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder (2023). Genomic control and for all tests before and after PC adjustment. [Dataset]. http://doi.org/10.1371/journal.pgen.1003443.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Li Liu; Aniko Sabo; Benjamin M. Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A. Bodea; Donna Muzny; Jeffrey G. Reid; Eric Banks; Hillary Coon; Mark DePristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E. Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D. Buxbaum; Edwin H. Cook Jr; Bernie Devlin; Gerard D. Schellenberg; James S. Sutcliffe; Mark J. Daly; Richard A. Gibbs; Kathryn Roeder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: These analyses are restricted to the genes that have more than 4 minor alleles in the samples used in each study. and are calculated based on the median and the 1st quantile of the p-value distribution, respectively. PC adjustment is based on the common variants (CVs) eigen-vectors.

  7. Population genetic diversity statistics for the invariant genes tested for...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 10, 2014
    Cite
    Garcia-Navarro, Elena; Burke, John M.; McAssey, Edward V.; Nambeesan, Savithri; Mandel, Jennifer R. (2014). Population genetic diversity statistics for the invariant genes tested for evidence of positive selection. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001190595
    Explore at:
    Dataset updated
    Jun 10, 2014
    Authors
    Garcia-Navarro, Elena; Burke, John M.; McAssey, Edward V.; Nambeesan, Savithri; Mandel, Jennifer R.
    Description

    Panel: W = wild, P = primitive, I = improved; L = alignment length in base pairs; l = number of synonymous sites; S = number of segregating synonymous sites; π = nucleotide diversity for synonymous sites; θ = Watterson's theta for synonymous sites; Sig. = ML-HKA significance: ns = not significant, P<0.001 = ***, P<0.01 = **, P<0.05 = *. Bold genes are those that showed significant evidence of selection. Note: we were unable to successfully sequence the IPT5 gene in P.
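As a reminder of how the θ column relates to S, here is a minimal Python sketch of the standard Watterson estimator, θ = S / a_n with a_n = Σ 1/i for i from 1 to n−1, where n is the number of sampled sequences. The function names and the option to divide by the number of synonymous sites l are illustrative assumptions; the source does not state how the dataset authors computed their values.

```python
from typing import Optional


def harmonic_term(num_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    if num_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, num_sequences))


def watterson_theta(num_segregating: int, num_sequences: int,
                    num_sites: Optional[float] = None) -> float:
    """Watterson's theta from S segregating sites among n sequences.

    If num_sites is given, the estimate is reported per site.
    """
    theta = num_segregating / harmonic_term(num_sequences)
    if num_sites:
        theta /= num_sites
    return theta


# Example: 10 segregating sites in 5 sequences over 100 synonymous sites.
print(watterson_theta(10, 5, 100))
```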

  8. Data from: Plant Expression Database

    • agdatacommons.nal.usda.gov
    bin
    Updated Feb 9, 2024
    Cite
    Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson (2024). Plant Expression Database [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Plant_Expression_Database/24661179
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    PLEXdb
    Authors
    Sudhansu S. Dash; John Van Hemert; Lu Hong; Roger P. Wise; Julie A. Dickerson
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    [NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions or checking any gene’s suitability as a steady-state control. Resources in this dataset:Resource Title: Website Pointer for Plant Expression Database, Iowa State University. 
    File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database. Project description for the Plant Expression Database (PLEXdb) and integrated tools.

  9. Data from: Replicated analysis of the genetic architecture of quantitative...

    • datadryad.org
    zip
    Updated Nov 4, 2015
    Cite
    Anna W. Santure; Jocelyn Poissant; Isabelle De Cauwer; Kees van Oers; Matthew R. Robinson; John L. Quinn; Martien A. M. Groenen; Marcel E. Visser; Ben C. Sheldon; Jon Slate (2015). Replicated analysis of the genetic architecture of quantitative traits in two wild great tit populations [Dataset]. http://doi.org/10.5061/dryad.5t32v
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 4, 2015
    Dataset provided by
    Dryad
    Authors
    Anna W. Santure; Jocelyn Poissant; Isabelle De Cauwer; Kees van Oers; Matthew R. Robinson; John L. Quinn; Martien A. M. Groenen; Marcel E. Visser; Ben C. Sheldon; Jon Slate
    Time period covered
    Oct 25, 2015
    Area covered
    5°50'E, United Kingdom, 1°20’W, 52°02’N, 5°51’E, 51°46’N, Westerheide, 52°01'N, Wytham Woods, De Hoge Veluwe National Park
    Description

    data_readme: A readme file explaining the information loaded to Dryad.

    NL map file: plink-style map file with marker locations for the NL dataset. The map file is in an amended plink format with chromosome in column 1, SNP name in column 2, cM position in column 3 (note this is different from default plink map files, which have distances in Morgans) and genome order in column 4 (again this is different from the plink format, which would usually have bp position in this column).

    Chromosome codings in the map file (column 1) are as follows:
    • Chromosomes 1-15 and 17-28 are the corresponding chromosomes 1-15 and 17-28 in the great tit genome
    • Chromosome 29 = chromosome 1A
    • Chromosome 30 = chromosome 4A
    • Chromosome 31 = Z chromosome
    • Chromosome 32 = Linkage group LGE22
    File: NL_numeric_ids.map

    NL genotype file: Genotypes for the 1,407 NL individuals. These are standard plink files (see http://pngu.mgh.harvard.edu/~purcell/plink/) where the first column gives a family id, the second column an individual id (in this ...
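The amended map format described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the columns are whitespace-separated, the names `Marker`, `chromosome_name` and `parse_map_line` are hypothetical, and code 16 is rejected only because the description does not list it.

```python
from typing import NamedTuple

# Numeric codes that stand for non-numeric great tit chromosomes (from the readme).
SPECIAL_CODES = {29: "1A", 30: "4A", 31: "Z", 32: "LGE22"}


class Marker(NamedTuple):
    chromosome: str      # great tit chromosome name
    snp: str             # SNP name (column 2)
    cm_position: float   # position in cM (column 3), not Morgans
    genome_order: int    # genome order (column 4), not bp position


def chromosome_name(code: int) -> str:
    """Translate the numeric code in column 1 to a chromosome name."""
    if code in SPECIAL_CODES:
        return SPECIAL_CODES[code]
    if 1 <= code <= 28 and code != 16:
        return str(code)
    raise ValueError(f"chromosome code {code} is not listed in the readme")


def parse_map_line(line: str) -> Marker:
    """Parse one line of the amended plink-style map file."""
    chrom, snp, cm, order = line.split()
    return Marker(chromosome_name(int(chrom)), snp, float(cm), int(order))


# Example with a made-up marker line:
print(parse_map_line("29 snp_a 12.5 3"))
```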

  10. Data from: The search for loci under selection: trends, biases and progress

    • datasetcatalog.nlm.nih.gov
    Updated Jun 10, 2022
    Cite
    Umbers, Kate D. L.; Rymer, Paul D.; Ahrens, Collin W.; Stow, Adam; Dillon, Shannon; Dudaniec, Rachael Y.; Bragg, Jason (2022). Data from: The search for loci under selection: trends, biases and progress [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000266863
    Explore at:
    Dataset updated
    Jun 10, 2022
    Authors
    Umbers, Kate D. L.; Rymer, Paul D.; Ahrens, Collin W.; Stow, Adam; Dillon, Shannon; Dudaniec, Rachael Y.; Bragg, Jason
    Description

    Detecting genetic variants under selection using FST outlier analysis (OA) and environmental association analyses (EAA) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analyzed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographic coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics.

    Usage Notes

    Table S1 - Full data set: Data was collected by reading papers associated with environmental association analyses. Data includes location, species, methods used, genetic parameters of data sets reviewed, and analytical parameters of the analyses. File: Table S1_data.xlsx

    R code for mixed-effects linear models: The R code used to create the figures and estimate regressions of the data set. File: Ahrens et al 2018_MolEcol_review.R

  11. Main model fits and substitution rate predictions for: A quantitative...

    • search.dataone.org
    Updated Jul 26, 2025
    Cite
    Vince Buffalo; Andrew Kern (2025). Main model fits and substitution rate predictions for: A quantitative genetic model of background selection in humans [Dataset]. http://doi.org/10.5061/dryad.qnk98sfnv
    Explore at:
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Vince Buffalo; Andrew Kern
    Time period covered
    Jan 1, 2023
    Description

    Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This can be thought of as a “linked selection signal” that reflects the impact of selection varying according to the placement of functional regions and recombination rates along the genome. Previous work has shown that negative selection against the steady influx of new deleterious mutations into conserved regions is the predominant mode of selection in humans. However, the theoretic model that underpins these results, classic Background Selection theory, is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of the linked selection, which models the effects of weak draft created according to how polygenic additive fitness variance is distributed along the genome. We use a recent model that jointly predicts the equilibrium fitness variance and su...

    These Python pickle files contain the model outputs from bgspy (http://github.com/vsbuffalo/bprime/) for the CADD 6%, CADD 8%, PhastCons Priority, and Feature Priority Models.

    Usage Notes

    All files are in standard Python file formats. To load the pickle files, install the accompanying bprime software available on GitHub.

    Note that all TSV files here were written by analyses in Jupyter notebooks that are available on the bprime GitHub page.

    Files

    Model Fits

    There are pickle files of model results, generated by bgspy collect.

    • cadd6_decode_altgrid.pkl: CADD 6%
    • cadd8_decode_altgrid.pkl: CADD 8%
    • CDS_genes_phastcons_decode_altgrid.pkl: Feature Priority
    • phastcons_CDS_genes_decode_altgrid.pkl: PhastCons Priority
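Loading one of the model-fit files might look like the sketch below. It uses only the standard `pickle` module; the helper name `load_model_fit` is hypothetical, and the structure of the returned object is not described in the source. Per the usage notes, the bprime software must be installed for the real files to unpickle, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_model_fit(path):
    """Unpickle a model-fit file and return whatever object it contains."""
    with Path(path).open("rb") as handle:
        # Assumption: the classes referenced by the pickle (from bprime/bgspy)
        # are importable in the current environment.
        return pickle.load(handle)


# Example call, assuming the file has been downloaded to the working directory:
# fit = load_model_fit("cadd6_decode_altgrid.pkl")
# print(type(fit))
```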

    Files Produced by Sims

    • empiricalB_chr10_expansion_false_h_0.5_results.npz: simulation B "empirical" B maps for fixed demography

    • empiricalB_chr10_expansion_1.004_9.3_h_0.5_results.npz: ...

  12. Genetic diversity statistics assayed per Portuguese P. nigra population and...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 11, 2019
    Cite
    Lima-Brito, josé; Dias, Alexandra; Fady, Bruno; Gaspar, Maria João; Bagnoli, Francesca; Spanu, Ilaria; Vendramin, Giovanni; Giovanelli, Guia; Carvalho, Ana; Lousada, José; silva, maria emilia (2019). Genetic diversity statistics assayed per Portuguese P. nigra population and SSR locus. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000163598
    Explore at:
    Dataset updated
    Dec 11, 2019
    Authors
    Lima-Brito, josé; Dias, Alexandra; Fady, Bruno; Gaspar, Maria João; Bagnoli, Francesca; Spanu, Ilaria; Vendramin, Giovanni; Giovanelli, Guia; Carvalho, Ana; Lousada, José; silva, maria emilia
    Description

    Genetic diversity statistics assayed per Portuguese P. nigra population and SSR locus. Notes: na - observed number of alleles; ne - effective number of alleles (Kimura and Crow 1964); I - Shannon’s Information Index (Lewontin 1972); h - Nei’s gene diversity index (Nei 1973); Ho - observed heterozygosity; He - expected heterozygosity (Levene 1949); F - fixation index; s.d. - standard deviation.
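To make the abbreviations concrete, the sketch below computes several of these statistics for a single locus from allele frequencies, using the textbook definitions: ne = 1/Σp², I = −Σ p ln p, h = 1 − Σp² and F = 1 − Ho/He. It is an illustration only. The function name is hypothetical, and it uses h in place of He, so it omits the small-sample correction of Levene (1949) that the dataset authors cite.

```python
import math


def locus_diversity(allele_freqs, observed_het):
    """Per-locus diversity statistics from allele frequencies and observed heterozygosity."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    present = [p for p in allele_freqs if p > 0]
    homozygosity = sum(p * p for p in present)
    gene_diversity = 1.0 - homozygosity  # Nei's h
    return {
        "na": len(present),                              # observed number of alleles
        "ne": 1.0 / homozygosity,                        # effective number of alleles
        "I": -sum(p * math.log(p) for p in present),     # Shannon's information index
        "h": gene_diversity,
        # Assumption: h stands in for He; undefined for a monomorphic locus.
        "F": (1.0 - observed_het / gene_diversity
              if gene_diversity > 0 else float("nan")),
    }


# Example: two equally frequent alleles, a quarter of individuals heterozygous.
print(locus_diversity([0.5, 0.5], observed_het=0.25))
```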

  13. The meta-analyzed GWAS summary statistics for 35 lab biomarkers described in...

    • nih.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Yosuke Tanigawa; Nasa Sinnott-Armstrong; Manuel Rivas (2023). The meta-analyzed GWAS summary statistics for 35 lab biomarkers described in 'Genetics of 35 blood and urine biomarkers in the UK Biobank' [Dataset]. http://doi.org/10.35092/yhjc.12355382.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yosuke Tanigawa; Nasa Sinnott-Armstrong; Manuel Rivas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains meta-analyzed GWAS summary statistics for 35 biomarker traits described in the following preprint:

    N. Sinnott-Armstrong*, Y. Tanigawa*, et al, Genetics of 38 blood and urine biomarkers in the UK Biobank. bioRxiv, 660506 (2019). doi:10.1101/660506

    Note that we are preparing a revised version of the manuscript and this dataset contains 35 (instead of 38) biomarker phenotypes.

    We provide the list of 35 biomarkers in "list_of_35_biomarkers.tsv". We used the "Phenotype_name" column in this table for the file names. For each phenotype, we provide two compressed tab-delimited files, named "[Phenotype_name].array.gz" and "[Phenotype_name].imp.gz", which contain the summary statistics for genetic variants on the genotyping array and the imputed dataset, respectively.

    We used METAL for the meta-analysis of 4 populations (White British, non-British White, African, and South Asian) within UK Biobank. The files have the following columns:

    • CHROM: the chromosome
    • POS: the position
    • MarkerName: the variant identifier
    • REF: the reference allele
    • ALT: the alternate allele
    • Effect: the effect size (BETA) estimate
    • StdErr: the standard error of effect size estimate
    • P-value: the p-value of the association
    • Direction: the direction of effect size
    • HetISq, HetChiSq, HetDf, HetPVal: heterogeneity statistics from METAL

    Note that we used the GRCh37/hg19 genome reference in the analysis and the BETA is always reported for the alternate allele.

    Please also check the METAL documentation (https://genome.sph.umich.edu/wiki/METAL_Documentation).

    The summary statistic files are compressed with bgzip and indexed with tabix (the .tbi files). It should be possible to read those files with the standard gzip/zcat.
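Because bgzip output is gzip-compatible, the summary statistic files can be streamed with Python's standard `gzip` and `csv` modules. The sketch below is a minimal example under assumptions: it expects the first line to be a header carrying the column names listed above (a leading `#` is stripped if present), and the function name and the 5e-8 default threshold are illustrative choices, not part of the dataset.

```python
import csv
import gzip


def iter_significant(path, threshold=5e-8):
    """Yield rows (as dicts) whose P-value column is below threshold."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None:
            return  # empty file
        # Assumption: the header may start with '#', as tabix-indexed files often do.
        reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
        for row in reader:
            try:
                p_value = float(row["P-value"])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric p-value
            if p_value < threshold:
                yield row


# Example call, with a placeholder file name following the documented pattern:
# for row in iter_significant("[Phenotype_name].array.gz"):
#     print(row["CHROM"], row["POS"], row["MarkerName"], row["Effect"])
```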

  14. Genome-wide association summary statistics for varicose veins of lower...

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Alexandra S. Shadrina; Sodbo Zh. Sharapov; Tatiana I. Shashkova; Yakov A. Tsepilov (2020). Genome-wide association summary statistics for varicose veins of lower extremities [Dataset]. http://doi.org/10.5281/zenodo.1323484
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexandra S. Shadrina; Sodbo Zh. Sharapov; Tatiana I. Shashkova; Yakov A. Tsepilov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/

    Description

    The dataset contains summary statistics for the discovery and the replication stages of the large-scale genome-wide association study for varicose veins of lower extremities. The discovery stage was based on genetic association data provided by the Neale Lab (http://www.nealelab.is/) for 337,199 UK Biobank individuals. The phenotype “varicose veins of lower extremities” was defined based on the International Classification of Disease (ICD-10) billing code “I83” present in the electronic patient record. Data were adjusted for two potential confounders: body mass index and deep venous thrombosis. A replication cohort (N=71,256) was generated by means of reverse meta-analysis of two overlapping datasets: genetic association data for 408,455 UK Biobank participants provided by the Gene ATLAS database (http://geneatlas.roslin.ed.ac.uk/), and the above-mentioned data provided by the Neale Lab.

    Please note that in Shadrina et al. (PLOS Genetics 2019) we used only the "discovery" dataset, while in the bioRxiv preprint (https://doi.org/10.1101/368365) both the discovery and replication datasets were used.

    The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilisation of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant.

    When using downloaded data, please cite the corresponding paper and this repository:

    1. Shadrina, A. S., Sharapov, S. Z., Shashkova, T. I. & Tsepilov, Y. A. Varicose veins of lower extremities: Insights from the first large-scale genetic study. PLOS Genet. 15, e1008110 (2019).

    2. Alexandra S. Shadrina, Sodbo Zh. Sharapov, Tatiana I. Shashkova, & Yakov A. Tsepilov. (2018). Genome-wide association summary statistics for varicose veins of lower extremities (Version 1) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.1323484

    Funding:

    The work of ASS was supported by the Russian Science Foundation [Project No 17-75-20223].
    The work of YAT was supported by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
    The work of SZS was supported by the Institute of Cytology and Genetics [Project No 0324-2018-0017].

    Column headers - discovery

    1. SNP: SNP rsID
    2. b: effect size of effect allele
    3. se: standard error of effect size
    4. chi2: T^2 value of effect allele
    5. Pval: P-value of association (without GC correction)
    6. N: sample size
    7. Chr: chromosome
    8. Pos: position (GRCh37 build)
    9. A1: effect allele (coded as "1")
    10. A2: reference allele (coded as "0")

    Column headers - replication

    1. SNP: SNP rsID
    2. A1: effect allele (coded as "1")
    3. A2: reference allele (coded as "0")
    4. N: Total sample size
    5. Z: Z-value of effect allele
    6. P: P-value of association (without GC correction)
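
    The statistic columns listed above relate to P-values in a standard way. The following is a minimal sketch, assuming that chi2 is a 1-degree-of-freedom statistic equal to (b / se)^2 and that Z is a standard-normal statistic; the source does not state how its P-values were computed, and the function names and example numbers are illustrative, not taken from the dataset.

```python
import math


def chi2_from_effect(b: float, se: float) -> float:
    """Wald statistic (b / se)^2; assumed here to correspond to the chi2 column."""
    return (b / se) ** 2


def p_from_chi2_1df(chi2: float) -> float:
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def p_from_z(z: float) -> float:
    """Two-sided P-value of a standard-normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect size and standard error, for illustration only.
b, se = 0.5, 0.1
chi2 = chi2_from_effect(b, se)
print(chi2, p_from_chi2_1df(chi2), p_from_z(b / se))
```

    Because Z^2 follows a chi-square distribution with 1 degree of freedom, the two P-value functions agree whenever chi2 equals Z squared.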
  15. Gget data

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    NG NM WT (2023). Gget data [Dataset]. https://www.kaggle.com/datasets/ngnmwt/gget-data
    Explore at:
    Available download formats: zip (452328 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    NG NM WT
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    pip install gget
    
    import gget
    
    gget.ref(species=None, list_species=True)[:10]
    
    
    ['acanthochromis_polyacanthus', 'accipiter_nisus', 'ailuropoda_melanoleuca', 'amazona_collaria', 'amphilophus_citrinellus', 'amphiprion_ocellaris', 'amphiprion_percula', 'anabas_testudineus', 'anas_platyrhynchos', 'anas_platyrhynchos_platyrhynchos']
    
    gget.ref(species='mus_musculus')
    
    {'mus_musculus': {'transcriptome_cdna': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/cdna/Mus_musculus.GRCm39.cdna.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '49M'}, 'genome_dna': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '18:37', 'bytes': '769M'}, 'annotation_gtf': {'ftp': 'http://ftp.ensembl.org/pub/release-108/gtf/mus_musculus/Mus_musculus.GRCm39.108.gtf.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:16', 'bytes': '31M'}, 'coding_seq_cds': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/cds/Mus_musculus.GRCm39.cds.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '16M'}, 'non-coding_seq_ncRNA': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/ncrna/Mus_musculus.GRCm39.ncrna.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:45', 'bytes': '7.6M'}, 'protein_translation_pep': {'ftp': 'http://ftp.ensembl.org/pub/release-108/fasta/mus_musculus/pep/Mus_musculus.GRCm39.pep.all.fa.gz', 'ensembl_release': 108, 'release_date': '2022-10-04', 'release_time': '19:32', 'bytes': '11M'}}}
    
    
    
    import urllib.request

    # get the FTP link for the ncRNA FASTA file and download it
    dl_links = gget.ref(species='mus_musculus', which=['ncrna'], ftp=True)
    urllib.request.urlretrieve(dl_links[0], './GRCm39_rna.fa.gz')
    
    
    ('./GRCm39_rna.fa.gz', 
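
    Once the download has finished, a quick sanity check is to count the sequences in the compressed FASTA file. This is a minimal sketch using only the Python standard library; the helper name count_fasta_records is hypothetical, and the path is the one used in the urlretrieve call above.

```python
import gzip
import os


def count_fasta_records(path: str) -> int:
    """Count sequences in a gzip-compressed FASTA file by counting '>' header lines."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Path taken from the download step; the check is skipped if the file is absent.
fasta_path = "./GRCm39_rna.fa.gz"
if os.path.exists(fasta_path):
    print(count_fasta_records(fasta_path))
```

    Reading the file in text mode ("rt") avoids decompressing it to disk first.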
    from pyGeno.Genome import *

    # load a genome
    ref = Genome(name = 'GRCh37.75')
    # load a gene
    gene = ref.get(Gene, name = 'TPST2')[0]
    # print the sequences of all the isoforms
    for prot in gene.get(Protein):
        print(prot.sequence)
    
    pers = Genome(name = 'GRCh37.75', SNPs = ["RNA_S1"], SNPFilter = myFilter())
    
    
  16. Data from: On the genetic architecture of rapidly adapting and convergent...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Mar 4, 2022
    Cite
    James Whiting; Josephine Paris; Paul Parsons; Sophie Matthews; Yuridia Reynoso; Kimberly Hughes; David Reznick; Bonnie Fraser (2022). On the genetic architecture of rapidly adapting and convergent life history traits in guppies [Dataset]. http://doi.org/10.5061/dryad.w3r2280sk
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Dryad
    Authors
    James Whiting; Josephine Paris; Paul Parsons; Sophie Matthews; Yuridia Reynoso; Kimberly Hughes; David Reznick; Bonnie Fraser
    Time period covered
    Feb 16, 2022
    Description

    Sequencing data was derived through RAD-sequencing of four F2 cross families (F0s and F2s sequenced). Phenotype data was derived by phenotyping lab-reared individuals according to the methods in Whiting et al. 2022. The linkage map was made using LepMap3.

  17. Data from: Male mouse recombination maps for each autosome identified by...

    • dataone.org
    Updated Jul 26, 2025
    Cite
    Lutz Froenicke; Lorinda Anderson; Johannes Wienberg; Terry Ashley (2025). Male mouse recombination maps for each autosome identified by chromosome painting [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwx5
    Explore at:
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Lutz Froenicke; Lorinda Anderson; Johannes Wienberg; Terry Ashley
    Time period covered
    Jan 1, 2023
    Description

    Linkage maps constructed from genetic analysis of gene order and crossover frequency provide few clues to the basis of the genomewide distribution of meiotic recombination, such as chromosome structure, that influences meiotic recombination. To bridge this gap, we have generated the first cytological recombination map that identifies individual autosomes in the male mouse. We prepared meiotic chromosome (synaptonemal complex [SC]) spreads from 110 mouse spermatocytes, identified each autosome by multicolor fluorescence in situ hybridization of chromosome-specific DNA libraries, and mapped 12,000 sites of recombination along individual autosomes, using immunolocalization of MLH1, a mismatch repair protein that marks crossover sites. We show that SC length is strongly correlated with crossover frequency and distribution. Although the length of most SCs corresponds to that predicted from their mitotic chromosome length rank, several SCs are longer or shorter than expected, with correspond...

    SC Spreads and Immunostaining: Three juvenile (20–21 d old) C57BL/6J mice (the same line analyzed by the Mouse Genome Sequencing Project) were used to prepare and immunolabel the SC spreads, as described elsewhere (Anderson et al. 1999). Complete sets of SCs in which the SCs were well separated but not obviously stretched or broken and that had ≥ 19 MLH1 foci were selected for analysis. Three fluorescent images (4',6-diamidino-2-phenylindole [DAPI], SCP3, and MLH1) were captured for each SC set.

    mFISH: After image acquisition of the immunofluorescence signals, the spermatocyte preparations were subjected to two or three rounds of denaturation and FISH. To identify each autosome, chromosome-specific painting probes (Rabbitts et al. 1995) were combinatorially labeled with fluorescein isothiocyanate (FITC)–2-deoxyuridine 5-triphosphate (dUTP), Cy5-dUTP (both from Amersham), or 6-carboxytetramethylrhodamine (TAMRA)-dUTP (Applied Biosystems) and were combined to form two different probe pools ...

    Male mouse recombination maps for each autosome identified by chromosome painting

    Description of the data and file structure

    The data are presented in an Excel spreadsheet with 22 sheets. Sheet 1 (karyotype-absolute positi calc) defines the average length of each mouse SC, after identification using chromosome-specific DNA probes. Sheet 2 (Notes) contains definitions of the headings used for Sheet 3 (raw data sorted by SC) and Sheets 4 through 22 ("SC1 abs" through "SC19 abs"). Sheet 2 also contains explanations for how the karyotype was derived, and two references in which this data was used for publication are also presented. Sheet 3 contains the positions of all MLH1 foci observed on all of the SCs with each MLH1 focus position expressed as a fraction of SC length from the centromere. Sheets 4 – 22 (labeled as "SC1 abs", "SC2 abs", "SC3 abs", "SC4 abs", "SC5 abs", "SC6 abs", "SC7 abs", "SC8 abs", "SC9 abs", "SC10 abs", "SC11 abs", "SC12 abs", "SC13 abs", "SC14 abs", "SC15 ...

  18. Data from: SweGen

    • swefreq-dev.nbis.se
    Updated Apr 15, 2019
    Cite
    (2019). SweGen [Dataset]. https://swefreq-dev.nbis.se/dataset/SweGen/browser/gene/ENSG00000186951
    Explore at:
    Dataset updated
    Apr 15, 2019
    Description

    This dataset contains whole-genome variant frequencies for 1000 Swedish individuals generated within the SweGen project. The frequency data is intended to be used as a resource for the research community and clinical genetics laboratories.

    Please note that the 1000 individuals included in the SweGen project represent a cross-section of the Swedish population and that no disease information has been used for the selection. The frequency data may therefore include genetic variants that are associated with, or causative of, disease.

    We request that any use of data from the SweGen project cite this article in the European Journal of Human Genetics.

    Individual positions in the genome can be viewed using the Beacon or Graphical Browser. To download the variant frequency file you need to register.

    A high confidence set of HLA allele frequencies is available for download under Dataset Access. For a detailed description of the SweGen HLA analysis, please see this bioRxiv preprint.

  19. Data from: Reference Gene Validation for RT-qPCR, a Note on Different...

    • datasetcatalog.nlm.nih.gov
    Updated Mar 31, 2015
    Cite
    Dern-Wieloch, Jutta; Brehm, Ralph; Weigel, Roswitha; Nettersheim, Daniel; Bergmann, Martin; Schumacher, Valérie; Vandekerckhove, Linos; Schorle, Hubert; De Spiegelaere, Ward; Kliesch, Sabine; Fink, Cornelia (2015). Reference Gene Validation for RT-qPCR, a Note on Different Available Software Packages [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001902746
    Explore at:
    Dataset updated
    Mar 31, 2015
    Authors
    Dern-Wieloch, Jutta; Brehm, Ralph; Weigel, Roswitha; Nettersheim, Daniel; Bergmann, Martin; Schumacher, Valérie; Vandekerckhove, Linos; Schorle, Hubert; De Spiegelaere, Ward; Kliesch, Sabine; Fink, Cornelia
    Description

    Background: An appropriate normalization strategy is crucial for data analysis from real time reverse transcription polymerase chain reactions (RT-qPCR). It is widely supported to identify and validate stable reference genes, since no single biological gene is stably expressed between cell types or within cells under different conditions. Different algorithms exist to validate optimal reference genes for normalization. Using human cells, we here compare the three main methods to the RefFinder tool available online, which integrates these algorithms, along with R-based software packages that include the NormFinder and GeNorm algorithms.

    Results: 14 candidate reference genes were assessed by RT-qPCR in two sample sets, i.e. a set of samples of human testicular tissue containing carcinoma in situ (CIS), and a set of samples from the human adult Sertoli cell line (FS1) either cultured alone or in co-culture with the seminoma-like cell line (TCam-2) or with equine bone marrow derived mesenchymal stem cells (eBM-MSC). Expression stabilities of the reference genes were evaluated using geNorm, NormFinder, and BestKeeper. Similar results were obtained by the three approaches for the most and least stably expressed genes. The R-based packages NormqPCR, SLqPCR and the NormFinder for R script gave identical gene rankings. Interestingly, different outputs were obtained between the original software packages and the RefFinder tool, which is based on raw Cq values for input. When the raw data were reanalysed assuming 100% efficiency for all genes, then the outputs of the original software packages were similar to the RefFinder software, indicating that RefFinder outputs may be biased because PCR efficiencies are not taken into account.

    Conclusions: This report shows that assay efficiency is an important parameter for reference gene validation. New software tools that incorporate these algorithms should be carefully validated prior to use.
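
    To illustrate why PCR efficiency matters for the input to these algorithms, here is a minimal sketch of the commonly used conversion of raw Cq values into relative quantities, Q = E^(min Cq - Cq), where E = 2 corresponds to 100% efficiency. The function name, Cq values and efficiencies are made up for illustration and are not taken from the dataset or from any of the packages named above.

```python
def relative_quantities(cq_values, efficiency=2.0):
    """Convert raw Cq values to relative quantities Q = E ** (min(Cq) - Cq).

    efficiency is the per-cycle amplification factor E (2.0 means 100%).
    """
    reference = min(cq_values)
    return [efficiency ** (reference - cq) for cq in cq_values]


# Hypothetical Cq values for one gene across three samples.
cq = [20.0, 21.0, 23.0]
print(relative_quantities(cq))        # assuming 100% efficiency (E = 2.0)
print(relative_quantities(cq, 1.8))   # assuming 80% efficiency (E = 1.8)
```

    The same Cq differences give different relative quantities under the two efficiencies, which is the effect the abstract attributes to tools that work on raw Cq values.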

  20. Data from: Consequences of the Last Glacial Period on the Genetic Diversity...

    • zenodo.org
    bin, txt, zip
    Updated Sep 20, 2021
    Cite
    Catarina Branco; Marina Kanellou; Antonio González-Martín; Miguel Arenas (2021). Consequences of the Last Glacial Period on the Genetic Diversity of Southeast Asians [Dataset]. http://doi.org/10.5281/zenodo.5515856
    Explore at:
    Available download formats: zip, txt, bin
    Dataset updated
    Sep 20, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Catarina Branco; Marina Kanellou; Antonio González-Martín; Miguel Arenas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ********* Observed data *********
    The file ObsData.arp contains the sequences of the mtDNA hypervariable I region from 720 individuals belonging to 25 Southeast Asian populations used as input file to compute the summary statistics with Arlequin. For further details on the format and available Summary statistics see the manual of Arlequin.

    ********* Input files for simulations *********
    For each evolutionary scenario (NONE, LGP, LDD and LGP&LDD) find a folder (named after the scenario) containing the input files to perform 100 simulations. To run the simulations one should access the command line and execute:
    ./ABCsampler abc_sensitivity.input
    Input files for SPLATCHE3, Arlequin and ABCtoolbox are included (for further details on them see the manual of these software).

    ********* Selection of the best-fitting evolutionary scenario *********
    The R script (ModelSelection.R) can be used to select the evolutionary scenario that best fits the observed data, using the multinomial logistic regression method and the neural-network-based method.
    Firstly, one will need the summary statistics obtained from observed data (the file entitled ObsSS.txt). Then, one will need the files containing the output files of the simulations under each scenario, i.e., the genetic parameters used under each simulation and the computed summary statistics. Please, note that the output of the ABCtoolbox is a single file containing all this information, but we prefer to use a file with the summary statistics and another with the parameters. Here, we provide example files obtained from 100 simulations of each scenario:
    - ssNONE.txt, the summary statistics computed from 100 simulations under the scenario NONE
    - parNONE.txt, the genetic and demographic parameters per simulation under the scenario NONE
    - ssLGP.txt, the summary statistics computed from 100 simulations under the scenario LGP
    - parLGP.txt, the genetic and demographic parameters per simulation under the scenario LGP
    - ssLDD.txt, the summary statistics computed from 100 simulations under the scenario LDD
    - parLDD.txt, the genetic and demographic parameters per simulation under the scenario LDD
    - ssLGP_LDD.txt, the summary statistics computed from 100 simulations under the scenario LGP&LDD
    - parLGP_LDD.txt, the genetic and demographic parameters per simulation under the scenario LGP&LDD
    To run the script the directory containing these files has to be specified in the script.
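
    Before pointing ModelSelection.R at a directory, it can help to confirm that all the files listed above are present. The following is a minimal sketch; the helper names are hypothetical, and it assumes ObsSS.txt sits in the same directory as the per-scenario files, which the description does not state explicitly.

```python
from pathlib import Path

# Scenario suffixes as they appear in the example file names above.
SCENARIOS = ["NONE", "LGP", "LDD", "LGP_LDD"]


def expected_files():
    """Return the observed-data file plus the ss/par file pair of every scenario."""
    files = ["ObsSS.txt"]  # assumed to live alongside the simulation outputs
    for scenario in SCENARIOS:
        files += [f"ss{scenario}.txt", f"par{scenario}.txt"]
    return files


def missing_files(directory):
    """List the expected files that are not present in the given directory."""
    base = Path(directory)
    return [name for name in expected_files() if not (base / name).is_file()]


# Report anything missing from the current working directory.
print(missing_files("."))
```

    An empty list means every input named in the description was found.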

    For details see Csilléry, et al. (2012): "Approximate Bayesian computation (ABC) in R: a Vignette."

    ********* Parameters estimation *********
    The folder named ParametersEstimation contains all the input files to estimate the genetic and demographic parameters under the selected evolutionary scenario (LGP&LDD). Within the folder, one will find the summary statistics obtained under the selected scenario and the corresponding parameters (completeEstimator_LGP-LDD.txt), the summary statistics from observed data (obs11SS.txt) and all the remaining input files to run ABCestimator (for further details on these files see the manual of ABCtoolbox).

Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839

[Dataset] Data for the course "Population Genomics" at Aarhus University

Explore at:
Available download formats: application/gzip, bin
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.

  1. Data.tar.gz Contains the datasets and executable files for some of the software
    You can unpack by simply doing
    tar -zxf Data.tar.gz -C ./
    This will create a folder called Data with the uncompressed material inside
  2. Course_Env.packed.tar.gz Contains the conda environment used for the course. This needs to be unpacked to adjust all the prefixes (Note this environment is created on Ubuntu 22.10). You do this in the command line by
    1. creating the folder Course_Env: mkdir Course_Env
    2. untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env
    3. Activate the environment: conda activate ./Course_Env
    4. Run the unpacking script (it can take quite some time to get it done): conda-unpack
  3. Course_Env.unpacked.tar.gz The same environment as above, but will work only if untarred into the folder /usr/Material - so use the version above if you are using it in another folder. This file is mostly to execute the course in our own cloud environment.
  4. environment_with_args.yml The file needed to generate the conda environment. Create and activate the environment with the following commands:
    1. conda env create -f environment_with_args.yml -p ./Course_Env
    2. conda activate ./Course_Env
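
The tar commands in the list above can also be run from Python, for example inside a notebook. This is a minimal sketch using the standard-library tarfile module; the helper name unpack_archive is hypothetical. It only covers the extraction step, so for Course_Env.packed.tar.gz the conda activate and conda-unpack steps still have to be run afterwards.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive, destination="."):
    """Extract a gzip-compressed tar archive, similar to `tar -zxf archive -C destination`.

    Unlike `tar -C`, the destination folder is created if it does not exist.
    """
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


# File and folder names from the list above; each step is skipped if the archive is absent.
if Path("Data.tar.gz").is_file():
    unpack_archive("Data.tar.gz", ".")
if Path("Course_Env.packed.tar.gz").is_file():
    unpack_archive("Course_Env.packed.tar.gz", "Course_Env")
```

Only extract archives from a source you trust, since tar members can carry arbitrary paths.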

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

At the end of the course, participants must be able to:

  • Identify an experimental platform relevant to a population genomic analysis.
  • Apply commonly used population genomic methods.
  • Explain the theory behind common population genomic methods.
  • Reflect on strengths and limitations of population genomic methods.
  • Interpret and analyze results of population genomic inference.
  • Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics, from generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Here topics include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

  1. Course intro and overview:
  2. Drift and the coalescent:
  3. Recombination:
  4. Population structure and incomplete lineage sorting:
  5. Hidden Markov models:
  6. Ancestral recombination graphs:
  7. Past population demography:
  8. Direct and linked selection:
  9. Admixture:
  10. Genome-wide association study (GWAS):
  11. Heritability:
    • Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
    • Exercise: Association testing
  12. Evolution and disease:
    • Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)
    • Exercise: Estimating heritability