Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets, conda environments and software for the course "Population Genomics" by Prof. Kasper Munch. This course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.
The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.
Description
After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.
The participants must at the end of the course be able to:
The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include the analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.
Curriculum
The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.
Course plan
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Phylogenetic re-analyses of Insulin-like Growth Factor Binding Proteins (IGFBPs) based on amino acid sequences. The sequences and alignment described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below) were used to analyze additional IGFBP sequences identified in the genome databases of Anolis carolinensis (anole lizard), Latimeria chalumnae (coelacanth) and Lepisosteus oculatus (spotted gar). Phylogenetic trees were made using neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods, both supported by bootstrap analyses (details below). Figures (PDF files) of the finished trees are included in the files IGFBP_NJ_figure.pdf and IGFBP_PhyML_figure.pdf. Branch colors are based on chromosomal locations and follow the trees published in Ocampo Daza et al. (2011) (link below).
Species abbreviations
Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Canis familiaris (Cfa, dog), Monodelphis domestica (Mdo, opossum), Gallus gallus (Gga, chicken), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, anole lizard), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, stickleback), Tetraodon nigroviridis (Tni, green-spotted pufferfish), Takifugu rubripes (Tru, fugu), Ciona intestinalis (Cin, vase tunicate), Ciona savignyi (Csa, Pacific transparent tunicate) and Branchiostoma floridae (Bfl, Florida lancelet).
Sequences used
Detailed information about all sequences that were used is included in the file Sequence_info_Tab1.xlsx (MS Excel spreadsheet). This includes database identifiers and chromosome/linkage group locations as well as notes on the manual curation/annotation of the sequences.
Alignment
The full amino acid sequence alignment used for the phylogenetic analyses is included in an interleaved format (.aln) and a sequential format (.fasta) in the files IGFBP_alignment_interleaved.aln and IGFBP_alignment_sequential.fasta. The alignment was made using the ClustalW algorithm and edited manually as described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below). Anole lizard, coelacanth and spotted gar sequences marked with asterisks are fragments and do not span the full length of the alignment (details in the file Sequence_info_Tab1.xlsx).
Phylogenetic analysis, NJ method
The Neighbor Joining tree was made in ClustalX 2.0, with settings as described in Ocampo Daza et al. (2011) (link below). The tree is supported by a bootstrap analysis with 1000 bootstrap replicates. The raw output is included in the file IGFBP_NJ.txt and the final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_NJ_rooted.phb. Both files are in the Newick/Phylip data format.
Phylogenetic trees, PhyML method
The Phylogenetic Maximum Likelihood tree was made using the PhyML3.0 algorithm implemented through the web-based interface available at http://www.atgc-montpellier.fr/phyml/. The following settings were used:
Amino acid subst. model: LG
Proportion of invariable sites: estimated
Number of subst. rate categs: 8
Gamma distribution parameter: estimated
'Middle' of each rate class: mean
Amino acid equilibrium frequencies: empirical
Optimise tree topology: yes
Tree topology search: NNIs
Starting tree: BioNJ
Add random input tree: no
Optimise branch lengths: yes
Optimise substitution model parameters: yes
The tree is supported by a bootstrap analysis with 100 bootstrap replicates. The final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_PhyML.phb (Newick/Phylip format). The raw output files of the PhyML analysis are included in the following files:
igfbp_ml_121119_phy_stdout.txt
igfbp_ml_121119_phy_phyml_tree.txt
igfbp_ml_121119_phy_phyml_stats.txt
igfbp_ml_121119_phy_phyml_boot_trees.txt
igfbp_ml_121119_phy_phyml_boot_stats
File formats
All phylogenetic data is included in the Newick/Phylip format. For more information on the PhyML output files and data formats, see http://www.atgc-montpellier.fr/download/papers/phyml_manual_2009.pdf.
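The sequential FASTA alignment described above can be read with a few lines of code. The following is a minimal sketch using only the Python standard library; the record names and residues in the example are hypothetical and are not taken from IGFBP_alignment_sequential.fasta. It parses FASTA text into a name-to-sequence mapping and checks that all aligned sequences have the same length, as expected for an alignment in which fragments are padded with gaps.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_length(records):
    """Return the common sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return lengths.pop()


# Hypothetical two-record alignment; '-' marks a gap.
example = """>Hsa_IGFBP1
MK-AL
LV
>Bfl_IGFBP
MKTAL
-V
"""

records = parse_fasta(example)
print(records)
print(alignment_length(records))
```

For the real file, the same functions could be applied to the text returned by opening IGFBP_alignment_sequential.fasta; a dedicated library such as Biopython may be preferable for anything beyond a quick check.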
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
F-box protein 5 (FBXO5), an essential subunit of the ubiquitin protein ligase complex, is increasingly recognized to exhibit important biological effects in regulating tumor occurrence and progression. The present research was intended to systematically investigate the latent roles of FBXO5 in prognosis and immunological function across cancers. Pan-cancer analyses of FBXO5 were performed based upon publicly available online databases, mainly including the Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), UCSC Xena, cBioPortal, and ImmuCellAI, revealing the possible relationships between FBXO5 and prognosis, DNA methylation, tumor microenvironment (TME), infiltration of immune cells, immune-related genes, immune checkpoints, tumor mutation burden (TMB), and microsatellite instability (MSI). The results suggested that FBXO5 was expressed at a high level in numerous tumor cell lines with significant upregulation in most cancers as opposed to normal tissues. Of note, elevated expression of FBXO5 was significantly related to an unfavorable prognosis in many cancer types. Furthermore, DNA methylation and TME were confirmed to display evident correlation with the expression of FBXO5 in several malignancies. Moreover, FBXO5 expression was remarkably positively correlated with the levels of infiltrating Treg cells and Tcm cells in most tumors, but negatively correlated with tumor-infiltrating CD8+ T cells, NK/NKT cells, and Th2 cells. Meanwhile, FBXO5 was demonstrated to be co-expressed with the genes encoding immune activating and suppressive factors, chemokines, chemokine receptors, and major histocompatibility complex (MHC). Immune checkpoints, TMB, and MSI were also overtly associated with FBXO5 dysregulation among diverse kinds of cancers. Additionally, the enrichment analyses showed close relationships between FBXO5 expression and the processes related to cell cycle and immune inflammatory response. 
These findings provided a detailed comprehension of the oncogenic function of FBXO5. Because of its crucial roles in cancer immunity and tumorigenesis, FBXO5 may serve as a novel prognostic indicator and immunotherapeutic target for various malignancies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sequence-based phylogenetic analyses of vertebrate oxytocin receptor (OTR) and vasopressin receptor (VPR) genes using amino acid sequences predicted primarily from the Ensembl (http://www.ensembl.org) and Pre Ensembl (http://pre.ensembl.org) genome browsers. These analyses are based on our previously published study identifying OTR and VPR sequences in vertebrate genomes, including previously unrecognised subtypes of V2 receptors: Ocampo Daza D., Lewicka M. and Larhammar D. (2012) The oxytocin/vasopressin receptor family has at least five members in the gnathostome lineage, including two distinct V2 subtypes, General and Comparative Endocrinology 175(1):135-143 (link below). These updated analyses include more species and suggest an update of VPR gene nomenclature.
Species and genome assembly information, database identifiers, location data and annotation notes for all identified sequences are included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. These tables also detail the updated vs. outdated nomenclature. All identified and curated amino acid sequences are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'.
Legends: Sequences marked * are not full-length; sequences marked # are not full-length and the prediction of the intracellular loop 3 (IL3) is not clear. The sequence marked § is a putative pseudogene. See details in 'Master_OTR_VPR_sequence_tables.xlsx'. Numbers in sequence names indicate the chromosome/linkage group where known.
File information 1:
Species included in these analyses, with abbreviations: human (Homo sapiens, Hsa), mouse (Mus musculus, Mmu), grey short-tailed opossum (Monodelphis domestica, Mdo), chicken (Gallus gallus, Gga), Carolina anole lizard (Anolis carolinensis, Aca), Western clawed frog (Xenopus tropicalis, Xtr), coelacanth (Latimeria chalumnae, Lch), spotted gar (Lepisosteus oculatus, Loc), zebrafish (Danio rerio, Dre), three-spined stickleback (Gasterosteus aculeatus, Gac), medaka (Oryzias latipes, Ola), Southern platyfish (Xiphophorus maculatus, Xma), Japanese pufferfish (Takifugu rubripes, Tru) and elephant shark (Callorhinchus milii, Cmi).
The alignment file is included in FASTA format: 'align_OTR_VPR_edited.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. This alignment has been curated and edited as described in the Methods sections and Supplementary Material 3 of Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) (link below), removing parts of the amino terminal, carboxy terminal and intracellular loop 3. The alignment was created using the MUSCLE algorithm applied through eBioX (http://www.ebioinformatics.org/ebiox/) using standard settings with 16 iterations. The alignment was edited manually in eBioX.
Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). All trees were made using the alignment described above. Corresponding figures for each phylogenetic tree are also included as PDF files. Red nodes and support values indicate values lower than 50%.
The neighbor joining (NJ) tree, 'NJ_tree_OPR_VPR.phb', was made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates.
Phylogenetic Maximum Likelihood (PhyML) trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) through the PhyML-aBayes application. One tree is supported by a non-parametric bootstrap analysis with 100 replicates, 'PhyML_tree_OTR_VPR_boot.phb', and one is supported by an SH-like approximate likelihood ratio test (aLRT), 'PhyML_tree_OTR_VPR_aLRT.phb'. Both PhyML trees were made with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads).
File information 2:
The alignment file '120922_align_Tni.fasta' includes OTR and VPR sequences identified in the spotted green pufferfish (Tetraodon nigroviridis, Tni) genome. The alignment file '120922_align_Psi_Cpi.fasta' includes OTR and VPR sequences identified in the Chinese softshell turtle (Pelodiscus sinensis, Psi) and painted turtle (Chrysemys picta bellii, Cpi) genomes. These alignments are based on the alignment used for the study described in Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) and were made using the ClustalW algorithm in ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20). For the spotted green pufferfish, only the automatic Ensembl predictions were used to verify all family members. For the two turtles, the identified sequences were curated manually in order to correct erroneous automatic exon predictions and to predict exons or whole genes that had not been identified.
Genome assembly information, database identifiers, location data and annotation notes for these sequences are also included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. The unaligned sequence predictions are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'. These sequences were tested in NJ trees made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. The file '120922_NJ_tree_Tni.phb' includes spotted green pufferfish and the file '121022_NJ_tree_Psi_Cpi.phb' includes the two turtle species. Both tree files are in Phylip/Newick format. Corresponding figures for each phylogenetic tree are also included as PDF files, with the spotted green pufferfish and turtle sequences marked in color.
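The '.phb' tree files can also be inspected programmatically, for example to list leaf names or to find nodes with support below the 50% threshold used for the red nodes in the figures. The following is a rough sketch using regular expressions from the Python standard library. It assumes that support values are stored as internal node labels directly after a closing parenthesis, which may not match how the files in this dataset encode them, and the Newick string and leaf names are hypothetical.

```python
import re


def leaf_names(newick):
    """Return leaf names: labels that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^():,;\[\]\s]+)", newick)


def low_support(newick, threshold=50.0):
    """Return internal node support values below the threshold.

    Assumption: support values are written as node labels right after ')'.
    """
    values = [float(v) for v in re.findall(r"\)\s*([0-9.]+)", newick)]
    return [v for v in values if v < threshold]


# Hypothetical tree with two supported clades (95 and 42) and one outgroup leaf.
tree = "((Hsa_A:0.1,Mmu_A:0.2)95:0.05,(Dre_A:0.3,Dre_B:0.4)42:0.1,Cmi_A:0.5);"

print(leaf_names(tree))
print(low_support(tree))
```

A full Newick parser, as provided by dedicated phylogenetics libraries, would be more robust for quoted labels, comments and other features of the format that this sketch ignores.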
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sequence-based phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades, with additional analyses including pinopsins, vertebrate ancient (V/A) opsins and Ciona intestinalis opsins. The phylogenetic analyses were made using amino acid sequences predicted from the Ensembl genome browser (http://www.ensembl.org) version 60 (Nov 2010) and the Lepisosteus oculatus (spotted gar) genome assembly LepOcu1 (http://www.ncbi.nlm.nih.gov/genome/assembly/327908/), as well as sequences identified in the NCBI RefSeq database. Database identifiers, location data, genome assembly, and annotation notes for all sequences are included in 'Supplementary Table OPN.xlsx' (Excel spreadsheet).
File information:
Alignment files are included in FASTA format: 'align_visual_opsins.fasta' and 'align_visual_opsins_VA_pinops.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. The second alignment file includes additional pinopsin, V/A opsin and Ciona intestinalis opsin sequences, as detailed in 'Supplementary Table OPN.xlsx'.
Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out based on the included alignments using both neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with the human OPN3 amino acid sequence. Corresponding figures for all phylogenetic trees are also included as PDF files.
Sequence names/leaf names include species abbreviations (see below) as well as chromosome numbers where known. For the human and zebrafish sequences the full HGNC and ZFIN gene symbols are included. For other species the clade name is indicated in the sequence names/leaf names.
The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Geotria australis (Gau, pouched lamprey) and Ciona intestinalis (Cin, transparent sea squirt).
Method details:
Alignments were created using the ClustalW algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20. The alignments were edited manually in order to curate short, incomplete or highly divergent amino acid sequence predictions from the genome databases. In this way, erroneous automatic exon predictions could be corrected and exons that had not been predicted could be added.
Phylogenetic analyses were carried out based on the included alignments. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads). PhyML trees are supported by a non-parametric bootstrap analysis with 100 replicates applied through PhyML.
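The non-parametric bootstrap mentioned above works by resampling alignment columns with replacement and rebuilding a tree from each resampled alignment. The following is a minimal sketch of the resampling step only, using the Python standard library. It is not the ClustalX or PhyML implementation, and the sequence names and residues are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that each sampled column stays intact.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns)
        for name in names
    }


# Hypothetical toy alignment of three sequences and five columns.
alignment = {"Hsa": "MKTAL", "Dre": "MRTAI", "Loc": "MKSAL"}

rng = random.Random(0)  # fixed seed so that the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

In a full analysis, a tree would be inferred from each replicate, and the support for a node would be the percentage of replicate trees that contain the same grouping.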
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustered regularly interspaced short palindromic repeats (CRISPR)-based HIV-1 genome editing has shown promising outcomes in in vitro and in vivo viral infection models. However, existing HIV-1 sequence variants have been shown to reduce CRISPR-mediated efficiency and induce viral escape. Two metrics, global patient coverage and global subtype coverage, were used to identify guide RNA (gRNA) sequences that account for this viral diversity from the perspectives of cross-patient and cross-subtype gRNA design, respectively. Computational evaluation using these parameters and over 3.6 million possible 20-bp sequences resulted in nine lead gRNAs, two of which were previously published. This analysis revealed the benefit and necessity of considering all sequence variants for gRNA design. Of the other seven identified novel gRNAs, two were of note as they targeted interesting functional regions. One was a gRNA predicted to induce structural disruption in the nucleocapsid binding site (Ψ), which holds the potential to stop HIV-1 replication during the viral genome packaging process. The other was a reverse transcriptase (RT)-targeting gRNA that was predicted to cleave the subdomain responsible for dNTP incorporation. CRISPR-mediated sequence edits were predicted to occur on critical residues where HIV-1 has been shown to develop resistance against antiretroviral therapy (ART), which may provide additional evolutionary pressure at the DNA level. Given these observations, consideration of broad-spectrum gRNAs and cross-subtype diversity for gRNA design is not only required for the development of generalizable CRISPR-based HIV-1 therapy, but also helps identify optimal target sites.
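The idea of scoring candidate target sequences by how many patients they cover can be illustrated with a short sketch. The exact definitions of global patient coverage and global subtype coverage are not given in this description, so the following assumes a simplified version: a candidate counts as covering a patient if it occurs as an exact substring in at least one of that patient's sequences. PAM requirements, mismatch tolerance and subtype grouping are ignored, and the patient labels and sequences are hypothetical toy data with a window of 4 bp instead of 20 bp.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def patient_coverage(patient_seqs, k=20):
    """Map each k-mer to the fraction of patients with an exact match.

    Assumption: a patient is covered if any one of their sequences
    contains the k-mer exactly.
    """
    covered = defaultdict(set)
    for patient, seqs in patient_seqs.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                covered[kmer].add(patient)
    n_patients = len(patient_seqs)
    return {kmer: len(p) / n_patients for kmer, p in covered.items()}


# Hypothetical toy data: three patients with one or two short sequences each.
toy = {
    "patient_1": ["ACGTAC", "ACGTTC"],
    "patient_2": ["TCGTAC"],
    "patient_3": ["ACGTAG"],
}

coverage = patient_coverage(toy, k=4)
best = max(coverage, key=coverage.get)
print(best, coverage[best])
```

An analogous score per subtype could be computed by grouping sequences by subtype label instead of by patient.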