7 datasets found
  1. [Dataset] Data for the course "Population Genomics" at Aarhus University

    • zenodo.org
    application/gzip, bin
    Updated Jan 8, 2025
    Cite
    Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
    Available download formats: application/gzip, bin
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Samuele Soraggi; Kasper Munch
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets, conda environments and software for the course "Population Genomics" taught by Prof Kasper Munch. The course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.

    1. Data.tar.gz contains the datasets and the executable files for some of the software.
      Unpack it with
      tar -zxf Data.tar.gz -C ./
      This creates a folder called Data containing the uncompressed material.
    2. Course_Env.packed.tar.gz contains the conda environment used for the course. It must be unpacked so that all the prefixes are adjusted (note that this environment was created on Ubuntu 22.10). On the command line:
      1. Create the folder Course_Env: mkdir Course_Env
      2. Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env
      3. Activate the environment: conda activate ./Course_Env
      4. Run the unpacking script (this can take quite some time): conda-unpack
    3. Course_Env.unpacked.tar.gz contains the same environment as above, but it works only if untarred into the folder /usr/Material, so use the packed version above if you are working in another folder. This file is mainly for running the course in our own cloud environment.
    4. environment_with_args.yml is the file needed to generate the conda environment. Create and activate the environment with the following commands:
      1. conda env create -f environment_with_args.yml -p ./Course_Env
      2. conda activate ./Course_Env
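
    The unpacking steps for Data.tar.gz and Course_Env.packed.tar.gz listed above can be collected into one script. This is a minimal sketch, not part of the course material. It assumes both archives are already in the current directory and that conda is installed. The function names unpack_data and unpack_env are made up for illustration, `mkdir -p` is used instead of plain `mkdir` so that a rerun does not fail, and the `conda shell.bash hook` line is an added assumption needed for `conda activate` to work inside a script.

```shell
#!/usr/bin/env bash
# Sketch only: unpack the course data and the packed conda environment.
# Assumes both archives sit in the current directory and conda is installed.
set -e

unpack_data() {
    # Extracts into the current directory, creating ./Data
    tar -zxf Data.tar.gz -C ./
}

unpack_env() {
    # -p: do not fail if Course_Env already exists (the listing uses plain mkdir)
    mkdir -p Course_Env
    tar -zxf Course_Env.packed.tar.gz -C Course_Env
    # Scripts need the conda shell hook before 'conda activate' works
    eval "$(conda shell.bash hook)"
    conda activate ./Course_Env
    # Adjusts the hard-coded prefixes; this can take a while
    conda-unpack
}

# Run each step only when its archive is present
if [ -f Data.tar.gz ]; then unpack_data; fi
if [ -f Course_Env.packed.tar.gz ]; then unpack_env; fi
```

    Each step is skipped when its archive is missing, so the script can be run before or after either download finishes.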

    The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

    Description

    After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

    By the end of the course, participants must be able to:

    • Identify an experimental platform relevant to a population genomic analysis.
    • Apply commonly used population genomic methods.
    • Explain the theory behind common population genomic methods.
    • Reflect on strengths and limitations of population genomic methods.
    • Interpret and analyze results of population genomic inference.
    • Formulate population genetics hypotheses based on data.

    The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include the analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets to association studies in relation to human health.

    Curriculum

    The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

    Course plan

    1. Course intro and overview:
    2. Drift and the coalescent:
    3. Recombination:
    4. Population structure and incomplete lineage sorting:
    5. Hidden Markov models:
    6. Ancestral recombination graphs:
    7. Past population demography:
    8. Direct and linked selection:
    9. Admixture:
    10. Genome-wide association study (GWAS):
    11. Heritability:
      • Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
      • Exercise: Association testing
    12. Evolution and disease:
      • Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)
      • Exercise: Estimating heritability
  2. Phylogenetic analyses of the insulin-like growth factor binding protein...

    • figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Daniel Ocampo Daza; Christina A Bergqvist; Dan Larhammar (2023). Phylogenetic analyses of the insulin-like growth factor binding protein (IGFBP) family [Dataset]. http://doi.org/10.6084/m9.figshare.103144.v1
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Daniel Ocampo Daza; Christina A Bergqvist; Dan Larhammar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Phylogenetic re-analyses of Insulin-like Growth Factor Binding Proteins (IGFBPs) based on amino acid sequences. The sequences and alignment described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below) were used to analyze additional IGFBP sequences identified in the genome databases of Anolis carolinensis (anole lizard), Latimeria chalumnae (coelacanth) and Lepisosteus oculatus (spotted gar). Phylogenetic trees were made using neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods, both supported by bootstrap analyses (details below). Figures (PDF-files) of the finished trees are included in the files IGFBP_NJ_figure.pdf and IGFBP_PhyML_figure.pdf. Branch colors are based on chromosomal locations and follow the trees published in Ocampo Daza et al. (2011) (link below).

    Species abbreviations

    Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Canis familiaris (Cfa, dog), Monodelphis domestica (Mdo, opossum), Gallus gallus (Gga, chicken), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, anole lizard), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, stickleback), Tetraodon nigroviridis (Tni, green-spotted pufferfish), Takifugu rubripes (Tru, fugu), Ciona intestinalis (Cin, vase tunicate), Ciona savignyi (Csa, Pacific transparent tunicate) and Branchiostoma floridae (Bfl, Florida lancelet).

    Sequences used

    Detailed information about all sequences that were used is included in the file Sequence_info_Tab1.xlsx (MS Excel spreadsheet). This includes database identifiers and chromosome/linkage group locations as well as notes on the manual curation/annotation of the sequences.

    Alignment

    The full amino acid sequence alignment used for the phylogenetic analyses is included in an interleaved format (.aln) and a sequential format (.fasta) in the files IGFBP_alignment_interleaved.aln and IGFBP_alignment_sequential.fasta. The alignment was made using the ClustalW algorithm and edited manually as described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below). Anole lizard, coelacanth and spotted gar sequences marked with asterisks are fragments and do not span the full length of the alignment (details in the file Sequence_info_Tab1.xlsx).

    Phylogenetic analysis, NJ method

    The Neighbor Joining tree was made in ClustalX 2.0, with settings as described in Ocampo Daza et al. (2011) (link below). The tree is supported by a bootstrap analysis with 1000 bootstrap replicates. The raw output is included in the file IGFBP_NJ.txt and the final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_NJ_rooted.phb. Both files are in the Newick/Phylip data format.

    Phylogenetic trees, PhyML method

    The Phylogenetic Maximum Likelihood tree was made using the PhyML3.0 algorithm implemented through the web-based interface available at http://www.atgc-montpellier.fr/phyml/. The following settings were used:

    • Amino acid subst. model: LG
    • Proportion of invariable sites: estimated
    • Number of subst. rate categs: 8
    • Gamma distribution parameter: estimated
    • 'Middle' of each rate class: mean
    • Amino acid equilibrium frequencies: empirical
    • Optimise tree topology: yes
    • Tree topology search: NNIs
    • Starting tree: BioNJ
    • Add random input tree: no
    • Optimise branch lengths: yes
    • Optimise substitution model parameters: yes

    The tree is supported by a bootstrap analysis with 100 bootstrap replicates. The final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_PhyML.phb (Newick/Phylip format). The raw output files of the PhyML analysis are included in the following files:

    • igfbp_ml_121119_phy_stdout.txt
    • igfbp_ml_121119_phy_phyml_tree.txt
    • igfbp_ml_121119_phy_phyml_stats.txt
    • igfbp_ml_121119_phy_phyml_boot_trees.txt
    • igfbp_ml_121119_phy_phyml_boot_stats

    File formats

    All phylogenetic data is included in the Newick/Phylip format. For more information on the PhyML output files and data formats, see http://www.atgc-montpellier.fr/download/papers/phyml_manual_2009.pdf.

  3. DataSheet_2_Prognostic Significance and Immunological Role of FBXO5 in Human...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Peng Liu; Xiaojuan Wang; Lili Pan; Bing Han; Zhiying He (2023). DataSheet_2_Prognostic Significance and Immunological Role of FBXO5 in Human Cancers: A Systematic Pan-Cancer Analysis.pdf [Dataset]. http://doi.org/10.3389/fimmu.2022.901784.s002
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Peng Liu; Xiaojuan Wang; Lili Pan; Bing Han; Zhiying He
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    F-box protein 5 (FBXO5), an essential subunit of the ubiquitin protein ligase complex, is increasingly recognized to exhibit important biological effects in regulating tumor occurrence and progression. The present research was intended to systematically investigate the latent roles of FBXO5 in prognosis and immunological function across cancers. Pan-cancer analyses of FBXO5 were performed based upon publicly available online databases, mainly including the Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), UCSC Xena, cBioPortal, and ImmuCellAI, revealing the possible relationships between FBXO5 and prognosis, DNA methylation, tumor microenvironment (TME), infiltration of immune cells, immune-related genes, immune checkpoints, tumor mutation burden (TMB), and microsatellite instability (MSI). The results suggested that FBXO5 was expressed at a high level in numerous tumor cell lines with significant upregulation in most cancers as opposed to normal tissues. Of note, elevated expression of FBXO5 was significantly related to an unfavorable prognosis in many cancer types. Furthermore, DNA methylation and TME were confirmed to display evident correlation with the expression of FBXO5 in several malignancies. Moreover, FBXO5 expression was remarkably positively correlated with the levels of infiltrating Treg cells and Tcm cells in most tumors, but negatively correlated with tumor-infiltrating CD8+ T cells, NK/NKT cells, and Th2 cells. Meanwhile, FBXO5 was demonstrated to be co-expressed with the genes encoding immune activating and suppressive factors, chemokines, chemokine receptors, and major histocompatibility complex (MHC). Immune checkpoints, TMB, and MSI were also overtly associated with FBXO5 dysregulation among diverse kinds of cancers. Additionally, the enrichment analyses showed close relationships between FBXO5 expression and the processes related to cell cycle and immune inflammatory response. 
    These findings provided a detailed comprehension of the oncogenic function of FBXO5. Because of its crucial roles in cancer immunity and tumorigenesis, FBXO5 may serve as a novel prognostic indicator and immunotherapeutic target for various malignancies.

  4. Phylogenetic analyses of the vertebrate oxytocin and vasopressin receptor...

    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Daniel Ocampo Daza; Dan Larhammar (2023). Phylogenetic analyses of the vertebrate oxytocin and vasopressin receptor gene family [Dataset]. http://doi.org/10.6084/m9.figshare.707336.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Daniel Ocampo Daza; Dan Larhammar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sequence-based phylogenetic analyses of vertebrate oxytocin receptor (OTR) and vasopressin receptor (VPR) genes using amino acid sequences predicted primarily from the Ensembl (http://www.ensembl.org) and Pre Ensembl (http://pre.ensembl.org) genome browsers. These analyses are based on our previously published study identifying OTR and VPR sequences in vertebrate genomes, including previously unrecognised subtypes of V2 receptors: Ocampo Daza D., Lewicka M. and Larhammar D. (2012) The oxytocin/vasopressin receptor family has at least five members in the gnathostome lineage, inclucing two distinct V2 subtypes, General and Comparative Endocrinology 175(1):135-143 (link below). These updated analyses include more species and suggest an update of VPR gene nomenclature.

    Species and genome assembly information, database identifiers, location data and annotation notes for all identified sequences are included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. These tables also detail the updated vs. outdated nomenclature. All identified and curated amino acid sequences are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'.

    Legends: Sequences marked * are not full-length; sequences marked # are not full-length and the prediction of the intracellular loop 3 (IL3) is not clear. The sequence marked § is a putative pseudogene. See details in 'Master_OTR_VPR_sequence_tables.xlsx'. Numbers in sequence names indicate the chromosome/linkage group where known.

    File information 1: Species included in these analyses, with abbreviations: human (Homo sapiens, Hsa), mouse (Mus musculus, Mmu), grey short-tailed opossum (Monodelphis domestica, Mdo), chicken (Gallus gallus, Gga), Carolina anole lizard (Anolis carolinensis, Aca), Western clawed frog (Xenopus tropicalis, Xtr), coelacanth (Latimeria chalumnae, Lch), spotted gar (Lepisosteus oculatus, Loc), zebrafish (Danio rerio, Dre), three-spined stickleback (Gasterosteus aculeatus, Gac), medaka (Oryzias latipes, Ola), Southern platyfish (Xiphophorus maculatus, Xma), Japanese pufferfish (Takifugu rubripes, Tru) and Elephant shark (Callorhinchus milii, Cmi).

    Alignment file included in FASTA-format: 'align_OTR_VPR_edited.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. This alignment has been curated and edited as described in the Methods sections and Supplementary Material 3 of Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) (link below), removing parts of the amino terminal, carboxy terminal and intracellular loop 3. The alignment was created using the MUSCLE algorithm applied through eBioX (http://www.ebioinformatics.org/ebiox/) using standard settings with 16 iterations. The alignment was edited manually in eBioX.

    Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). All trees were made using the alignment described above. Corresponding figures for each phylogenetic tree are also included as PDF-files. Red nodes and support values indicate values lower than 50%.

    The neighbor joining (NJ) tree, 'NJ_tree_OPR_VPR.phb', was made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates.

    Phylogenetic Maximum Likelihood (PhyML) trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) through the PhyML-aBayes application. One tree is supported by a non-parametric bootstrap analysis with 100 replicates, 'PhyML_tree_OTR_VPR_boot.phb', and one is supported by an SH-like approximate likelihood ratio test (aLRT), 'PhyML_tree_OTR_VPR_aLRT.phb'. Both PhyML trees were made with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads).

    File information 2: The alignment file '120922_align_Tni.fasta' includes OTR and VPR sequences identified in the spotted green pufferfish (Tetraodon nigroviridis, Tni) genome. The alignment file '120922_align_Psi_Cpi.fasta' includes OTR and VPR sequences identified in the Chinese softshell turtle (Pelodiscus sinensis, Psi) and painted turtle (Chrysemys picta bellii, Cpi) genomes. These alignments are based on the alignment used for the study described in Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) and were made using the ClustalW algorithm in ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20). For the spotted green pufferfish, only the automatic Ensembl predictions were used to verify all family members. For the two turtles, the identified sequences were curated manually in order to rectify erroneous automatic exon predictions and to predict exons or whole gene predictions that had not been identified.

    Genome assembly information, database identifiers, location data and annotation notes for these sequences are also included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. The un-aligned sequence predictions are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'. These sequences were tested in NJ trees made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. The file '120922_NJ_tree_Tni.phb' includes spotted green pufferfish and the file '121022_NJ_tree_Psi_Cpi.phb' includes the two turtle species. Both tree files are in Phylip/Newick format. Corresponding figures for each phylogenetic tree are also included as PDF-files, with the spotted green pufferfish and turtle sequences marked in color.

  5. Phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1...

    • figshare.com
    xlsx
    Updated Jan 9, 2017
    Cite
    David Lagman; Daniel Ocampo Daza; Görel Sundström; Dan Larhammar (2017). Phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades [Dataset]. http://doi.org/10.6084/m9.figshare.705157.v1
    Available download formats: xlsx
    Dataset updated
    Jan 9, 2017
    Dataset provided by
    figshare
    Authors
    David Lagman; Daniel Ocampo Daza; Görel Sundström; Dan Larhammar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sequence-based phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades, with additional analyses including pinopsins, vertebrate ancient (V/A) opsins and Ciona intestinalis opsins. The phylogenetic analyses were made using amino acid sequences predicted from the Ensembl genome browser (http://www.ensembl.org) version 60 (Nov 2010) and the Lepisosteus oculatus (spotted gar) genome assembly LepOcu1 (http://www.ncbi.nlm.nih.gov/genome/assembly/327908/), as well as sequences identified in the NCBI RefSeq database. Database identifiers, location data, genome assembly, and annotation notes for all sequences are included in 'Supplementary Table OPN.xlsx' (Excel spreadsheet).

    File information: Alignment files are included in FASTA-format: 'align_visual_opsins.fasta' and 'align_visual_opsins_VA_pinops.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. The second alignment file includes additional pinopsin, V/A opsin and Ciona intestinalis opsin sequences, as detailed in 'Supplementary Table OPN.xlsx'. Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out based on the included alignments using both neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with the human OPN3 amino acid sequence. Corresponding figures for all phylogenetic trees are also included as PDF files. Sequence names/leaf names include species abbreviations (see below) as well as chromosome numbers where known. For the human and zebrafish sequences the full HGNC and ZFIN gene symbols are included. For other species the clade name is indicated in the sequence names/leaf names.

    The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Geotria australis (Gau, pouched lamprey) and Ciona intestinalis (Cin, transparent sea squirt).

    Method details: Alignments were created using the ClustalW algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20. The alignments were edited manually in order to curate short, incomplete or highly divergent amino acid sequence predictions from the genome databases. In this way erroneous automatic exon predictions and exons that had not been predicted could be rectified. Phylogenetic analyses were carried out based on the included alignments. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads). PhyML trees are supported by a non-parametric bootstrap analysis with 100 replicates applied through PhyML.

  6. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    Updated Sep 30, 2024
    Cite
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Dataset updated
    Sep 30, 2024
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  7. DataSheet_1_Computational Design of gRNAs Targeting Genetic Variants Across...

    • figshare.com
    pdf
    Updated Jun 11, 2023
    Cite
    Cheng-Han Chung; Alexander G. Allen; Andrew Atkins; Robert W. Link; Michael R. Nonnemacher; Will Dampier; Brian Wigdahl (2023). DataSheet_1_Computational Design of gRNAs Targeting Genetic Variants Across HIV-1 Subtypes for CRISPR-Mediated Antiviral Therapy.pdf [Dataset]. http://doi.org/10.3389/fcimb.2021.593077.s001
    Available download formats: pdf
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Frontiers
    Authors
    Cheng-Han Chung; Alexander G. Allen; Andrew Atkins; Robert W. Link; Michael R. Nonnemacher; Will Dampier; Brian Wigdahl
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Clustered regularly interspaced short palindromic repeats (CRISPR)-based HIV-1 genome editing has shown promising outcomes in in vitro and in vivo viral infection models. However, existing HIV-1 sequence variants have been shown to reduce CRISPR-mediated efficiency and induce viral escape. Two metrics, global patient coverage and global subtype coverage, were used to identify guide RNA (gRNA) sequences that account for this viral diversity from the perspectives of cross-patient and cross-subtype gRNA design, respectively. Computational evaluation using these parameters and over 3.6 million possible 20-bp sequences resulted in nine lead gRNAs, two of which were previously published. This analysis revealed the benefit and necessity of considering all sequence variants for gRNA design. Of the other seven identified novel gRNAs, two were of note as they targeted interesting functional regions. One was a gRNA predicted to induce structural disruption in the nucleocapsid binding site (Ψ), which holds the potential to stop HIV-1 replication during the viral genome packaging process. The other was a reverse transcriptase (RT)-targeting gRNA that was predicted to cleave the subdomain responsible for dNTP incorporation. CRISPR-mediated sequence edits were predicted to occur on critical residues where HIV-1 has been shown to develop resistance against antiretroviral therapy (ART), which may provide additional evolutionary pressure at the DNA level. Given these observations, consideration of broad-spectrum gRNAs and cross-subtype diversity for gRNA design is not only required for the development of generalizable CRISPR-based HIV-1 therapy, but also helps identify optimal target sites.
