7 datasets found

[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets, conda environments, and software for the course "Population Genomics" by Prof. Kasper Munch. The course material is maintained by the Health Data Science Sandbox. This page shows the latest version of the course material.

Data.tar.gz contains the datasets and executable files for some of the software. Unpack it with:

    tar -zxf Data.tar.gz -C ./

This creates a folder called Data with the uncompressed material inside.
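The unpacking step above can be wrapped in a small function that checks for the archive first. This is a minimal sketch, not part of the course material: `unpack_archive` is a hypothetical helper name, and the trailing usage block only runs if `Data.tar.gz` is present in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: unpack a gzipped tar archive into a destination directory.
# unpack_archive is a hypothetical helper, not shipped with the course.
unpack_archive() {
    local archive="$1" dest="${2:-.}"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest"
    # -z: gunzip, -x: extract, -f: archive file, -C: target directory
    tar -zxf "$archive" -C "$dest"
}

# Only runs when the course archive has been downloaded here.
if [ -f Data.tar.gz ]; then
    unpack_archive Data.tar.gz ./ && ls Data | head
fi
```

Listing the first few entries of `Data` afterwards is just a quick check that the folder was created.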

Course_Env.packed.tar.gz contains the conda environment used for the course. It must be unpacked so that all path prefixes are adjusted (note that this environment was created on Ubuntu 22.10). On the command line:

1. Create the folder Course_Env: mkdir Course_Env
2. Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env
3. Activate the environment: conda activate ./Course_Env
4. Run the unpacking script (this can take quite some time): conda-unpack
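The four steps can be collected into one function with a dry-run mode, so the commands can be inspected before anything is executed. This is a sketch under assumptions: `run`, `setup_packed_env` and the `DRY_RUN` variable are illustrative names, and `mkdir -p` is used instead of plain `mkdir` so that rerunning does not fail on an existing folder. In a non-interactive script, `conda activate` usually needs conda's shell hook loaded first (for bash, `eval "$(conda shell.bash hook)"`).

```shell
#!/usr/bin/env bash
# Sketch of the packed-environment setup as one function.
# run, setup_packed_env and DRY_RUN are hypothetical names.
run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_packed_env() {
    local archive="${1:-Course_Env.packed.tar.gz}" env_dir="${2:-./Course_Env}"
    run mkdir -p "$env_dir" || return 1                 # step 1: create the folder
    run tar -zxf "$archive" -C "$env_dir" || return 1   # step 2: untar the file
    run conda activate "$env_dir" || return 1           # step 3: activate the environment
    run conda-unpack                                    # step 4: fix the path prefixes
}
```

Calling `DRY_RUN=1 setup_packed_env` prints the four commands with their default arguments; calling `setup_packed_env` without `DRY_RUN` executes them.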

Course_Env.unpacked.tar.gz is the same environment as above, but it works only if untarred into the folder /usr/Material, so use the packed version above if you need the environment in another folder. This file is mainly for running the course in our own cloud environment.

environment_with_args.yml is the file needed to generate the conda environment. Create and activate the environment with the following commands:

1. conda env create -f environment_with_args.yml -p ./Course_Env
2. conda activate ./Course_Env
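The three environment files described above suit different situations, and a short helper can report which one applies. This is only a sketch: `choose_env_route` is a hypothetical name, and preferring the packed archive over the YAML file when both are present is an assumption made here, not something the course material states.

```shell
#!/usr/bin/env bash
# Sketch: report which environment route applies.
# choose_env_route is a hypothetical helper; the preference order
# (unpacked > packed > yml) is an assumption for illustration.
choose_env_route() {
    local src="${1:-.}" target="${2:-$PWD}"
    if [ "$target" = "/usr/Material" ] && [ -f "$src/Course_Env.unpacked.tar.gz" ]; then
        # The unpacked archive only works when untarred into /usr/Material.
        echo "unpacked"
    elif [ -f "$src/Course_Env.packed.tar.gz" ]; then
        # Works in any folder, but needs conda-unpack afterwards.
        echo "packed"
    elif [ -f "$src/environment_with_args.yml" ]; then
        # Builds the environment with conda env create.
        echo "yml"
    else
        echo "none"
        return 1
    fi
}
```

The first argument is the folder holding the downloaded files and the second is the folder where the environment will live, for example `choose_env_route . "$PWD"`.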

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

At the end of the course, participants must be able to:

- Identify an experimental platform relevant to a population genomic analysis.
- Apply commonly used population genomic methods.
- Explain the theory behind common population genomic methods.
- Reflect on strengths and limitations of population genomic methods.
- Interpret and analyze results of population genomic inference.
- Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background; topics include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:
- Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:
- Coop chapter 4; Paper: Platypus
- Exercise: Read mapping and base calling

Recombination:
- Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation
- Exercise: Phasing and recombination rate

Population structure and incomplete lineage sorting:
- Lecture: Coop chapter 6, Review: Incomplete lineage sorting
- Exercise: Working with VCF files

Hidden Markov models:
- Lecture: Durbin chapter 3, Paper: population structure
- Exercise: Inference of population structure and admixture

Ancestral recombination graphs:
- Lecture: Paper: Approximating the ARG, Paper: Tree inference
- Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:
- Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference
- Exercise: Inferring historical populations

Direct and linked selection:
- Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:
- Lecture: Review: Admixture, Paper: Admixture inference
- Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):
- Lecture: Coop lecture notes 99-120
- Exercise: GWAS quality control

Heritability:
- Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)
- Exercise: Association testing

Evolution and disease:
- Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)
- Exercise: Estimating heritability
Phylogenetic analyses of the insulin-like growth factor binding protein...
figshare.com
pdf
Updated May 30, 2023
Cite
Daniel Ocampo Daza; Christina A Bergqvist; Dan Larhammar (2023). Phylogenetic analyses of the insulin-like growth factor binding protein (IGFBP) family [Dataset]. http://doi.org/10.6084/m9.figshare.103144.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.103144.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Daniel Ocampo Daza; Christina A Bergqvist; Dan Larhammar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Phylogenetic re-analyses of Insulin-like Growth Factor Binding Proteins (IGFBPs) based on amino acid sequences. The sequences and alignment described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below) were used to analyze additional IGFBP sequences identified in the genome databases of Anolis carolinensis (anole lizard), Latimeria chalumnae (coelacanth) and Lepisosteus oculatus (spotted gar). Phylogenetic trees were made using neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods, both supported by bootstrap analyses (details below). Figures (PDF files) of the finished trees are included in the files IGFBP_NJ_figure.pdf and IGFBP_PhyML_figure.pdf. Branch colors are based on chromosomal locations and follow the trees published in Ocampo Daza et al. (2011) (link below).

Species abbreviations

Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Canis familiaris (Cfa, dog), Monodelphis domestica (Mdo, opossum), Gallus gallus (Gga, chicken), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, anole lizard), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, stickleback), Tetraodon nigroviridis (Tni, green-spotted pufferfish), Takifugu rubripes (Tru, fugu), Ciona intestinalis (Cin, vase tunicate), Ciona savignyi (Csa, Pacific transparent tunicate) and Branchiostoma floridae (Bfl, Florida lancelet).

Sequences used

Detailed information about all sequences that were used is included in the file Sequence_info_Tab1.xlsx (MS Excel spreadsheet). This includes database identifiers and chromosome/linkage group locations as well as notes on the manual curation/annotation of the sequences.

Alignment

The full amino acid sequence alignment used for the phylogenetic analyses is included in an interleaved format (.aln) and a sequential format (.fasta) in the files IGFBP_alignment_interleaved.aln and IGFBP_alignment_sequential.fasta. The alignment was made using the ClustalW algorithm and edited manually as described in Ocampo Daza et al. (2011) Endocrinology 152(6):2278-89 (link below). Anole lizard, coelacanth and spotted gar sequences marked with asterisks are fragments and do not span the full length of the alignment (details in the file Sequence_info_Tab1.xlsx).

Phylogenetic analysis, NJ method

The Neighbor Joining tree was made in ClustalX 2.0, with settings as described in Ocampo Daza et al. (2011) (link below). The tree is supported by a bootstrap analysis with 1000 bootstrap replicates. The raw output is included in the file IGFBP_NJ.txt and the final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_NJ_rooted.phb. Both files are in the Newick/Phylip data format.

Phylogenetic trees, PhyML method

The Phylogenetic Maximum Likelihood tree was made using the PhyML3.0 algorithm implemented through the web-based interface available at http://www.atgc-montpellier.fr/phyml/. The following settings were used:

- Amino acid subst. model: LG
- Proportion of invariable sites: estimated
- Number of subst. rate categs: 8
- Gamma distribution parameter: estimated
- 'Middle' of each rate class: mean
- Amino acid equilibrium frequencies: empirical
- Optimise tree topology: yes
- Tree topology search: NNIs
- Starting tree: BioNJ
- Add random input tree: no
- Optimise branch lengths: yes
- Optimise substitution model parameters: yes

The tree is supported by a bootstrap analysis with 100 bootstrap replicates. The final tree, rooted with the lancelet IGFBP sequence, is included in the file IGFBP_PhyML.phb (Newick/Phylip format). The raw output files of the PhyML analysis are included in the following files:

- igfbp_ml_121119_phy_stdout.txt
- igfbp_ml_121119_phy_phyml_tree.txt
- igfbp_ml_121119_phy_phyml_stats.txt
- igfbp_ml_121119_phy_phyml_boot_trees.txt
- igfbp_ml_121119_phy_phyml_boot_stats

File formats

All phylogenetic data is included in the Newick/Phylip format. For more information on the PhyML output files and data formats, see http://www.atgc-montpellier.fr/download/papers/phyml_manual_2009.pdf.
DataSheet_2_Prognostic Significance and Immunological Role of FBXO5 in Human...
frontiersin.figshare.com
pdf
Updated Jun 2, 2023
Cite
Peng Liu; Xiaojuan Wang; Lili Pan; Bing Han; Zhiying He (2023). DataSheet_2_Prognostic Significance and Immunological Role of FBXO5 in Human Cancers: A Systematic Pan-Cancer Analysis.pdf [Dataset]. http://doi.org/10.3389/fimmu.2022.901784.s002
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fimmu.2022.901784.s002
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Peng Liu; Xiaojuan Wang; Lili Pan; Bing Han; Zhiying He
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
F-box protein 5 (FBXO5), an essential subunit of the ubiquitin protein ligase complex, is increasingly recognized to exhibit important biological effects in regulating tumor occurrence and progression. The present research was intended to systematically investigate the latent roles of FBXO5 in prognosis and immunological function across cancers. Pan-cancer analyses of FBXO5 were performed based upon publicly available online databases, mainly including the Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), UCSC Xena, cBioPortal, and ImmuCellAI, revealing the possible relationships between FBXO5 and prognosis, DNA methylation, tumor microenvironment (TME), infiltration of immune cells, immune-related genes, immune checkpoints, tumor mutation burden (TMB), and microsatellite instability (MSI). The results suggested that FBXO5 was expressed at a high level in numerous tumor cell lines with significant upregulation in most cancers as opposed to normal tissues. Of note, elevated expression of FBXO5 was significantly related to an unfavorable prognosis in many cancer types. Furthermore, DNA methylation and TME were confirmed to display evident correlation with the expression of FBXO5 in several malignancies. Moreover, FBXO5 expression was remarkably positively correlated with the levels of infiltrating Treg cells and Tcm cells in most tumors, but negatively correlated with tumor-infiltrating CD8+ T cells, NK/NKT cells, and Th2 cells. Meanwhile, FBXO5 was demonstrated to be co-expressed with the genes encoding immune activating and suppressive factors, chemokines, chemokine receptors, and major histocompatibility complex (MHC). Immune checkpoints, TMB, and MSI were also overtly associated with FBXO5 dysregulation among diverse kinds of cancers. Additionally, the enrichment analyses showed close relationships between FBXO5 expression and the processes related to cell cycle and immune inflammatory response. 
These findings provided a detailed comprehension of the oncogenic function of FBXO5. Because of its crucial roles in cancer immunity and tumorigenesis, FBXO5 may serve as a novel prognostic indicator and immunotherapeutic target for various malignancies.
Phylogenetic analyses of the vertebrate oxytocin and vasopressin receptor...
figshare.com
xlsx
Updated May 31, 2023
Cite
Daniel Ocampo Daza; Dan Larhammar (2023). Phylogenetic analyses of the vertebrate oxytocin and vasopressin receptor gene family [Dataset]. http://doi.org/10.6084/m9.figshare.707336.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.707336.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Daniel Ocampo Daza; Dan Larhammar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sequence-based phylogenetic analyses of vertebrate oxytocin receptor (OTR) and vasopressin receptor (VPR) genes using amino acid sequences predicted primarily from the Ensembl (http://www.ensembl.org) and Pre Ensembl (http://pre.ensembl.org) genome browsers. These analyses are based on our previously published study identifying OTR and VPR sequences in vertebrate genomes, including previously unrecognised subtypes of V2 receptors: Ocampo Daza D., Lewicka M. and Larhammar D. (2012) The oxytocin/vasopressin receptor family has at least five members in the gnathostome lineage, including two distinct V2 subtypes, General and Comparative Endocrinology 175(1):135-143 (link below). These updated analyses include more species and suggest an update of VPR gene nomenclature.

Species and genome assembly information, database identifiers, location data and annotation notes for all identified sequences are included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. These tables also detail the updated vs. outdated nomenclature. All identified and curated amino acid sequences are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'.

Legends: Sequences marked * are not full-length; sequences marked # are not full-length and the prediction of the intracellular loop 3 (IL3) is not clear. The sequence marked § is a putative pseudogene. See details in 'Master_OTR_VPR_sequence_tables.xlsx'. Numbers in sequence names indicate the chromosome/linkage group where known.

File information 1:

Species included in these analyses, with abbreviations: human (Homo sapiens, Hsa), mouse (Mus musculus, Mmu), grey short-tailed opossum (Monodelphis domestica, Mdo), chicken (Gallus gallus, Gga), Carolina anole lizard (Anolis carolinensis, Aca), Western clawed frog (Xenopus tropicalis, Xtr), coelacanth (Latimeria chalumnae, Lch), spotted gar (Lepisosteus oculatus, Loc), zebrafish (Danio rerio, Dre), three-spined stickleback (Gasterosteus aculeatus, Gac), medaka (Oryzias latipes, Ola), Southern platyfish (Xiphophorus maculatus, Xma), Japanese pufferfish (Takifugu rubripes, Tru) and Elephant shark (Callorhinchus milii, Cmi).

Alignment file included in FASTA format: 'align_OTR_VPR_edited.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. This alignment has been curated and edited as described in the Methods sections and Supplementary Material 3 of Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) (link below), removing parts of the amino terminal, carboxy terminal and intracellular loop 3. The alignment was created using the MUSCLE algorithm applied through eBioX (http://www.ebioinformatics.org/ebiox/) using standard settings with 16 iterations. The alignment was edited manually in eBioX.

Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). All trees were made using the alignment described above. Corresponding figures for each phylogenetic tree are also included as PDF files. Red nodes and support values indicate values lower than 50%.

The neighbor joining (NJ) tree, 'NJ_tree_OPR_VPR.phb', was made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates.

Phylogenetic Maximum Likelihood (PhyML) trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) through the PhyML-aBayes application. One tree is supported by a non-parametric bootstrap analysis with 100 replicates, 'PhyML_tree_OTR_VPR_boot.phb', and one is supported by an SH-like approximate likelihood ratio test (aLRT), 'PhyML_tree_OTR_VPR_aLRT.phb'. Both PhyML trees were made with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads).

File information 2:

The alignment file '120922_align_Tni.fasta' includes OTR and VPR sequences identified in the spotted green pufferfish (Tetraodon nigroviridis, Tni) genome. The alignment file '120922_align_Psi_Cpi.fasta' includes OTR and VPR sequences identified in the Chinese softshell turtle (Pelodiscus sinensis, Psi) and painted turtle (Chrysemys picta bellii, Cpi) genomes. These alignments are based on the alignment used for the study described in Ocampo Daza D. et al. (2012) Gen. Comp. Endocrinol 175(1) and were made using the ClustalW algorithm in ClustalX 2.0 (http://www.clustal.org/clustal2/) with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20). For the spotted green pufferfish, only the automatic Ensembl predictions were used to verify all family members. For the two turtles, the identified sequences were curated manually in order to rectify erroneous automatic exon predictions and to predict exons or whole genes that had not been identified.

Genome assembly information, database identifiers, location data and annotation notes for these sequences are also included in the Excel workbook 'Master_OTR_VPR_sequence_tables.xlsx'. The unaligned sequence predictions are included in the FASTA file 'Master_OTR_VPR_sequences.fasta'. These sequences were tested in NJ trees made using standard settings in ClustalX 2.0 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. The file '120922_NJ_tree_Tni.phb' includes spotted green pufferfish and the file '121022_NJ_tree_Psi_Cpi.phb' includes the two turtle species. Both tree files are in Phylip/Newick format. Corresponding figures for each phylogenetic tree are also included as PDF files, with the spotted green pufferfish and turtle sequences marked in color.
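Because the FASTA files described above are plain text, a quick sanity check after download is to count the records and list their names. This is a minimal sketch; `count_fasta_records` and `list_fasta_ids` are hypothetical helper names, and the check relies only on the FASTA convention that each record starts with a header line beginning with `>`.

```shell
#!/usr/bin/env bash
# Sketch: basic inspection of a FASTA file with standard tools.
# Both function names are hypothetical helpers for illustration.
count_fasta_records() {
    # Each FASTA record starts with a header line beginning with '>'.
    grep -c '^>' "$1"
}

list_fasta_ids() {
    # Print each header without the leading '>', up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

# Only runs when the file from this dataset is present.
if [ -f Master_OTR_VPR_sequences.fasta ]; then
    count_fasta_records Master_OTR_VPR_sequences.fasta
fi
```

The same functions apply to the alignment files, since aligned FASTA uses the same header convention.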
Phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1...
figshare.com
xlsx
Updated Jan 9, 2017
Cite
David Lagman; Daniel Ocampo Daza; Görel Sundström; Dan Larhammar (2017). Phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades [Dataset]. http://doi.org/10.6084/m9.figshare.705157.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.705157.v1
Dataset updated
Jan 9, 2017
Dataset provided by
figshare
Authors
David Lagman; Daniel Ocampo Daza; Görel Sundström; Dan Larhammar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sequence-based phylogenetic analyses of the visual opsin genes of the LWS, SWS1, SWS2, RH1 and RH2 clades, with additional analyses including pinopsins, vertebrate ancient (V/A) opsins and Ciona intestinalis opsins. The phylogenetic analyses were made using amino acid sequences predicted from the Ensembl genome browser (http://www.ensembl.org) version 60 (Nov 2010) and the Lepisosteus oculatus (spotted gar) genome assembly LepOcu1 (http://www.ncbi.nlm.nih.gov/genome/assembly/327908/), as well as sequences identified in the NCBI RefSeq database. Database identifiers, location data, genome assembly, and annotation notes for all sequences are included in 'Supplementary Table OPN.xlsx' (Excel spreadsheet).

File information:

Alignment files are included in FASTA format: 'align_visual_opsins.fasta' and 'align_visual_opsins_VA_pinops.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. The second alignment file includes additional pinopsin, V/A opsin and Ciona intestinalis opsin sequences, as detailed in 'Supplementary Table OPN.xlsx'.

Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). The phylogenetic analyses were carried out based on the included alignments using both neighbor joining (NJ) and phylogenetic maximum likelihood (PhyML) methods. Phylogenetic trees are rooted with the human OPN3 amino acid sequence. Corresponding figures for all phylogenetic trees are also included as PDF files.

Sequence names/leaf names include species abbreviations (see below) as well as chromosome numbers where known. For the human and zebrafish sequences the full HGNC and ZFIN gene symbols are included. For other species the clade name is indicated in the sequence names/leaf names.

The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Latimeria chalumnae (Lch, coelacanth), Lepisosteus oculatus (Loc, spotted gar), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Geotria australis (Gau, pouched lamprey) and Ciona intestinalis (Cin, transparent sea squirt).

Method details:

Alignments were created using the ClustalW algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20. The alignments were edited manually in order to curate short, incomplete or highly divergent amino acid sequence predictions from the genome databases. In this way, erroneous automatic exon predictions and exons that had not been predicted could be rectified.

Phylogenetic analyses were carried out based on the included alignments. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The JTT model of amino acid substitution was chosen using ProtTest 3.0 (https://bitbucket.org/diegodl/prottest3/downloads). PhyML trees are supported by a non-parametric bootstrap analysis with 100 replicates applied through PhyML.
Protein Structural Domain Classification
cathdb.info
ec.i4cologne.com
+3 more
Updated Sep 30, 2024
Cite
(2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
Unique identifier
https://identifiers.org/MIR:00100005
Dataset updated
Sep 30, 2024
Description
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
DataSheet_1_Computational Design of gRNAs Targeting Genetic Variants Across...
figshare.com
pdf
Updated Jun 11, 2023
Cite
Cheng-Han Chung; Alexander G. Allen; Andrew Atkins; Robert W. Link; Michael R. Nonnemacher; Will Dampier; Brian Wigdahl (2023). DataSheet_1_Computational Design of gRNAs Targeting Genetic Variants Across HIV-1 Subtypes for CRISPR-Mediated Antiviral Therapy.pdf [Dataset]. http://doi.org/10.3389/fcimb.2021.593077.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fcimb.2021.593077.s001
Dataset updated
Jun 11, 2023
Dataset provided by
Frontiers
Authors
Cheng-Han Chung; Alexander G. Allen; Andrew Atkins; Robert W. Link; Michael R. Nonnemacher; Will Dampier; Brian Wigdahl
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clustered regularly interspaced short palindromic repeats (CRISPR)-based HIV-1 genome editing has shown promising outcomes in in vitro and in vivo viral infection models. However, existing HIV-1 sequence variants have been shown to reduce CRISPR-mediated efficiency and induce viral escape. Two metrics, global patient coverage and global subtype coverage, were used to identify guide RNA (gRNA) sequences that account for this viral diversity from the perspectives of cross-patient and cross-subtype gRNA design, respectively. Computational evaluation using these parameters and over 3.6 million possible 20-bp sequences resulted in nine lead gRNAs, two of which were previously published. This analysis revealed the benefit and necessity of considering all sequence variants for gRNA design. Of the other seven identified novel gRNAs, two were of note as they targeted interesting functional regions. One was a gRNA predicted to induce structural disruption in the nucleocapsid binding site (Ψ), which holds the potential to stop HIV-1 replication during the viral genome packaging process. The other was a reverse transcriptase (RT)-targeting gRNA that was predicted to cleave the subdomain responsible for dNTP incorporation. CRISPR-mediated sequence edits were predicted to occur on critical residues where HIV-1 has been shown to develop resistance against antiretroviral therapy (ART), which may provide additional evolutionary pressure at the DNA level. Given these observations, consideration of broad-spectrum gRNAs and cross-subtype diversity for gRNA design is not only required for the development of generalizable CRISPR-based HIV-1 therapy, but also helps identify optimal target sites.