PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
The relationship between protein structure and function is a foundational concept in undergraduate biochemistry. We find this theme is best presented with assignments that encourage exploration and analysis. Here, we share a series of four assignments that use open-source, online molecular visualization and bioinformatics tools to examine the interaction between the SARS-CoV-2 spike protein and the ACE2 receptor. The interaction between these two proteins initiates SARS-CoV-2 infection of human host cells and is the cause of COVID-19. In assignment I, students identify sequences with homology to the SARS-CoV-2 spike protein and use them to build a primary sequence alignment. Students make connections to a linked primary research article as an example of how scientists use molecular and phylogenetic analysis to explore the origins of a novel virus. Assignments II through IV teach students to use an online molecular visualization tool for analysis of secondary, tertiary, and quaternary structure. Emphasis is placed on identification of noncovalent interactions that stabilize the SARS-CoV-2 spike protein and mediate its interaction with ACE2. We assigned this project to upper-level undergraduate biochemistry students at a public university and liberal arts college. Students in our courses completed the project as individual homework assignments. However, we can easily envision implementation of this project during multiple in-class sessions or in a biochemistry laboratory using in-person or remote learning. We share this project as a resource for instructors who aim to teach protein structure and function using inquiry-based molecular visualization activities.
Primary image: Exploration of SARS-CoV-2 spike protein: student generated data from assignments I - IV. Includes examples of figures submitted by students, including a sequence alignment and representations of 3D protein structure generated using UCSF Chimera. The primary image includes student generated data and a cartoon from Pixabay, an online repository of copyright free art.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for 2,631 draft genomes (TOBG-GENOMES.tar.gz) generated using the Tara Oceans microbial metagenomic data sets. Additionally includes:
SECONDARY_CONTIGS.province.tar.gz -- SECONDARY contigs by province
TOBG-READCOUNT.tar.gz -- read count values of each sample against SECONDARY contigs
TOBG-BINS.tar.gz -- genome bins 5 contigs)
PRIMARY_CONTIGS.province.tar.gz -- PRIMARY contigs by province and sample
Larger files have been split. To restore the full tarball, use cat to combine the parts, e.g.:
cat PRIMARY_CONTIGS.MEDITERRANEAN0* > PRIMARY_CONTIGS.MEDITERRANEAN.tar.gz
And then decompress.
Indian Monsoon = 1 file
Arabian Sea = 1 file
Red Sea = 1 file
Mediterranean = 2 files
East Africa Coastal = 2 files
Chile-Peru Coastal = 2 files
North Pacific = 2 files
North Atlantic = 4 files
South Atlantic = 4 files
South Pacific = 7 files
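For users who prefer to script the restore step, the following is a minimal Python sketch of the same idea as the cat command above: concatenate the split parts in sorted order, then unpack the resulting archive. The helper names join_parts and extract are hypothetical, and the sketch assumes the parts sort correctly by file name, as they do for a numeric suffix such as MEDITERRANEAN01, MEDITERRANEAN02.

```python
import glob
import shutil
import tarfile


def join_parts(pattern: str, out_path: str) -> list[str]:
    """Concatenate every file matching `pattern`, in sorted order, into `out_path`.

    Hypothetical helper mirroring `cat PARTS* > out`; returns the parts used.
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream, so large parts are not loaded into memory
    return parts


def extract(tar_path: str, dest: str = ".") -> None:
    """Unpack a gzip-compressed tarball into `dest`."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)
```

A call such as join_parts("PRIMARY_CONTIGS.MEDITERRANEAN0*", "PRIMARY_CONTIGS.MEDITERRANEAN.tar.gz") followed by extract("PRIMARY_CONTIGS.MEDITERRANEAN.tar.gz") would correspond to the two steps described above. The output name must not itself match the glob pattern.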
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected water samples during the winter and rainy seasons from Rabindra Sarovar lake, located in south Kolkata, and performed metagenomic analysis following a genome-resolved metagenomics approach. In this method, samples are assembled after sequencing, and the assembled samples are then binned using a binning tool to recover genome bins (n=27). In this repository we have uploaded the recovered genome bins (.fna) and their respective 16S (.fa) and protein sequences (.faa) with annotation. A Newick tree (.nwk) that infers the phylogenetic position of the 27 bins is also included; this tree can be edited at any time using iTOL v4. In addition, we have uploaded the assembled data for both seasons (.fasta).
https://spdx.org/licenses/CC0-1.0.html
Many animals utilise self-built structures (extended phenotypes) to enhance body functions, such as thermoregulation, prey capture or defence. Yet, it is unclear whether the evolution of animal constructions supplements or substitutes body functions – with disparate feedbacks on trait evolution. Here, using brown spiders (Araneae: marronoid clade), we explored if the evolutionary loss and gain of silken webs as extended prey capture devices correlates with alterations in traits known to play an important role in predatory strikes – locomotor performance (sprint speed) and leg spination (expression of capture spines on front legs). We found that in this group high locomotor performance, with running speeds of over 100 body lengths per second, evolved repeatedly – both in web building and cursorial spiders. There was no correlation with running speed, and leg spination was only poorly correlated, relative to the use of extended phenotypes, indicating that web use does not reduce selective pressures on body functions involved in prey capture and defence per se. Consequently, extended prey capture devices serve as supplements rather than substitutions to body traits and may only be beneficial in conjunction with certain life history traits, explaining the rare evolution and repeated loss of trapping strategies in predatory animals.
Methods
Animal collection and material sourcing
Spiders were collected in New South Wales, South Queensland, Tasmania, the South Island of New Zealand and Germany under scientific licenses SL101868, FA18285, PTU19-001938 and 71225-RES. Tissue samples and specimens for morphology for some species were sourced from museum and institutional collections. Species were identified with primary or (if available) secondary taxonomic literature. In addition, in some cases, specimens were compared with type specimens for taxonomic identification. Vouchers were preserved in ethanol and deposited at curated arachnological collections.
The full list of specimens used in the phylogenomic study, including their collection data and voucher locations, is found in supplemental material S1 and S2 of the associated article.
DNA extraction and UCE analysis
Genomic DNA extraction of all samples was performed using either the leg(s) or the whole specimen (depending on the size of the spider), following the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) manufacturer’s protocol, and quantified using a Qubit fluorometer (Life Technologies, Inc.). UCE library preparations were performed following the protocol of Starrett et al. [1] and Derkarabetian et al. [2] as well as the Hybridization Capture for Targeted NGS manual v4.01 protocol (https://arborbiosci.com/wp-content/uploads/2018/04/myBaits-Manual-v4.pdf). Library preparation for a subset of the samples (n = 23) was conducted using the MYbaits Arachnida 1.1Kv1 kit (Arbor Biosciences, Ann Arbor, MI, USA) [1], and these libraries were sequenced on a NovaSeq 6000 at the Bauer Core Facility at Harvard University. For the remaining samples (n = 75), the extracted DNA was dried using an Eppendorf Concentrator plus speed-vac and transported to NGS Division, Arbor Biosciences (Ann Arbor, MI) for UCE library preparation using the Spider 2Kv1 kit [3]. Processing of the raw demultiplexed read data was performed using the PHYLUCE v1.6.8 pipeline [4]. Reads were cleaned with the Trimmomatic wrapper [5] and Illumiprocessor [6], using default settings, and then assembled using both Trinity v2.1.1 [7], with default settings, and ABySS v1.5.2 [8], and the results were combined into a single assembly file. Probes were matched to contigs using the Spider 2Kv1 probeset file, with minimum coverage and minimum identity values of 65. The UCE loci were aligned using MAFFT [9] and trimmed using GBLOCKS [10, 11] with custom gblocks settings (b1 = 0.5, b2 = 0.5, b3 = 6, b4 = 6) applied in the PHYLUCE pipeline.
Aligned UCEs were then imported into Geneious 11.1.5 [12] and visually inspected for obvious alignment or sequencing errors.
Method references
[1] Starrett, J., Derkarabetian, S., Hedin, M., Bryson Jr, R.W., McCormack, J.E. & Faircloth, B.C. 2017 High phylogenetic utility of an ultraconserved element probe set designed for Arachnida. Molecular Ecology Resources 17, 812-823.
[2] Derkarabetian, S., Benavides, L.R. & Giribet, G. 2019 Sequence capture phylogenomics of historical ethanol‐preserved museum specimens: Unlocking the rest of the vault. Molecular Ecology Resources 19, 1531-1544.
[3] Kulkarni, S., Wood, H., Lloyd, M. & Hormiga, G. 2020 Spider‐specific probe set for ultraconserved elements offers new perspectives on the evolutionary history of spiders (Arachnida, Araneae). Molecular Ecology Resources 20, 185-203.
[4] Faircloth, B.C. 2016 PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics 32, 786-788.
[5] Bolger, A.M., Lohse, M. & Usadel, B. 2014 Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
[6] Faircloth, B. 2013 Illumiprocessor: a trimmomatic wrapper for parallel adapter and quality trimming. (doi:https://doi.org/10.6079/J9ILL).
[7] Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R. & Zeng, Q. 2011 Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology 29, 644-652.
[8] Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J. & Birol, I. 2009 ABySS: a parallel assembler for short read sequence data. Genome Res 19, 1117-1123.
[9] Katoh, K. & Standley, D.M. 2013 MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772-780.
[10] Castresana, J. 2000 Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17, 540-552.
[11] Talavera, G. & Castresana, J. 2007 Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577.
[12] Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S. & Duran, C. 2012 Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647-1649.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset primarily contains images of cells from a large variety of fluorescent markers and image modalities, as well as ground truth segmentations. The dataset also contains a few images of non-cell biological structures and other natural images that can be naturally segmented into repetitive structures. In some cases the nucleus is labelled in a secondary blue channel. The dataset was used to train the Cellpose segmentation algorithm, and can be accessed at https://www.cellpose.org/dataset. For a complete description of the dataset, see the related paper: Cellpose: a generalist algorithm for cellular segmentation. Carsen Stringer, Tim Wang, Michalis Michaelos, Marius Pachitariu. bioRxiv 2020.02.02.931238; doi: https://doi.org/10.1101/2020.02.02.931238
https://spdx.org/licenses/CC0-1.0.html
Subfamilies of cytochrome P450 proteins have been strongly linked to the metabolism of physiologically disruptive compounds such as alkaloids, terpenoids, and other xenobiotics. Consistent with this function, these genes have adaptively evolved in response to environmental pressures exerted on animals, such as herbivores, that consume elevated amounts of toxic xenobiotics or plant secondary metabolites (PSMs). Theory on evolutionary tradeoffs predicts that highly specialized herbivores should exhibit a relatively narrow toolkit of adaptations to accommodate the concomitantly narrow arrays of PSMs in their diets. The bamboo lemurs of Madagascar (genera Prolemur and Hapalemur) represent an interesting test case for this theory because of their dietary hyper-specialization, as these lemurs consume bamboo and grasses at rates otherwise unseen in the order Primates. To test whether the hyper-specialized folivory of these primates is reflected in a similarly specialized and narrow P450 gene suite, we assembled a dataset of confidently assembled CYP1-3 genes for two species of bamboo lemur and 13 additional lemur species. With this dataset, we tested the predictions that bamboo lemurs would exhibit, first, greater rates of gene loss for xenobiotic-metabolizing P450s and, second, relaxed selection on xenobiotic-metabolizing P450 subfamilies relative to lemurs without such dietary hyper-specialization. We found support for the prediction of gene loss in the CYP2B, CYP2C, CYP2D, CYP2J, and CYP3A subfamilies, all of which encode xenobiotic metabolizers. We inferred relaxation of selection for the CYP1A and CYP2D subfamilies. The CYP2F subfamily exhibited a signal of significant intensification of selection in the bamboo-lemur lineage. 
The evolution of the P450 genes in bamboo lemurs provides support for the evolutionary tradeoff hypothesis, and we further hypothesize that, rather than adapting to a general array of PSMs, bamboo lemurs have instead adapted to the primary toxin in their diet, the highly potent poison cyanide.
Methods
Data gathering
In addition to a novel genome assembly for Hapalemur griseus, we mined data from publicly available genome assemblies for 14 species: Prolemur simus (Hawkins et al., 2018), Lemur catta (Palmada-Flores et al., 2022), Eulemur flavifrons and E. macaco (Meyer et al. 2015), Propithecus coquereli (Lowe and Eddy 1997; Guevara et al. 2021), Indri indri (accession number: GCA_004363605.1), Daubentonia madagascariensis (accession number: GCA_004027145.1), Mirza coquereli (accession number: GCA_004024645.1), Mirza zaza (Hunnicutt et al., 2020), and Microcebus murinus (Averdam et al., 2011; Lecompte et al., 2016), as well as the following additional species of mouse lemur: Mic. griseorufus, Mic. mittermeieri, Mic. ravelobensis, and Mic. tavaratra (Hunnicutt et al., 2020). These assemblies, along with any associated annotation files, were downloaded locally and formatted into BLAST databases within Geneious Prime, version 2022.1.1.
We located the loci for all annotated CYP1-3 homologs in the L. catta, Prop. coquereli, and Mic. murinus assemblies by using the associated annotation (GFF3) files for each. We defined these loci by the non-P450 genes that bounded them; those flanking genes were therefore used initially as queries for local BLAST searches, so that each locus was linked to two searches per species. The three reference genomes listed above were used because they belong to separate strepsirrhine families (Lemuridae, Indriidae, and Cheirogaleidae, respectively), and each therefore served as a starting point to extract the desired CYP1-3 genes or loci for confamilial species. Ideally, a pair of BLAST searches would return results that included the same scaffold. By locating both BLAST hits on each of these scaffolds, we were able to extract genomic regions that were hypothetically orthologous to those P450 loci in the L. catta assembly. After locating the scaffolds in each assembly corresponding to each P450 locus, we used LASTZ (Harris, 2007) to interrogate the homology of those scaffolds by aligning them to the confirmed P450 locus from the appropriate confamilial reference genome. Positive results from these alignments were checked using the Mauve genome aligner (Darling et al., 2004) on the same sequences. If output from both of these aligners indicated that the reference had homology with the query scaffold(s), then the annotations from the reference genome were used to extract the corresponding sequence in the other species’ genome. In this way, we mined the genome assemblies listed above for as many complete P450 gene loci as we could confidently locate.
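The scaffold-matching step described above can be illustrated with a small Python sketch. It is not the authors' pipeline: the Hit record and the locus_candidates function are hypothetical names, and the sketch assumes that BLAST hits for the two flanking genes have already been parsed into (scaffold, start, end) records, ordered best hit first.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One parsed BLAST hit (hypothetical record; coordinates on the subject scaffold)."""
    scaffold: str
    start: int
    end: int


def locus_candidates(left_hits: list[Hit], right_hits: list[Hit]) -> dict[str, tuple[int, int]]:
    """Return {scaffold: (start, end)} for scaffolds hit by BOTH flanking-gene queries.

    The interval spans from the outermost coordinate of one flanking hit to the
    outermost coordinate of the other, so it brackets the putative P450 locus.
    """
    best_left: dict[str, Hit] = {}
    for hit in left_hits:
        best_left.setdefault(hit.scaffold, hit)  # keep the first (assumed best) hit per scaffold

    regions: dict[str, tuple[int, int]] = {}
    for hit in right_hits:
        left = best_left.get(hit.scaffold)
        if left is not None and hit.scaffold not in regions:
            coords = (left.start, left.end, hit.start, hit.end)
            regions[hit.scaffold] = (min(coords), max(coords))  # min/max handles reverse-strand hits
    return regions
```

Scaffolds returned by such a function would only be candidates; in the workflow described above they were still checked for homology with LASTZ and Mauve before any sequence was extracted.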
Inference of gene birth and death
For this first portion of the study, we used only species for which each CYP1-3 locus could be wholly collected from a single scaffold or, if not found on a single scaffold, reasonably reconstructed using the process described above. In order to model the events of gene birth and death in this subset of lemur species, our alignment strategy followed a workflow similar to that outlined in previous work with other datasets (Chaney et al., 2018, 2020), but several modifications were made for this project to allow for more standardization and automation across subfamilies. First, all of the P450 genes were extracted from each species’ locus according to the annotation file associated with its confamilial reference. Then, all of the genes from a given P450 subfamily were aligned using MAFFT (Katoh & Standley, 2013), and the resulting alignment was stripped of all sites containing any gaps using trimAl (Capella-Gutierrez et al., 2009). After the best-fitting nucleotide substitution model was inferred by jModelTest (Darriba et al., 2012), a phylogenetic tree was built from this stripped alignment with PhyML 3.0, and support for the resulting tree was assessed with 1000 bootstrap replicates (Guindon et al., 2010).
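To make the gap-stripping step concrete, here is a minimal pure-Python sketch of what "stripping all sites containing any gaps" means. It is an illustration only and does not reproduce trimAl itself; the function name strip_gapped_columns and the dictionary representation of an alignment are assumptions made for the example.

```python
def strip_gapped_columns(alignment: dict[str, str], gap_chars: str = "-") -> dict[str, str]:
    """Drop every alignment column in which at least one sequence has a gap.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Indices of columns that are gap-free in every sequence.
    keep = [i for i in range(length) if all(seq[i] not in gap_chars for seq in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

Because a single gapped sequence removes a column for all sequences, this kind of strict filtering can shorten an alignment considerably when one gene copy is fragmentary.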
The gene trees constructed with PhyML were then passed to Possvm (Grau-Bové & Sebé-Pedrós, 2021). This program uses the intrinsic information contained in a phylogram to infer speciation and gene-duplication events; it does this using the species-overlap algorithm in the ETE3 toolkit (Huerta-Cepas et al., 2007, 2016). Briefly, this algorithm compares the intersection of species present in both descendants of an internal node of a tree to the union of species present in those descendants; using these values, the algorithm computes a species-overlap score, which it then uses to identify each internal node as either a speciation event, having a low overlap score, or a duplication event, having a high overlap score (Huerta-Cepas et al., 2007). Once the identity of each node was estimated in this way, we manually examined each subtree rooted by a node called as a duplication event to infer whether any gene loss had occurred. This was examined on a case-by-case basis using the reasoning that, after a duplication event, each descendant of that node should recapitulate the organismal phylogeny present at the time of duplication. Therefore, any species missing in one of those subtrees must have lost one of the duplicates born in the earlier duplication event, as long as the subtree in question was well-resolved in terms of bootstrap support. In cases where multiple species lineages were absent, we deferred to the parsimonious hypothesis that a loss event occurred prior to the divergence of those lineages, rather than the more complicated hypothesis that the same paralog had been independently lost in both species after their split. We visualized the Possvm output using the program Treerecs (Comte et al., 2020) and then, in some cases, manually modified the depicted gene-evolution scenario in order to accommodate the Possvm results.
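The species-overlap idea described above can be sketched in a few lines of Python. This is a simplified illustration, not the ETE3 or Possvm implementation: trees are represented as nested two-element tuples, leaf names are assumed to follow a hypothetical "Species_gene" convention, and the default threshold of 0.0 (any shared species marks a duplication) is an assumption chosen for the example.

```python
def species_of(leaf: str) -> str:
    """Extract the species tag from a leaf named 'Species_gene' (hypothetical convention)."""
    return leaf.split("_", 1)[0]


def classify_nodes(tree, threshold: float = 0.0) -> list[tuple[str, float, list[str]]]:
    """Label each internal node of a binary gene tree as speciation or duplication.

    `tree` is either a leaf name (str) or a (left, right) tuple.
    Returns (event, overlap_score, species) tuples in post-order.
    """
    events: list[tuple[str, float, list[str]]] = []

    def visit(node) -> set[str]:
        if isinstance(node, str):
            return {species_of(node)}
        left, right = node
        a, b = visit(left), visit(right)
        # Overlap score: species shared by both child clades / species in either clade.
        score = len(a & b) / len(a | b)
        event = "duplication" if score > threshold else "speciation"
        events.append((event, score, sorted(a | b)))
        return a | b

    visit(tree)
    return events
```

For a toy tree such as ((("Lcat_g1", "Hgri_g1"), ("Lcat_g2", "Hgri_g2")), "Mmur_g1"), the node joining the two Lcat/Hgri clades shares both species between its children and is called a duplication, while the remaining nodes share none and are called speciations. A species missing from one of the two clades below such a duplication node would then be a candidate gene loss, which is the case-by-case inspection described above.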
References
Averdam, A., Kuschal, C., Otto, N., Westphal, N., Roos, C., Reinhardt, R., & Walter, L. (2011). Sequence analysis of the grey mouse lemur (Microcebus murinus) MHC class II DQ and DR region. Immunogenetics, 63(2), 85–93. https://doi.org/10.1007/s00251-010-0487-3
Capella-Gutierrez, S., Silla-Martinez, J. M., & Gabaldon, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chaney, M. E., Piontkivska, H., & Tosi, A. J. (2018). Retained duplications and deletions of CYP2C genes among primates. Molecular Phylogenetics and Evolution, 125, 204–212. https://doi.org/10.1016/j.ympev.2018.03.037
Chaney, M. E., Romine, M. G., Piontkivska, H., & Tosi, A. J. (2020). Diversifying selection detected in only a minority of xenobiotic-metabolizing CYP1-3 genes among primate species. Xenobiotica, 50. https://doi.org/10.1080/00498254.2020.1785580
Comte, N., Morel, B., Hasić, D., Guéguen, L., Boussau, B., Daubin, V., Penel, S., Scornavacca, C., Gouy, M., Stamatakis, A., Tannier, E., & Parsons, D. P. (2020). Treerecs: An integrated phylogenetic tool, from sequences to reconciliations. Bioinformatics, 36(18), 4822–4824. https://doi.org/10.1093/bioinformatics/btaa615
Darling, A. C. E., Mau, B., Blattner, F. R., & Perna, N. T. (2004). Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Research, 14(7), 1394–1403. https://doi.org/10.1101/gr.2289704
Darriba, D., Taboada, G. L., Doallo, R., & Posada, D. (2012). jModelTest 2: More models, new heuristics and parallel computing. Nature Methods, 9(8), 772.
Guevara, E. E., Webster, T. H., Lawler, R. R., Bradley, B. J., Greene, L. K., Ranaivonasy, J., Ratsirarson, J., Harris, R. A., Liu, Y., Murali, S., Raveendran, M., Hughes, D. S. T., Muzny, D. M., Yoder, A. D., Worley, K. C., & Rogers, J. (2021). Comparative genomic analysis of sifakas (Propithecus) reveals selection for folivory and high heterozygosity despite endangered status. Science Advances, 7(17), 1–13. https://doi.org/10.1126/sciadv.abd2274
Guindon, S., Dufayard,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Holm oak (Quercus ilex) is the most important and representative species of the Mediterranean forest and of the Spanish agrosilvo-pastoral “dehesa” ecosystem. Despite its environmental and economic interest, Holm oak is an orphan species whose biology is very little known, especially at the molecular level. To increase knowledge of the chemical composition and metabolism of this tree species, a holistic, multi-omics approach in the Systems Biology direction is necessary. However, for orphan and recalcitrant plant species, specific analytical and bioinformatics tools have to be developed in order to obtain adequate data quality and density before tackling the study of their biology. Using a plant sample consisting of a pool generated by mixing equal amounts of homogenized tissue from acorn embryo, leaves, and roots, protocols for transcriptome (NGS-Illumina), proteome (shotgun LC-MS/MS), and metabolome (GC-MS) studies have been optimized. These analyses resulted in the identification of around 62629 transcripts, 2380 protein species, and 62 metabolites. Data are compared with those reported for model plant species whose genomes have been sequenced and are well annotated, including Arabidopsis, japonica rice, poplar, and eucalyptus. RNA and protein sequencing complemented each other, increasing the number and confidence of the proteins identified and correcting erroneous RNA sequences. The integration of the large amount of data reported using bioinformatics tools allows the Holm oak metabolic network to be partially reconstructed: of the 127 metabolic pathways reported in the KEGG pathway database, 123 can be visualized when using the described methodology. They include carbohydrate and energy metabolism, amino acid metabolism, lipid metabolism, nucleotide metabolism, and biosynthesis of secondary metabolites.
The TCA cycle was the most represented pathway, with 5 out of 10 metabolites, 6 out of 8 protein enzymes, and 8 out of 8 enzyme transcripts. On the other hand, gaps (missed pathways) included the metabolism of terpenoids and polyketides and lipid metabolism. The multi-omics resource generated in this work will set the basis for ongoing and future studies, bringing the Holm oak closer to model species, to obtain a better understanding of the molecular mechanisms underlying phenotypes of interest (productive, tolerant to environmental cues, nutraceutical value) and to select elite genotypes to be used in restoration and reforestation programs, especially in a future climate change scenario.