CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each five residues long. Analyzing a protein in this way involves describing its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm, which does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. The kPRED method achieved mean accuracies ranging from 40.8% to 66.3%, depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed; its estimates agree closely with the observed accuracy of the algorithm (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
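As a rough illustration of the database-scanning idea, the following Python sketch assigns one Protein Block letter to each pentapeptide window by majority vote over a toy lookup table. The table contents, the function name `predict_pbs`, and the "?" placeholder for unseen pentapeptides are assumptions made for illustration only; this is not the PB-kPRED algorithm, whose details the abstract does not give.

```python
from collections import Counter

# Hypothetical toy database: pentapeptide -> Protein Block letters observed
# for that fragment in known structures (values invented for illustration).
TOY_DB = {
    "AEELL": ["m", "m", "m", "n"],
    "EELLK": ["m", "m"],
    "ELLKK": ["m", "l", "m"],
    "VKVTV": ["d", "d", "c"],
}

UNKNOWN = "?"  # assumed placeholder for windows absent from the database


def predict_pbs(sequence, db=TOY_DB):
    """Return one predicted PB letter per 5-residue window of `sequence`."""
    predicted = []
    for start in range(len(sequence) - 4):
        window = sequence[start:start + 5]
        observed = db.get(window)
        if observed:
            # Majority vote over the conformations seen for this pentapeptide.
            predicted.append(Counter(observed).most_common(1)[0][0])
        else:
            predicted.append(UNKNOWN)
    return "".join(predicted)


print(predict_pbs("AEELLKK"))
```

A sequence of length n yields n - 4 windows, so sequences shorter than five residues produce an empty string in this sketch.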
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
CATH Domain Classification List (latest release) - protein structural domains classified into the CATH hierarchy.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Conserved syntenic regions among publicly available cotton genomes were analyzed by CottonGen and made available using the Tripal Synteny Viewer developed by the Fei Bioinformatics Lab at the Boyce Thompson Institute at Cornell University. The analysis was done using MCScanX (Wang et al. 2012) with default settings, and the BLAST files were made using blastp with an expectation value cutoff < 1e-10, maximum alignment of 5, and maximum scores of 5. The synteny viewer dynamically displays all the conserved syntenic blocks between a selected chromosome of a genome and another genome in a circular and tabular layout. Once a block is chosen in the circular or tabular layout, all the genes in the block are shown in a graphic and tabular format. The gene names have hyperlinks to gene pages where detailed information on the gene can be accessed. The ‘synteny’ section of the gene page displays all the orthologs and paralogs, with links to the corresponding syntenic blocks or gene pages.
Resources in this dataset: Resource Title: Website Pointer for CottonGen Synteny Viewer. File Name: Web Page. Synteny among cotton genomes can be viewed using the new Tripal Synteny Viewer. URL: https://www.cottongen.org/synview/search
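The hit-filtering step can be sketched in Python as follows. This is only an illustration: it assumes the blastp results are in BLAST tabular format (-outfmt 6), where columns 11 and 12 hold the e-value and bit score, and it reads "maximum alignment of 5" as "keep at most five hits per query". The function name `filter_hits` and the gene identifiers are hypothetical, and the actual CottonGen pipeline may differ.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-10   # cutoff quoted in the description
MAX_HITS = 5            # assumed reading of "maximum alignment of 5"


def filter_hits(lines, evalue_cutoff=EVALUE_CUTOFF, max_hits=MAX_HITS):
    """Keep, per query, the best-scoring hits below the e-value cutoff.

    `lines` are rows in BLAST tabular format (-outfmt 6), whose 11th and
    12th columns are the e-value and the bit score.
    """
    per_query = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < evalue_cutoff:
            per_query[query].append((bitscore, subject, evalue))
    # Sort each query's hits by descending bit score and truncate.
    return {
        query: sorted(hits, reverse=True)[:max_hits]
        for query, hits in per_query.items()
    }


# Hypothetical rows: one strong hit and one hit that fails the cutoff.
rows = [
    "geneA\tgeneX\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-50\t400",
    "geneA\tgeneY\t40.0\t80\t48\t0\t1\t80\t1\t80\t1e-05\t50",
]
print(filter_hits(rows))
```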
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ribosomal proteins are building blocks of the ribosome, and are thus essential proteins of any cell. Work reported in the literature has alluded to the important roles of ribosomal proteins in ensuring cell viability, as well as in encoding the phylogeny of a species in addition to 16S or 18S rRNA. It is this phylogenetic significance that lends ribosomal proteins to use in different microbial identification protocols such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) or other soft ionization mass spectrometry proteomics approaches. To support this role of ribosomal proteins, this work collated the ribosomal protein complement of different microbial species to build a comprehensive library of ribosomal proteins that helps annotate ribosomal protein mass peaks. Combining protein name, amino acid sequence, nucleotide sequence, number of residues and molecular weight, the library should find use in various methodologies that require annotation of protein mass peaks.
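To show how such a library record might be assembled, the sketch below derives the residue count and an approximate average molecular weight from an amino acid sequence. The description does not say how the molecular weights in the dataset were computed, so this is an assumption: it sums standard average residue masses and adds one water. The field names, the function names, and the example peptide are hypothetical, and the nucleotide sequence field is omitted.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight(sequence):
    """Approximate average mass (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def library_entry(name, sequence):
    """Build one record with hypothetical field names."""
    return {
        "protein_name": name,
        "sequence": sequence,
        "residues": len(sequence),
        "molecular_weight_da": round(molecular_weight(sequence), 2),
    }


entry = library_entry("example peptide (hypothetical)", "MKV")
print(entry)
```

Observed mass peaks can deviate from such values, for example through post-translational modification or loss of the initiator methionine, so a calculated mass is only a starting point for annotation.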
This is the HQSNP DB (high-quality SNP database) developed by the CHG bioinformatics group. A high-quality SNP is defined as a SNP that has allele frequency or genotyping data. The majority of the HQSNPs come from HapMap; others come from JSNP (Japanese SNP database), TSC (The SNP Consortium), Affymetrix 120K SNP, and Perlegen SNP. There are four kinds of SNP search you can do:
* Get SNPs by dbSNP rs#: Choose this search if you have already selected a list of SNPs and you just want to get the SNP information. The program will generate an Excel file containing the SNP flanking sequence, variation, quality, function, etc. In the Excel file, there are 10 highlighted fields. You can send only those highlighted fields to Illumina to get a SNP pre-score. (The same fields are presented in other types of searches as well.)
* Get gene SNPs by gene names: Choose this search if you have a list of gene names and you want to get the SNP information in these genes. The gene name can be an official gene symbol, Ensembl gene ID, RefSeq accession ID, LocusLink number, etc.
* Get gene SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get all gene SNP information in these regions. The software will find all the Ensembl genes in the regions and find the SNPs associated with each Ensembl gene.
* Get genome scan SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get evenly spaced SNPs in these regions.
A SNP selection tool (SNPselector) was built upon HQSNP. It took a SNP ID list, gene name list, or genome region list as input and searched SNPs for genome scan or gene association studies. It could take an optional ABI SNP file (exported from the ABI SNP search web page) as input for checking whether a candidate SNP is available from ABI. It could also take an optional Illumina SNP pre-score file as input to select SNPs for the Illumina SNP assay. It generated results sorted by tag SNP in LD block, SNP quality, SNP function, SNP regulatory potential, and SNP mutation risk. SNPselector is now retired from public use (as of September 30, 2010).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. In addition, miRNA activity can be controlled by a newly discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs and block the binding of the specific target transcript, leading to increased mRNA expression. Because miRNA-eTM-target-mRNA regulation modules are involved in a wide range of biological processes, an increasing need for a comprehensive eTM database has arisen. Apart from miRSponge, which holds a limited number of Arabidopsis eTM data, no database or repository has yet been developed and released for plant eTMs. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, a number of identified eTMs were obtained from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome library is first mapped to the corresponding plant genome, then long non-coding RNA (lncRNA) transcripts are characterized. Furthermore, additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Then, utilizing the lncRNA and miRNA sources, a total of 2,728 eTMs were successfully predicted. Our regularly updated database, PeTMbase, provides high-quality information regarding miRNA:eTM modules and will aid functional genomics studies, particularly on miRNA regulatory networks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Finalisation of the reconciliation procedure by recomputing the block events based on the final event set and summarizing ancestral states and events from each gene (sub)family into a pangenome-wide synthesis of species evolutionary history.
The event block reports are analogous to the output of step #II.5; the previous and final block sets mostly overlap, and block_ids of identical blocks are preserved across these sets.
The genome history synthesis is generated by the ancestral_content.py script; note that it could be done within the same run as the ancestral content reconstruction (previous step #II.7). The output notably includes:
- synthesis/ folder:
  - genome_synthesis.* files: species tree annotated with the sum of events (duplication, transfer, origination, loss, replacement) and of gene counts over all families, in extended Newick format (with bracketed comments at nodes) and as a serialized Python object (pickle)
  - genome_*_synthesis.nhx files: idem, but separating the annotation of gene counts (states) and event counts
  - phylogenetic_profiles/ folder: tables of state count per gene family or orthologous subfamily, and by species tree node (any or extant nodes = "leaf" only)
  - annottables/ folder: tables of orthologous gene subfamilies specifically gained/lost at a species tree node, or specifically present/absent in a clade, with the list of corresponding genes in extant genomes, their coordinates and their functional annotation
- ortho_subfam_*.tab files: tables assigning gene sequence identifiers or unique event identifiers, respectively, to orthologous subfamilies
- specific_gene_dump.tab file: relational database table dump listing the specific gene sets (see annottables/ folder)
- phylogenetic_profile_dump.tab file: relational database table dump listing presence/absence states of (sub)families at each species tree node (see phylogenetic_profiles/ folder)
- reconciliation_collection/ folder: updated and final set of reconciled gene trees in phyloXML and serialized Python object (pickle) formats
Introductory bioinformatics exercises often walk students through the use of computational tools, but often provide little understanding of what a computational tool does "under the hood." A solid understanding of how a bioinformatics computational algorithm functions, including its limitations, is key for interpreting the output in a biologically relevant context. This introductory bioinformatics exercise integrates an introduction to web-based sequence alignment algorithms with models to facilitate student reflection and appreciation for how computational tools provide similarity output data. The exercise concludes with a set of inquiry-based questions in which students may apply computational tools to solve a real biological problem.
In the module, students first define sequence similarity and then investigate how similarity can be quantitatively compared between two similar length proteins using a Blocks Substitution Matrix (BLOSUM) scoring matrix. Students then look for local regions of similarity between a sequence query and subjects within a large database using Basic Local Alignment Search Tool (BLAST). Lastly, students access text-based FASTA-formatted sequence information via National Center for Biotechnology Information (NCBI) databases as they collect sequences for a multiple sequence alignment using Clustal Omega to generate a phylogram and evaluate evolutionary relationships. The combination of diverse, inquiry-based questions, paper models, and web-based computational resources provides students with a solid basis for more advanced bioinformatics topics and an appreciation for the importance of bioinformatics tools across the discipline of biology.
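The matrix-based comparison of two equal-length proteins can be illustrated with a short Python sketch. It scores an ungapped, position-by-position comparison using a handful of BLOSUM62 entries; the choice of BLOSUM62, the example sequences, and the function names are assumptions for illustration, and a real exercise would use the full 20 x 20 matrix.

```python
# A few BLOSUM62 entries (the matrix is symmetric, so each pair is stored once).
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("L", "L"): 4, ("I", "I"): 4, ("I", "L"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
}


def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up the substitution score for residues a and b in either order."""
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key]  # raises KeyError for pairs outside this small subset


def similarity_score(seq1, seq2):
    """Sum substitution scores over aligned positions (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# A/A = 4, K/R = 2, L/I = 2, D/E = 2
print(similarity_score("AKLD", "ARIE"))  # prints 10
```

Identical residues score highest, while conservative substitutions such as K/R or D/E still contribute positive scores, which is why a matrix score carries more information than percent identity alone.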
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bacteroides thetaiotaomicron is a Gram-negative obligate anaerobe that is a major inhabitant of the human gut. Known for its unique ability to digest complex polysaccharides from plants, the bacterium plays important roles in aiding human digestion and the extraction of building blocks and energy from food. However, the bacterium can also become an opportunistic pathogen when displaced from its natural habitat. This work sought to provide fundamental information about the genetic repertoire of B. thetaiotaomicron strain 7330 by parsing its annotated genome sequence from GenBank. Comprising gene name, gene function, and gene sequence, the resource should find use in fundamental and applied microbiology research seeking to explore the metabolic basis of the physiological role of the bacterium in the gut microbiome, as well as possible ways in which its genetic repertoire could be exploited in biotechnology.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.
PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path followed by its score for each theme, and indicators of whether or not the path is part of the flagship path set for each theme (meaning that it was manually reviewed and determined to reflect that theme). The themes themselves are listed below and are in our paper (reference below).
PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:
The "with-themes.txt" files only contain dependency paths with corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.
This release contains the annotated network for the September 15, 2019 version of PubTator. The version discussed in our paper, below, is an older one - from April 30, 2016. If you're interested in that network, it can be found in Version 1 of this repository. We will be releasing updated networks periodically, as the PubTator community continues to release new versions of named entity annotations for Medline each month or so.
------------------------------------------------------------------------------------
REFERENCES
Percha B, Altman RBA (2017) A global network of biomedical relationships derived from text. Bioinformatics, 34(15): 2614-2624.
Percha B, Altman RBA (2015) Learning the structure of biomedical relationships from unstructured text. PLoS Computational Biology, 11(7): e1004216.
This project depends on named entity annotations from the PubTator project:
https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
Reference:
Wei CH et al., PubTator: a Web-based text mining tool for assisting Biocuration, Nucleic Acids Research, 2013, 41 (W1): W518-W522.
Dependency parsing was provided by the Stanford CoreNLP toolkit (version 3.9.1):
https://stanfordnlp.github.io/CoreNLP/index.html
Reference:
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
------------------------------------------------------------------------------------
THEMES
chemical-gene
(A+) agonism, activation
(A-) antagonism, blocking
(B) binding, ligand (esp. receptors)
(E+) increases expression/production
(E-) decreases expression/production
(E) affects expression/production (neutral)
(N) inhibits
gene-chemical
(O) transport, channels
(K) metabolism, pharmacokinetics
(Z) enzyme activity
chemical-disease
(T) treatment/therapy (including investigatory)
(C) inhibits cell growth (esp. cancers)
(Sa) side effect/adverse event
(Pr) prevents, suppresses
(Pa) alleviates, reduces
(J) role in disease pathogenesis
disease-chemical
(Mp) biomarkers (of disease progression)
gene-disease
(U) causal mutations
(Ud) mutations affecting disease course
(D) drug targets
(J) role in pathogenesis
(Te) possible therapeutic effect
(Y) polymorphisms alter risk
(G) promotes progression
disease-gene
(Md) biomarkers (diagnostic)
(X) overexpression in disease
(L) improper regulation linked to disease
gene-gene
(B) binding, ligand (esp. receptors)
(W) enhances response
(V+) activates, stimulates
(E+) increases expression/production
(E) affects expression/production (neutral)
(I) signaling pathway
(H) same protein or complex
(Rg) regulation
(Q) production by cell population
------------------------------------------------------------------------------------
FORMATTING NOTE
A few users have mentioned that the dependency paths in the "part-i" files are all lowercase text, whereas those in the "part-ii" files maintain the case of the original sentence. This complicates mapping between the two sets of files.
We kept the part-ii files in the same case as the original sentence to facilitate downstream debugging - it's easier to tell which words in a particular sentence are contributing to the dependency path if their original case is maintained. When working with the part-ii "with-themes" files, if you simply convert the dependency path to lowercase, it is guaranteed to match to one of the paths in the corresponding part-i file and you'll be able to get the theme scores.
Apologies for the additional complexity, and please reach out to us if you have any questions (see correspondence information in the Bioinformatics manuscript, above).
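The lowercase mapping described in the formatting note can be sketched in Python as follows. The dependency path string, the theme scores, and the function name `theme_scores_for` are invented for illustration and do not reflect the exact column layout of the part-i or part-ii files.

```python
# Hypothetical part-i data: lowercase dependency path -> score per theme.
part_i_scores = {
    "inhibits|nsubj|start_entity inhibits|dobj|end_entity": {"N": 12.0, "A-": 3.5},
}


def theme_scores_for(part_ii_path, scores=part_i_scores):
    """Look up theme scores for a part-ii path by lowercasing it first.

    Returns None when the path has no theme assignment in part-i.
    """
    return scores.get(part_ii_path.lower())


# A part-ii path keeps the case of the original sentence.
mixed_case_path = "Inhibits|nsubj|START_ENTITY Inhibits|dobj|END_ENTITY"
print(theme_scores_for(mixed_case_path))
```

Keeping the part-ii path untouched and lowercasing only at lookup time preserves the original case for debugging while still matching the part-i keys.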
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains data sets and scripts used in "host range of SE: a new class of mobile DNA elements nesting in Gammaproteobacteria" authored by Desmila Idola, Hiroshi Mori, Yuji Nagata, Lisa Nonaka, and Hirokazu Yano.
The directories: "Data", "Results", "Scripts", and "synteny_block_search" are used for psi-blast search and visualization of the results.
Synteny block search requires the protein databases and GFF files available from the following Zenodo repositories:
Gammaproteobacteria protein dataset: doi 10.5281/zenodo.5880327
Betaproteobacteria protein dataset: doi 10.5281/zenodo.5885688
Alphaproteobacteria protein dataset: doi 10.5281/zenodo.7839301
The "attS_sequencing" directory contains a README.txt listing all commands used for attS amplicon sequencing.
The "resequencing" directory contains a README.txt and Illumina read assemblies of single-gene knockout mutants as well as the parent strain BHY606.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
At present, national provisions on copyright and database protection regarding exceptions and limitations for research purposes differ both in detail and substance. Scientists within the EU working with copyright protected works or with protected databases have to be aware that regulations may vary considerably from country to country. This can be a major stumbling block to international collaboration in science. The document addresses legal issues that hamper an integrative system for managing biodiversity knowledge in Europe. It describes the importance for scientists to have access to documents and data in order to synthesize disparate information and to facilitate data mining (or similar research techniques). It explores some aspects of copyright and database protection that influence access to and re-use of biodiversity data and information and refers to exceptions and limitations of copyright or database protection provided for within the relevant EU Directives.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Normal epithelial cells rapidly undergo apoptosis as soon as they lose contact with the extracellular matrix (ECM), a process termed anoikis. However, cancer cells tend to develop a resistance mechanism to anoikis. This acquired ability is termed anoikis resistance. Cancer cells with anoikis resistance can spread to distant tissues or organs via the peripheral circulatory system and cause cancer metastasis. Thus, inhibition of anoikis resistance blocks the metastatic ability of cancer cells. Anoikis-resistant CAL27 (CAL27AR) cells were induced from CAL27 cells using the suspension culture approach. Transcriptome analysis was performed using RNA-Seq to study the differentially expressed genes (DEGs) between the CAL27AR cells and the parental CAL27 cells. Gene function annotation and Gene Ontology (GO) enrichment analysis were performed using the DAVID database. Signaling pathways involving the DEGs were analyzed using Gene Set Enrichment Analysis (GSEA) software. Analysis results were confirmed by reverse transcription PCR (RT-PCR), Western blotting, and gene correlation analysis based on the TCGA database. The figures provided here are full-length, uncropped blots from our study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The single nucleotide polymorphism (SNP) rs835487 is associated with hip osteoarthritis (OA) at the genome-wide significance level and is located within CHST11, which codes for carbohydrate sulfotransferase 11. This enzyme post-translationally modifies proteoglycan prior to its deposition in the cartilage extracellular matrix. Using bioinformatics and experimental analyses, our aims were to characterise the rs835487 association signal and to identify the causal functional variant(s). Database searches revealed that rs835487 resides within a linkage disequilibrium (LD) block of only 2.7 kb and is in LD (r² ≥ 0.8) with six other SNPs. These are all located within intron 2 of CHST11, in a region that has predicted enhancer activity and which shows a high degree of conservation in primates. Luciferase reporter assays revealed that of the seven SNPs, rs835487 and rs835488, which have a pairwise r² of 0.962, are the top functional candidates; the haplotype composed of the OA-risk conferring G allele of rs835487 and the corresponding T allele of rs835488 (the G-T haplotype) demonstrated significantly different enhancer activity relative to the haplotype composed of the non-risk A allele of rs835487 and the corresponding C allele of rs835488 (the A-C haplotype) (p < 0.001). Electrophoretic mobility shift assays and supershifts identified several transcription factors that bind more strongly to the risk-conferring G and T alleles of the two SNPs, including SP1, SP3, YY1 and SUB1. CHST11 was found to be upregulated in OA versus non-OA cartilage (p < 0.001) and was expressed dynamically during chondrogenesis. Its expression in adult cartilage did not, however, correlate with rs835487 genotype. Our data demonstrate that the OA susceptibility is mediated by differential protein binding to the alleles of rs835487 and rs835488, which are located within an enhancer whose target may be CHST11 during chondrogenesis or an alternative gene.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Gastric cancer is the world’s leading tumor disease in terms of morbidity and mortality and is currently treated clinically with a comprehensive approach based on surgery. Studies have demonstrated the antitumor effects of neferine, but its anti-cancer mechanism in gastric cancer is not yet clear. Methods: The PubChem and SwissTargetPrediction databases were searched to retrieve the targets of action of neferine. Meanwhile, relevant gene expression data were downloaded from the Gene Expression Omnibus (GEO) database to screen for differential genes and build a drug-disease network. The selected genes were subjected to bioinformatics analysis. Finally, the gastric cancer treatment potential of neferine was determined through molecular docking. The molecular mechanism of neferine in the treatment of gastric cancer was verified by CCK8 assay, monoclonal assay, apoptosis and cell cycle assays, qRT-PCR and Western blot. Results: The results of the network pharmacological analyses illustrate that the core genes are closely related to apoptosis, cell cycle, and cell proliferation. Through molecular docking, it was confirmed that neferine was closely related to key proteins. The results of in vitro experiments indicated that neferine could significantly inhibit the viability of gastric cancer cells, induce apoptosis of gastric cancer cells, and block the cell cycle of gastric cancer cells in the G0/G1 phase. Conclusion: In summary, neferine inhibited the proliferation of gastric cancer cells through the CDK4/CDK6/CyclinD1 complex. This study provides a theoretical basis for the treatment of gastric cancer with neferine and an idea for the development of neferine for gastric cancer.