16 datasets found

Data from: Knowledge-based prediction of protein backbone conformation using...
zenodo.org
data.niaid.nih.gov
+1 more
application/gzip, txt
Updated May 31, 2022
Cite
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann (2022). Data from: Knowledge-based prediction of protein backbone conformation using a structural alphabet [Dataset]. http://doi.org/10.5061/dryad.3f5q5
Explore at:
Available download formats: txt, application/gzip
Unique identifier
https://doi.org/10.5061/dryad.3f5q5
Dataset updated
May 31, 2022
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Iyanar Vetrivel; Swapnil Mahajan; Manoj Tyagi; Lionel Hoffmann; Yves-Henri Sanejouand; Narayanaswamy Srinivasan; Alexandre de Brevern; Frédéric Cadet; Bernard Offmann
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each 5 residues long. Analyzing a protein in this way involves describing its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, use structural information from homologues in preference when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed; it estimates the accuracy of the algorithm very well (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
Data from: PROSITE
prosite.expasy.org
identifiers.org
+7 more
Updated Oct 15, 2025
Cite
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Explore at:
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
Protein Structural Domain Classification
cathdb.info
ec.i4cologne.com
+3 more
Updated Sep 30, 2024
Cite
(2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
Explore at:
Unique identifier
https://identifiers.org/MIR:00100005
Dataset updated
Sep 30, 2024
Description
CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.
Data from: CottonGen Synteny Viewer
agdatacommons.nal.usda.gov
datasetcatalog.nlm.nih.gov
+2 more
bin
Updated Feb 13, 2024
Cite
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main (2024). CottonGen Synteny Viewer [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/CottonGen_Synteny_Viewer/24853278
Explore at:
Available download formats: bin
Dataset updated
Feb 13, 2024
Dataset provided by
MainLab, Washington State University
Authors
Taein Lee; Sook Jung; Ksenija Gasic; Todd Campbell; Jing Yu; Jodi Humann; Heidi Hough; Dorrie Main
License
U.S. Government Works, https://www.usa.gov/government-works
License information was derived automatically
Description
Conserved syntenic regions among publicly available cotton genomes were analyzed by CottonGen and made available using the Tripal Synteny Viewer developed by the Fei Bioinformatics Lab from the Boyce Thompson Institute at Cornell University. Analysis was done using MCScanX (Wang et al. 2012) with default settings, and BLAST files were made using blastp with an expectation value cutoff < 1e-10, maximum alignment of 5, and maximum scores of 5. The synteny viewer dynamically displays all the conserved syntenic blocks between a selected chromosome of a genome and another genome in a circular and tabular layout. Once a block is chosen in the circular or tabular layout, all the genes in the block are shown in a graphic and tabular format. The gene names have hyperlinks to gene pages where detailed information on the gene can be accessed. The ‘synteny’ section of the gene page displays all the orthologs and paralogs with links to the corresponding syntenic blocks or gene pages.
Resources in this dataset:
Resource Title: Website Pointer for CottonGen Synteny Viewer. File Name: Web Page, url: https://www.cottongen.org/synview/search. Synteny among cotton genomes can be viewed using the new Tripal Synteny Viewer.
Comprehensive library of ribosomal proteins of different microbial species
figshare.com
xlsx
Updated Sep 29, 2021
Cite
Wenfa Ng (2021). Comprehensive library of ribosomal proteins of different microbial species [Dataset]. http://doi.org/10.6084/m9.figshare.16695964.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.16695964.v1
Dataset updated
Sep 29, 2021
Dataset provided by
Figshare, http://figshare.com/
Authors
Wenfa Ng
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ribosomal proteins are building blocks of the ribosome, and are thus essential proteins of any cell. Work reported in the literature has alluded to the important roles of ribosomal proteins in ensuring cell viability, as well as in encoding the phylogeny of a species in addition to 16S or 18S rRNA. It is this phylogenetic significance that lends ribosomal proteins their use in different microbial identification protocols such as matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) or other soft ionization mass spectrometry proteomics approaches. To support this role of ribosomal proteins, this work collated the ribosomal protein complement of different microbial species to build a comprehensive library of ribosomal proteins for annotating ribosomal protein mass peaks. Listing protein name, amino acid sequence, nucleotide sequence, number of residues and molecular weight, the library should find use in various methodologies that require annotation of protein mass peaks.
High Quality SNP Database
dknet.org
scicrunch.org
+2 more
Updated May 11, 2024
Cite
(2024). High Quality SNP Database [Dataset]. http://identifiers.org/RRID:SCR_007230
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_007230
Dataset updated
May 11, 2024
Description
This is the HQSNP DB (high-quality SNP database) developed by the CHG bioinformatics group. A high-quality SNP is defined as a SNP having allele frequency or genotyping data. The majority of the HQSNPs come from HapMap; others come from JSNP (Japanese SNP database), TSC (The SNP Consortium), Affymetrix 120K SNP, and Perlegen SNP. There are four kinds of SNP search you can do:
* Get SNPs by dbSNP rs#: Choose this search if you have already selected a list of SNPs and you just want to get the SNP information. The program will generate an Excel file containing the SNP flanking sequence, variation, quality, function, etc. In the Excel file, there are 10 highlighted fields. You can send only the highlighted information to Illumina to get a SNP pre-score. (The same fields are presented in other types of searches as well.)
* Get gene SNPs by gene names: Choose this search if you have a list of gene names and you want to get the SNP information in these genes. The gene name can be an official gene symbol, Ensembl gene ID, RefSeq accession ID, LocusLink number, etc.
* Get gene SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get all gene SNP information in these regions. The software will find all the Ensembl genes in the regions and find SNPs associated with each Ensembl gene.
* Get genome scan SNPs by genome regions: Choose this search if you have a list of genome regions and you want to get evenly spaced SNPs in these regions.
A SNP selection tool (SNPselector) was built upon HQSNP. It took a SNP ID list, gene name list, or genome region list as input and searched SNPs for a genome scan or gene association study. It could take an optional ABI SNP file (exported from the ABI SNP search web page) as input for checking whether the candidate SNP was available from ABI. It could also take an optional Illumina SNP pre-score file as input to select SNPs for the Illumina SNP assay. It generated results sorted by tag SNP in LD block, SNP quality, SNP function, SNP regulatory potential, and SNP mutation risk. SNPselector is now retired from public use (as of September 30, 2010).
Data from: PeTMbase: A database of plant endogenous target mimics (eTMs)
data.mendeley.com
Updated Nov 23, 2016
Cite
Gökhan Karakülah (2016). PeTMbase: A database of plant endogenous target mimics (eTMs) [Dataset]. http://doi.org/10.17632/htgxryrcv2.1
Explore at:
Unique identifier
https://doi.org/10.17632/htgxryrcv2.1
Dataset updated
Nov 23, 2016
Authors
Gökhan Karakülah
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. In addition, miRNA activity can be controlled by a newly discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs to block the binding of specific transcripts, leading to increased mRNA expression. Because miRNA-eTM-target-mRNA regulation modules are involved in a wide range of biological processes, an increasing need for a comprehensive eTM database arose. Except for miRSponge, which holds a limited amount of Arabidopsis eTM data, no database or repository has yet been developed and released for plant eTMs. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, a number of identified eTMs were obtained from high-throughput RNA-sequencing data of 11 plant species. Each transcriptome library was first mapped to the corresponding plant genome, and long non-coding RNA (lncRNA) transcripts were then characterized. Additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Then, using the lncRNA and miRNA sources, a total of 2,728 eTMs were successfully predicted. Our regularly updated database, PeTMbase, provides high-quality information regarding miRNA:eTM modules and will aid functional genomics studies, particularly on miRNA regulatory networks.
II.8 - Agrogenom synthesis of genome evolution scenarios
figshare.com
application/gzip
Updated Jun 2, 2023
Cite
Florent Lassalle (2023). II.8 - Agrogenom synthesis of genome evolution scenarios [Dataset]. http://doi.org/10.6084/m9.figshare.4924439.v3
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.4924439.v3
Dataset updated
Jun 2, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Florent Lassalle
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finalisation of the reconciliation procedure by recomputing the block events based on the final event set and summarizing ancestral states and events from each gene (sub)family into a pangenome-wide synthesis of species evolutionary history.
The event block reports are analogous to the output of step #II.5; the previous and final block sets mostly overlap, and block_ids of identical blocks are preserved across these sets.
The genome history synthesis is generated by the ancestral_content.py script; note that it could be done within the same run as the ancestral content reconstruction (previous step #II.7). The output notably includes:
- synthesis/ folder:
  - genome_synthesis.* files: species tree annotated with the sum of events (duplication, transfer, origination, loss, replacement) and of gene counts over all families, in extended Newick format (with bracketed comments at nodes) and as a serialized Python object (pickle)
  - genome_*_synthesis.nhx files: idem but separating annotation of gene count (states) and event count
  - phylogenetic_profiles/ folder: tables of state count per gene family or orthologous subfamily, and by species tree node (any or extant nodes = "leaf" only)
  - annottables/ folder: tables of orthologous gene subfamilies specifically gained/lost at a species tree node, or specifically present/absent in a clade, with the list of corresponding genes in extant genomes, their coordinates and their functional annotation
- ortho_subfam_*.tab files: tables assigning orthologous subfamilies to gene sequence identifiers or to unique event identifiers, respectively
- specific_gene_dump.tab file: relational database table dump listing the specific gene sets (see annottables/ folder)
- phylogenetic_profile_dump.tab file: relational database table dump listing presence/absence states of (sub)families at each species tree node (see phylogenetic_profiles/ folder)
- reconciliation_collection/ folder: updated and final set of reconciled gene trees in phyloXML and serialized Python object (pickle) formats
Sequence Similarity: An inquiry based and "under the hood" approach for...
qubeshub.org
Updated Aug 28, 2021
Cite
Adam Kleinschmit*; Benita Brink; Steven Roof; Carlos Goller; Sabrina Robertson (2021). Sequence Similarity: An inquiry based and "under the hood" approach for incorporating molecular sequence alignment in introductory undergraduate biology courses [Dataset]. http://doi.org/10.24918/cs.2019.5
Explore at:
Unique identifier
https://doi.org/10.24918/cs.2019.5
Dataset updated
Aug 28, 2021
Dataset provided by
QUBES
Authors
Adam Kleinschmit*; Benita Brink; Steven Roof; Carlos Goller; Sabrina Robertson
Description
Introductory bioinformatics exercises often walk students through the use of computational tools but provide little understanding of what a computational tool does "under the hood." A solid understanding of how a bioinformatics algorithm functions, including its limitations, is key for interpreting the output in a biologically relevant context. This introductory bioinformatics exercise integrates an introduction to web-based sequence alignment algorithms with models to facilitate student reflection and appreciation for how computational tools provide similarity output data. The exercise concludes with a set of inquiry-based questions in which students may apply computational tools to solve a real biological problem.

In the module, students first define sequence similarity and then investigate how similarity can be quantitatively compared between two similar length proteins using a Blocks Substitution Matrix (BLOSUM) scoring matrix. Students then look for local regions of similarity between a sequence query and subjects within a large database using Basic Local Alignment Search Tool (BLAST). Lastly, students access text-based FASTA-formatted sequence information via National Center for Biotechnology Information (NCBI) databases as they collect sequences for a multiple sequence alignment using Clustal Omega to generate a phylogram and evaluate evolutionary relationships. The combination of diverse, inquiry-based questions, paper models, and web-based computational resources provides students with a solid basis for more advanced bioinformatics topics and an appreciation for the importance of bioinformatics tools across the discipline of biology.
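
As a rough illustration of the scoring step described above, the sketch below sums per-position substitution scores for two equal-length sequences. The score table is a hypothetical toy example invented for this sketch; its values are not taken from a real BLOSUM matrix, and the function names are assumptions rather than part of the module.

```python
# Hypothetical toy substitution scores for a 4-letter alphabet.
# These numbers are illustrative only and are NOT real BLOSUM values.
TOY_SCORES = {
    ("A", "A"): 4, ("L", "L"): 4, ("K", "K"): 5, ("W", "W"): 11,
    ("A", "L"): -1, ("A", "K"): -1, ("A", "W"): -3,
    ("L", "K"): -2, ("L", "W"): -2, ("K", "W"): -3,
}


def pair_score(a, b, scores=TOY_SCORES):
    """Look up the symmetric substitution score for residues a and b."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def ungapped_similarity(seq1, seq2, scores=TOY_SCORES):
    """Sum substitution scores position by position (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq1, seq2))


identical = ungapped_similarity("ALKW", "ALKW")  # every position matches
swapped = ungapped_similarity("ALKW", "LAKW")    # first two residues swapped
```

With a real matrix in place of the toy table, the same loop gives the kind of quantitative comparison the exercise asks students to work through by hand.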
Gene database of Bacteroides thetaiotaomicron strain 7330
figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Aug 15, 2020
Cite
Wenfa Ng (2020). Gene database of Bacteroides thetaiotaomicron strain 7330 [Dataset]. http://doi.org/10.6084/m9.figshare.12812174.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12812174.v1
Dataset updated
Aug 15, 2020
Dataset provided by
Figshare, http://figshare.com/
Authors
Wenfa Ng
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Bacteroides thetaiotaomicron is a Gram-negative obligate anaerobe that is a major inhabitant of the human gut. Known for its unique ability to digest complex polysaccharides from plants, the bacterium plays important roles in aiding human digestion and extraction of building blocks and energy from food. However, the bacterium could also become an opportunistic pathogen when displaced from its natural habitat. This work sought to provide fundamental information about the genetic repertoire of B. thetaiotaomicron strain 7330 by parsing its annotated genome sequence from Genbank. Comprising gene name, gene function, and gene sequence, the resource should find use in fundamental and applied microbiology research seeking to explore the metabolic basis of the physiological role of the bacterium in the gut microbiome as well as possible ways in which its genetic repertoire could be exploited in biotechnology.
Data from: A global network of biomedical relationships derived from text
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Cite
Bethany Percha; Russ B. Altman (2020). A global network of biomedical relationships derived from text [Dataset]. http://doi.org/10.5281/zenodo.3459420
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3459420
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Bethany Percha; Russ B. Altman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.

PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path followed by its score for each theme, and indicators of whether or not the path is part of the flagship path set for each theme (meaning that it was manually reviewed and determined to reflect that theme). The themes themselves are listed below and are in our paper (reference below).

PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:

PubMed ID

Sentence number (0 = title)

First entity name, formatted

First entity name, location (characters from start of abstract)

Second entity name, formatted

Second entity name, location

First entity name, raw string

Second entity name, raw string

First entity name, database ID(s)

Second entity name, database ID(s)

First entity type (Chemical, Gene, Disease)

Second entity type (Chemical, Gene, Disease)

Dependency path

Sentence, tokenized

The "with-themes.txt" files only contain dependency paths with corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.
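
The Part II record layout listed above could be read along these lines. This is a minimal sketch that assumes one record per line with the fourteen fields separated by tabs in the listed order; the description does not state the delimiter, and the class name, field names, and example values are hypothetical.

```python
from typing import NamedTuple


class PartIIRecord(NamedTuple):
    """One Part II record; field order follows the list in the description."""
    pubmed_id: str
    sentence_number: int          # 0 = title
    first_entity_formatted: str
    first_entity_location: str
    second_entity_formatted: str
    second_entity_location: str
    first_entity_raw: str
    second_entity_raw: str
    first_entity_db_ids: str
    second_entity_db_ids: str
    first_entity_type: str        # Chemical, Gene, or Disease
    second_entity_type: str       # Chemical, Gene, or Disease
    dependency_path: str
    sentence_tokenized: str


def parse_part_ii_line(line: str) -> PartIIRecord:
    """Split one line into named fields (tab delimiter is an assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 14:
        raise ValueError(f"expected 14 fields, got {len(fields)}")
    fields[1] = int(fields[1])
    return PartIIRecord(*fields)


# Hypothetical example line, invented for illustration only.
example_line = "\t".join([
    "12345", "0", "drug_x", "0,6", "gene_y", "16,22", "Drug X", "Gene Y",
    "ID:1", "ID:2", "Chemical", "Gene", "example|path", "Drug X inhibits Gene Y",
])
record = parse_part_ii_line(example_line)
```

Named fields keep downstream filtering readable, for example selecting only records whose two entity types are Chemical and Gene.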

This release contains the annotated network for the September 15, 2019 version of PubTator. The version discussed in our paper, below, is an older one - from April 30, 2016. If you're interested in that network, it can be found in Version 1 of this repository. We will be releasing updated networks periodically, as the PubTator community continues to release new versions of named entity annotations for Medline each month or so.

------------------------------------------------------------------------------------
REFERENCES

Percha B, Altman RBA (2017) A global network of biomedical relationships derived from text. Bioinformatics, 34(15): 2614-2624.
Percha B, Altman RBA (2015) Learning the structure of biomedical relationships from unstructured text. PLoS Computational Biology, 11(7): e1004216.

This project depends on named entity annotations from the PubTator project:
https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

Reference:
Wei CH et al., PubTator: a Web-based text mining tool for assisting Biocuration, Nucleic acids research, 2013, 41 (W1): W518-W522.

Dependency parsing was provided by the Stanford CoreNLP toolkit (version 3.9.1):
https://stanfordnlp.github.io/CoreNLP/index.html

Reference:
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

------------------------------------------------------------------------------------
THEMES

chemical-gene
(A+) agonism, activation
(A-) antagonism, blocking
(B) binding, ligand (esp. receptors)
(E+) increases expression/production
(E-) decreases expression/production
(E) affects expression/production (neutral)
(N) inhibits

gene-chemical
(O) transport, channels
(K) metabolism, pharmacokinetics
(Z) enzyme activity

chemical-disease
(T) treatment/therapy (including investigatory)
(C) inhibits cell growth (esp. cancers)
(Sa) side effect/adverse event
(Pr) prevents, suppresses
(Pa) alleviates, reduces
(J) role in disease pathogenesis

disease-chemical
(Mp) biomarkers (of disease progression)

gene-disease
(U) causal mutations
(Ud) mutations affecting disease course
(D) drug targets
(J) role in pathogenesis
(Te) possible therapeutic effect
(Y) polymorphisms alter risk
(G) promotes progression

disease-gene
(Md) biomarkers (diagnostic)
(X) overexpression in disease
(L) improper regulation linked to disease

gene-gene
(B) binding, ligand (esp. receptors)
(W) enhances response
(V+) activates, stimulates
(E+) increases expression/production
(E) affects expression/production (neutral)
(I) signaling pathway
(H) same protein or complex
(Rg) regulation
(Q) production by cell population

------------------------------------------------------------------------------------
FORMATTING NOTE

A few users have mentioned that the dependency paths in the "part-i" files are all lowercase text, whereas those in the "part-ii" files maintain the case of the original sentence. This complicates mapping between the two sets of files.

We kept the part-ii files in the same case as the original sentence to facilitate downstream debugging - it's easier to tell which words in a particular sentence are contributing to the dependency path if their original case is maintained. When working with the part-ii "with-themes" files, if you simply convert the dependency path to lowercase, it is guaranteed to match to one of the paths in the corresponding part-i file and you'll be able to get the theme scores.
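
The lowercase lookup described in this note can be sketched as follows. The records here are in-memory stand-ins rather than the real files, and the path strings, theme scores, and function names are hypothetical values made up for the example.

```python
def build_theme_index(part_i_records):
    """Map each part-i dependency path (already lowercase) to its theme scores."""
    return {path: scores for path, scores in part_i_records}


def lookup_themes(part_ii_path, theme_index):
    """Lowercase a part-ii dependency path and fetch its part-i theme scores.

    Returns None when the path has no entry in the index.
    """
    return theme_index.get(part_ii_path.lower())


# Hypothetical part-i content: (lowercase path, {theme: score}).
part_i = [
    ("inhibits|nsubj|start_entity inhibits|dobj|end_entity", {"N": 12.0, "B": 0.5}),
]
index = build_theme_index(part_i)

# A part-ii path keeps the case of the original sentence.
themes = lookup_themes("Inhibits|nsubj|START_ENTITY Inhibits|dobj|END_ENTITY", index)
missing = lookup_themes("Unseen|path", index)
```

A dictionary keyed on the lowercase path keeps each lookup constant-time, which matters when many part-ii records are mapped against one part-i file.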

Apologies for the additional complexity, and please reach out to us if you have any questions (see correspondence information in the Bioinformatics manuscript, above).
Data from: Datasets and scripts used in 'Host range of strand-biased...
figshare.com
zip
Updated Apr 23, 2023
Cite
Hirokazu Yano (2023). Datasets and scripts used in 'Host range of strand-biased circularizing integrative elements: a new class of mobile DNA elements nesting in Gammaproteobacteria' [Dataset]. http://doi.org/10.6084/m9.figshare.19350761.v4
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.19350761.v4
Dataset updated
Apr 23, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Hirokazu Yano
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This directory contains data sets and scripts used in "host range of SE: a new class of mobile DNA elements nesting in Gammaproteobacteria" authored by Desmila Idola, Hiroshi Mori, Yuji Nagata, Lisa Nonaka, and Hirokazu Yano.

The directories: "Data", "Results", "Scripts", and "synteny_block_search" are used for psi-blast search and visualization of the results.

Synteny block search requires protein database and gff files available from Zenodo repository:

Gammaproteobacteria protein dataset: doi 10.5281/zenodo.5880327
Betaproteobacteria protein dataset: doi 10.5281/zenodo.5885688
Alphaproteobacteria protein dataset: doi 10.5281/zenodo.7839301

The "attS_sequencing" directory contains a README.txt listing all commands used for attS amplicon sequencing.

The "resequencing" directory contains a README.txt and Illumina read assemblies of single-gene knockout mutants as well as the parent strain BHY606.
Draft policy on Open Access for data and information
figshare.com
pdf
Updated May 31, 2023
Cite
Willi Egloff; Donat Agosti; Anton Güntsch; Peter Hoverkamp; Eva Kralt; James Macklin; Daniel Mietchen; Alan Paton; David Patterson; Soraya Sierra (2023). Draft policy on Open Access for data and information [Dataset]. http://doi.org/10.6084/m9.figshare.785751.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.785751.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Willi Egloff; Donat Agosti; Anton Güntsch; Peter Hoverkamp; Eva Kralt; James Macklin; Daniel Mietchen; Alan Paton; David Patterson; Soraya Sierra
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
At present, national provisions on copyright and database protection regarding exceptions and limitations for research purposes differ both in detail and substance. Scientists within the EU working with copyright protected works or with protected databases have to be aware that regulations may vary considerably from country to country. This can be a major stumbling block to international collaboration in science. The document addresses legal issues that hamper an integrative system for managing biodiversity knowledge in Europe. It describes the importance for scientists to have access to documents and data in order to synthesize disparate information and to facilitate data mining (or similar research techniques). It explores some aspects of copyright and database protection that influence access to and re-use of biodiversity data and information and refers to exceptions and limitations of copyright or database protection provided for within the relevant EU Directives.
Data from: Transcriptomic study of the mechanism of anoikis resistance in...
figshare.com
pdf
Updated Feb 7, 2019
Cite
Chen Guo; Jun Jia; Ling-Feng Xu; Hui-Min Li; Wei Wang; Ji-Hua Guo; Rong Jia; Meng-Qi Jia (2019). Transcriptomic study of the mechanism of anoikis resistance in head and neck squamous carcinoma [Dataset]. http://doi.org/10.6084/m9.figshare.7390229.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.7390229.v1
Dataset updated
Feb 7, 2019
Dataset provided by
Figshare, http://figshare.com/
Authors
Chen Guo; Jun Jia; Ling-Feng Xu; Hui-Min Li; Wei Wang; Ji-Hua Guo; Rong Jia; Meng-Qi Jia
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Normal epithelial cells rapidly undergo apoptosis as soon as they lose contact with the extracellular matrix (ECM), a process termed anoikis. However, cancer cells tend to develop a resistance mechanism to anoikis, an acquired ability termed anoikis resistance. Cancer cells with anoikis resistance can spread to distant tissues or organs via the peripheral circulatory system and cause cancer metastasis. Thus, inhibition of anoikis resistance blocks the metastatic ability of cancer cells. Anoikis-resistant CAL27 (CAL27AR) cells were induced from CAL27 cells using the suspension culture approach. Transcriptome analysis was performed using RNA-Seq to study the differentially expressed genes (DEGs) between the CAL27AR cells and the parental CAL27 cells. Gene function annotation and Gene Ontology (GO) enrichment analysis were performed using the DAVID database. Signaling pathways involving the DEGs were analyzed using Gene Set Enrichment Analysis (GSEA) software. Analysis results were confirmed by reverse transcription PCR (RT-PCR), Western blotting, and gene correlation analysis based on the TCGA database. The figure provided here shows full-length, uncropped blots from our study.
Functional Characterization of the Osteoarthritis Susceptibility Mapping to...
plos.figshare.com
figshare.com
pdf
Updated May 30, 2023
Cite
Louise N. Reynard; Madhushika Ratnayake; Mauro Santibanez-Koref; John Loughlin (2023). Functional Characterization of the Osteoarthritis Susceptibility Mapping to CHST11—A Bioinformatics and Molecular Study [Dataset]. http://doi.org/10.1371/journal.pone.0159024
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pone.0159024
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Louise N. Reynard; Madhushika Ratnayake; Mauro Santibanez-Koref; John Loughlin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The single nucleotide polymorphism (SNP) rs835487 is associated with hip osteoarthritis (OA) at the genome-wide significance level and is located within CHST11, which codes for carbohydrate sulfotransferase 11. This enzyme post-translationally modifies proteoglycan prior to its deposition in the cartilage extracellular matrix. Using bioinformatics and experimental analyses, our aims were to characterise the rs835487 association signal and to identify the causal functional variant/s. Database searches revealed that rs835487 resides within a linkage disequilibrium (LD) block of only 2.7 kb and is in LD (r² ≥ 0.8) with six other SNPs. These are all located within intron 2 of CHST11, in a region that has predicted enhancer activity and which shows a high degree of conservation in primates. Luciferase reporter assays revealed that of the seven SNPs, rs835487 and rs835488, which have a pairwise r² of 0.962, are the top functional candidates; the haplotype composed of the OA-risk conferring G allele of rs835487 and the corresponding T allele of rs835488 (the G-T haplotype) demonstrated significantly different enhancer activity relative to the haplotype composed of the non-risk A allele of rs835487 and the corresponding C allele of rs835488 (the A-C haplotype) (p < 0.001). Electrophoretic mobility shift assays and supershifts identified several transcription factors that bind more strongly to the risk-conferring G and T alleles of the two SNPs, including SP1, SP3, YY1 and SUB1. CHST11 was found to be upregulated in OA versus non-OA cartilage (p < 0.001) and was expressed dynamically during chondrogenesis. Its expression in adult cartilage did not, however, correlate with rs835487 genotype. Our data demonstrate that the OA susceptibility is mediated by differential protein binding to the alleles of rs835487 and rs835488, which are located within an enhancer whose target may be CHST11 during chondrogenesis or an alternative gene.
qRT-PCR primer sequence.
plos.figshare.com
xls
Updated Mar 26, 2025
Cite
Shicong Huang; Yi Nan; Guoqing Chen; Na Ning; Yuhua Du; Shuai Duan; Weiqiang Li; Ling Yuan (2025). qRT-PCR primer sequence. [Dataset]. http://doi.org/10.1371/journal.pone.0318838.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0318838.t001
Dataset updated
Mar 26, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Shicong Huang; Yi Nan; Guoqing Chen; Na Ning; Yuhua Du; Shuai Duan; Weiqiang Li; Ling Yuan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Gastric cancer is the world’s leading tumor disease in terms of morbidity and mortality and is currently treated clinically with a comprehensive approach based on surgery. Studies have demonstrated the antitumor effects of neferine, but its anti-cancer mechanism in gastric cancer is not yet clear. Methods: The PubChem and SwissTargetPrediction databases were searched to retrieve the targets of neferine. Meanwhile, relevant gene expression data were downloaded from the Gene Expression Omnibus (GEO) database to screen for differential genes and build a drug-disease network. The selected genes were subjected to bioinformatics analysis. Finally, the gastric cancer treatment potential of neferine was assessed through molecular docking. The molecular mechanism of neferine in the treatment of gastric cancer was verified by CCK8 assay, monoclonal assay, apoptosis and cell cycle assays, qRT-PCR and Western blot. Results: The network pharmacological analyses indicate that the core genes are closely related to apoptosis, cell cycle, and cell proliferation. Molecular docking confirmed that neferine was closely related to key proteins. The in vitro experiments indicated that neferine could significantly inhibit the viability of gastric cancer cells, induce their apoptosis, and block their cell cycle in the G0/G1 phase. Conclusion: In summary, neferine inhibited the proliferation of gastric cancer cells through the CDK4/CDK6/CyclinD1 complex. This study provides a theoretical basis for the treatment of gastric cancer with neferine and an idea for the development of neferine for gastric cancer.