8 datasets found
  1. Content of the Bioinformatics for Dentistry, with its respective primary...

    • plos.figshare.com
    xls
    Updated Jun 6, 2024
    Cite
    Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin (2024). Content of the Bioinformatics for Dentistry, with its respective primary sources. [Dataset]. http://doi.org/10.1371/journal.pone.0303628.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Content of the Bioinformatics for Dentistry, with its respective primary sources.

  2. Data_Sheet_4_rboAnalyzer: A Software to Improve Characterization of...

    • frontiersin.figshare.com
    zip
    Updated Jun 5, 2023
    + more versions
    Cite
    Marek Schwarz; Jiří Vohradský; Martin Modrák; Josef Pánek (2023). Data_Sheet_4_rboAnalyzer: A Software to Improve Characterization of Non-coding RNAs From Sequence Database Search Output.ZIP [Dataset]. http://doi.org/10.3389/fgene.2020.00675.s005
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Marek Schwarz; Jiří Vohradský; Martin Modrák; Josef Pánek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Searching for similar sequences in a database via BLAST or a similar tool is one of the most common bioinformatics tasks applied in general, and to non-coding RNAs in particular. However, the results of the search might be difficult to interpret due to the presence of partial matches to the database subject sequences. Here, we present rboAnalyzer – a tool that helps with interpreting sequence search results by (1) extending partial matches into plausible full-length subject sequences, (2) predicting homology of RNAs represented by full-length subject sequences to the query RNA, (3) pooling information across homologous RNAs found in the search results and public databases such as Rfam to predict more reliable secondary structures for all matches, and (4) contextualizing the matches by providing the prediction results and other relevant information in a rich graphical output. Using predicted full-length matches improves secondary structure prediction and makes rboAnalyzer robust with regard to identification of homology. The output of the tool should help the user to reliably characterize non-coding RNAs in BLAST output. The usefulness of rboAnalyzer and its ability to correctly extend partial matches to full length is demonstrated on known homologous RNAs. To allow the user to use custom databases and search options, rboAnalyzer accepts any search results as a text file in the BLAST format. The main output is an interactive HTML page displaying the computed characteristics and other context of the matches. The output can also be exported in appropriate sequence and/or secondary structure formats.

  3. Vienna RNA

    • neuinfo.org
    • scicrunch.org
    Updated Oct 11, 2024
    Cite
    (2024). Vienna RNA [Dataset]. http://identifiers.org/RRID:SCR_008550
    Explore at:
    Dataset updated
    Oct 11, 2024
    Description

    This server provides programs, web services, and databases related to our work on RNA secondary structures. For general information and other offerings from our group, see the main TBI web server. On 1 May 2009 we updated our servers to the Vienna RNA package version 1.8.2.

    The Vienna RNA Servers:
    • RNAfold server predicts minimum free energy structures and base pair probabilities from single RNA or DNA sequences.
    • RNAalifold server predicts consensus secondary structures from an alignment of several related RNA or DNA sequences. You need to upload an alignment.
    • RNAinverse server allows you to design RNA sequences for any desired target secondary structure.
    • RNAcofold server allows you to predict the secondary structure of a dimer.
    • RNAup server allows you to predict the accessibility of a target region.
    • LocARNA server generates structural alignments from a set of sequences. In collaboration with the Bioinformatics Group Freiburg.
    • barriers server allows you to get insights into RNA folding kinetics.
    • RNAz server will assist you in detecting thermodynamically stable and evolutionarily conserved RNA secondary structures in multiple sequence alignments.
    • Structure conservation analysis server will assist you in detecting evolutionarily conserved RNA secondary structures in multiple sequence alignments.
    • RNAstrand server allows you to predict the reading direction of evolutionarily conserved RNA secondary structures.
    • RNAxs server assists you in siRNA design.
    • Bcheck predicts rnpB genes.

    Downloads. Get the source code for:
    • the Vienna RNA Package, our basic RNA secondary structure analysis software.
    • the ALIDOT package for finding conserved structure motifs (add-on).
    • the barriers program for analysis of RNA folding landscapes.

    Databases:
    • Atlas of conserved Viral RNA Structures found by ALIDOT

  4. Data from: PROSITE

    • prosite.expasy.org
    • identifiers.org
    • +7 more
    Updated Oct 15, 2025
    Cite
    (2025). PROSITE [Dataset]. https://prosite.expasy.org/
    Explore at:
    Dataset updated
    Oct 15, 2025
    Description

    PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.

  5. The Encyclopedia of Domains (TED) structural domains assignments for...

    • zenodo.org
    application/gzip, bz2 +1
    Updated Oct 31, 2024
    + more versions
    Cite
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones (2024). The Encyclopedia of Domains (TED) structural domains assignments for AlphaFold Database v4 [Dataset]. http://doi.org/10.5281/zenodo.13369203
    Explore at:
    Available download formats: application/gzip, bz2, zip
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset description:

    The Encyclopedia of Domains (TED) is a joint effort by CATH (Orengo group) and the Jones group at University College London to identify and classify protein domains in AlphaFold2 models from AlphaFold Database version 4, covering over 188 million unique sequences and 324 million domain assignments.

    In this data release, we will be making available to the community a table of domain boundaries and additional metadata on quality (pLDDT, globularity, number of secondary structures), taxonomy and putative CATH SuperFamily or Fold assignments for all 324 million domains in TED100.

    For all chains in the TED-redundant dataset, the attached file contains boundaries predictions, consensus level and information on the TED100 representative.

    Additionally, an archive with chain-level consensus domain assignments is available for 21 model organisms and 25 global health proteomes.

    For both TED100 and TEDredundant we provide the domain boundaries predictions output by each of the three methods employed in the project (Chainsaw, Merizo, UniDoc).

    We are making available PDB files for 7,427 novel folds identified during the TED classification process, with an annotation table sorted by novelty.

    Please use the gunzip command to extract files with a '.gz' extension.
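
As an alternative to unpacking with gunzip first, a '.gz' table can also be read as a stream. The following is a minimal sketch using only the Python standard library; the helper name iter_tsv_gz is hypothetical and not part of the TED release, and the sketch assumes the file is a plain tab-separated text table.

```python
import csv
import gzip
from typing import Iterator, List


def iter_tsv_gz(path: str) -> Iterator[List[str]]:
    """Yield one row (a list of column strings) at a time from a gzip-compressed TSV.

    The file is decompressed on the fly, so nothing is unpacked on disk and
    only the current row is held in memory.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data, which is the
        # safer assumption for tab-separated tables of identifiers.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader
```

For example, calling iter_tsv_gz on ted_365m_domain_boundaries_consensus_level.tsv.gz would let the rows be filtered one by one; streaming is a reasonable choice for tables of this size, but it is only one option.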

    CATH annotations have been assigned using the FoldSeek algorithm applied in various modes and the FoldClass algorithm, both of which are used to report significant structural similarity to a known CATH domain.
    Note: The TED protocol differs from that of our standard CATH Assignment protocol for superfamily assignment, which also involves HMM-based protocols and manual curation for remote matches.


    This dataset contains:

    • ted_214m_per_chain_segmentation.tsv
      The file contains all 214M protein chains in TED with consensus domain boundaries and proteome information in the following columns.
      1. AFDB_model_ID: chain identifier from AFDB in the format AF-
    • ted_365m_domain_boundaries_consensus_level.tsv.gz
      The file contains all domain assignments in TED100 and TED-redundant (365M) in the format:
      1. TED_ID: TED domain identifier in the format AF-
    • ted_100_324m.domain_summary.cath.globularity.taxid.tsv and novel_folds_set.domain_summary.tsv are header-less with the following columns separated by tabs (.tsv).
    • ted_324m_seq_clustering.cathlabels.tsv
      The file contains the results of the domain sequences clustering with MMseqs2.
      Columns:
      1. Cluster_representative
      2. Cluster_member
      3. CATH code assignment if available, e.g. 3.40.50.300 for a domain with a homologous match or 3.20.20 for a domain matching at the fold level in the CATH classification
      4. CATH assignment type - either Foldseek-T, Foldseek-H or Foldclass
    • novel_folds_set.domain_summary.tsv is sorted by novelty.
      1. ted_id - TED domain identifier in the format AF-
    • Domain assignments for TED redundant using single-chain and multi-chain consensus in ted_redundant_39m.multichain.consensus_domain_summary.taxid.tsv and ted_redundant_39m.singlechain.consensus_domain_summary.taxid.tsv
      The files contain a header with the following fields. Each column is tab-separated (.tsv).
      1. TED_redundant_id - TED chain identifier in the format AF-
    • novel_folds_set_models.tar.gz contains PDB files of all novel folds identified in TED100.
    • All per-tool domain boundaries predictions are in the same format with the following columns.
      1. TED_chainID - TED chain identifier in the format AF-
    • Domain boundaries predictions share the same format, with each segment separated by '_' and segment boundaries (start,stop) separated by '-'

      e.g., domain prediction by Merizo for AF-A0A000-F1-model_v4
      AF-A0A000-F1-model_v4 e8872c7a0261b9e88e6ff47eb34e4162 394 2 10-52_289-394,53-288 0.90077

      Merizo predicts one discontinuous domain and one continuous domain:
      Domain 1 (discontinuous): 10-52_289-394
        segment 1: 10-52
        segment 2: 289-394
      Domain 2 (continuous): 53-288
        segment 1: 53-288
    • ted-tools-main.zip - copy of the https://github.com/psipred/ted-tools repository, containing tools and software used to generate TED.
    • cath-alphaflow-main.zip - copy of CATH-AlphaFlow, used to generate globularity scores for TED domains.
    • ted-web-master.zip - copy of TED-web, containing code to generate the web interface of TED (https://ted.cathdb.info)
    • gofocus_data.tar.bz2 - GOFocus model weights
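
The domain boundaries format described in the list above (segments joined by '_', start and stop joined by '-') can be decoded with a few lines of code. This is a minimal sketch, not part of ted-tools: the function name parse_chopping is hypothetical, and the use of ',' to separate domains is inferred from the Merizo example rather than stated in the description.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, stop) residue positions, both inclusive


def parse_chopping(chopping: str) -> List[List[Segment]]:
    """Split a boundaries string into domains, each a list of (start, stop) segments.

    Assumption: domains are separated by ',' (inferred from the example),
    segments by '_', and start/stop by '-'.
    """
    domains: List[List[Segment]] = []
    for domain in chopping.split(","):
        segments: List[Segment] = []
        for segment in domain.split("_"):
            start, stop = segment.split("-")
            segments.append((int(start), int(stop)))
        domains.append(segments)
    return domains


# The Merizo prediction for AF-A0A000-F1-model_v4 quoted above.
domains = parse_chopping("10-52_289-394,53-288")

# A domain with more than one segment is discontinuous.
is_discontinuous = [len(segments) > 1 for segments in domains]

# Residues covered by each domain, counting both boundaries.
domain_lengths = [sum(stop - start + 1 for start, stop in segments) for segments in domains]
```

Here the first domain comes back as two segments and the second as one, matching the discontinuous and continuous domains in the worked example.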
  6. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    • +3 more
    Updated Sep 30, 2024
    Cite
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Explore at:
    Dataset updated
    Sep 30, 2024
    Description

    CATH Domain Classification List (latest release) - protein structural domains classified into CATH hierarchy.

  7. Metabolite BridgeDb ID Mapping Database (20180705)

    • figshare.com
    zip
    Updated Jun 5, 2023
    Cite
    Denise Slenter (2023). Metabolite BridgeDb ID Mapping Database (20180705) [Dataset]. http://doi.org/10.6084/m9.figshare.6741491.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Denise Slenter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    BridgeDb ID mapping database for metabolites, using HMDB 4.0 (Release of 18 June 2018), ChEBI 165, and Wikidata (07 July 2018) as data sources. Two major changes:
    • 120% more mappings to LIPID MAPS IDs (from Wikidata).
    • Change in mapping between old (secondary) and new (primary) HMDB IDs.
    This work was funded by ELIXIR, the research infrastructure for life-science data.

  8. Table_2_FunOrder 2.0 – a method for the fully automated curation of...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 4, 2023
    + more versions
    Cite
    Gabriel A. Vignolle; Robert L. Mach; Astrid R. Mach-Aigner; Christian Zimmermann (2023). Table_2_FunOrder 2.0 – a method for the fully automated curation of co-evolved genes in fungal biosynthetic gene clusters.xlsx [Dataset]. http://doi.org/10.3389/ffunb.2022.1020623.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Gabriel A. Vignolle; Robert L. Mach; Astrid R. Mach-Aigner; Christian Zimmermann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coevolution is an important biological process that shapes interacting proteins, whether physically interacting proteins or consecutive enzymes in a metabolic pathway, such as the biosynthetic pathways for secondary metabolites. Previously, we developed FunOrder, a semi-automated method for the detection of co-evolved genes, and demonstrated that FunOrder can be used to identify essential genes in biosynthetic gene clusters from different ascomycetes. A major drawback of this original method was the need for a manual assessment, which may introduce user bias and prevents high-throughput application. Here we present a fully automated version of this method termed FunOrder 2.0. In the improved version, we use several mathematical indices to determine the optimal number of clusters in the FunOrder output, and a subsequent k-means clustering based on the first three principal components of a principal component analysis of the FunOrder output to automatically detect co-evolved genes. Further, we replaced the BLAST tool with the DIAMOND tool as a prerequisite for using larger proteome databases. Potentially, FunOrder 2.0 may be used for the assessment of complete genomes, which has not been attempted yet. However, the introduced changes slightly decreased the sensitivity of this method, which is outweighed by enhanced overall speed and specificity.
