4 datasets found

Content of the Bioinformatics for Dentistry, with its respective primary...
plos.figshare.com
xls
Updated Jun 6, 2024
Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin (2024). Content of the Bioinformatics for Dentistry, with its respective primary sources. [Dataset]. http://doi.org/10.1371/journal.pone.0303628.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0303628.t002
Dataset updated
Jun 6, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Ava K. Chow; Rachel Low; Jerald Yuan; Karen K. Yee; Jaskaranjit Kaur Dhaliwal; Shanice Govia; Nazlee Sharmin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Content of the Bioinformatics for Dentistry, with its respective primary sources.
Data from: PROSITE
prosite.expasy.org
identifiers.org
+7 more
Updated Oct 15, 2025
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
The Encyclopedia of Domains (TED) structural domains assignments for...
zenodo.org
application/gzip, bz2 +1
Updated Oct 31, 2024
+ more versions
Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones (2024). The Encyclopedia of Domains (TED) structural domains assignments for AlphaFold Database v4 [Dataset]. http://doi.org/10.5281/zenodo.13369203
Available download formats: application/gzip, bz2, zip
Unique identifier
https://doi.org/10.5281/zenodo.13369203
Dataset updated
Oct 31, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andy Lau; Nicola Bordin; Shaun Kandathil; Ian Sillitoe; Vaishali Waman; Jude Wells; Christine Orengo; David T Jones
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset description:

The Encyclopedia of Domains (TED) is a joint effort by CATH (Orengo group) and the Jones group at University College London to identify and classify protein domains in AlphaFold2 models from AlphaFold Database version 4, covering over 188 million unique sequences and 324 million domain assignments.

In this data release, we make available to the community a table of domain boundaries and additional metadata on quality (pLDDT, globularity, number of secondary structures), taxonomy and putative CATH SuperFamily or Fold assignments for all 324 million domains in TED100.

For all chains in the TED-redundant dataset, the attached file contains boundary predictions, consensus level and information on the TED100 representative.

Additionally, an archive with chain-level consensus domain assignments is available for 21 model organisms and 25 global health proteomes.

For both TED100 and TED-redundant, we provide the domain boundary predictions output by each of the three methods employed in the project (Chainsaw, Merizo, UniDoc).

We are also making available PDB files for 7,427 novel folds identified during the TED classification process, together with an annotation table sorted by novelty.

Please use the gunzip command to extract files with a '.gz' extension.
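
As an alternative to extracting the archives first, the '.gz' tables can also be streamed row by row. The following is a minimal Python sketch, assuming the files are plain tab-separated text once decompressed; the helper name iter_tsv_rows and the demo rows are hypothetical and not part of the TED release.

```python
import csv
import gzip
import os
import tempfile
from typing import Iterator


def iter_tsv_rows(path: str) -> Iterator[list[str]]:
    """Yield one list of fields per row, decompressing on the fly for '.gz' paths."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text (assumption about the files).
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


# Demo on a tiny stand-in file with made-up rows (not real TED data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tsv.gz")
    with gzip.open(demo_path, "wt", newline="") as out:
        out.write("id_1\t10-52\nid_2\t53-288\n")
    rows = list(iter_tsv_rows(demo_path))

print(rows)
```

Streaming avoids holding a decompressed copy on disk, which may matter for tables with hundreds of millions of rows.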

CATH annotations have been assigned using the FoldSeek algorithm applied in various modes and the FoldClass algorithm, both of which are used to report significant structural similarity to a known CATH domain.
Note: The TED protocol differs from our standard CATH assignment protocol for superfamily assignment, which also involves HMM-based protocols and manual curation for remote matches.

This dataset contains:

ted_214m_per_chain_segmentation.tsv
The file contains all 214M protein chains in TED with consensus domain boundaries and proteome information in the following columns.
1. AFDB_model_ID: chain identifier from AFDB in the format AF-

ted_365m_domain_boundaries_consensus_level.tsv.gz
The file contains all domain assignments in TED100 and TED-redundant (365M) in the format:
1. TED_ID: TED domain identifier in the format AF-

ted_100_324m.domain_summary.cath.globularity.taxid.tsv and novel_folds_set.domain_summary.tsv are header-less with the following columns separated by tabs (.tsv).

ted_324m_seq_clustering.cathlabels.tsv
The file contains the results of the domain sequences clustering with MMseqs2.
Columns:
1. Cluster_representative
2. Cluster_member
3. CATH code assignment, if available, e.g. 3.40.50.300 for a domain with a homologous match or 3.20.20 for a domain matching at the fold level in the CATH classification
4. CATH assignment type - either Foldseek-T, Foldseek-H or Foldclass
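
The two kinds of CATH code in column 3 can be told apart by the number of dot-separated levels: four for a homologous superfamily match, three for a fold-level match. A minimal Python sketch follows; the function name cath_level is hypothetical, and treating any non-numeric value as "unassigned" is an assumption, since the placeholder used for a missing code is not stated here.

```python
def cath_level(code: str) -> str:
    """Classify a CATH code by depth: 3 levels = fold, 4 levels = superfamily."""
    parts = code.strip().split(".")
    # Assumption: anything that is not purely numeric levels means "no assignment".
    if not all(part.isdigit() for part in parts):
        return "unassigned"
    return {3: "fold", 4: "superfamily"}.get(len(parts), "other")


for example in ("3.40.50.300", "3.20.20"):
    print(example, cath_level(example))
```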

novel_folds_set.domain_summary.tsv is sorted by novelty.
1. ted_id - TED domain identifier in the format AF-

Domain assignments for TED redundant using single-chain and multi-chain consensus in ted_redundant_39m.multichain.consensus_domain_summary.taxid.tsv and ted_redundant_39m.singlechain.consensus_domain_summary.taxid.tsv
The files contain a header with the following fields. Each column is tab-separated (.tsv).
1. TED_redundant_id - TED chain identifier in the format AF-

novel_folds_set_models.tar.gz contains PDB files of all novel folds identified in TED100.

All per-tool domain boundary predictions are in the same format with the following columns.
1. TED_chainID - TED chain identifier in the format AF-

Domain boundary predictions share the same format: domains are separated by ',', the segments of a domain are separated by '_', and segment boundaries (start, stop) are separated by '-'.

For example, the domain prediction by Merizo for AF-A0A000-F1-model_v4:
AF-A0A000-F1-model_v4 e8872c7a0261b9e88e6ff47eb34e4162 394 2 10-52_289-394,53-288 0.90077

Merizo predicts one discontinuous domain and one continuous domain:
Domain 1 (discontinuous): 10-52_289-394
segment 1: 10-52
segment 2: 289-394
Domain 2 (continuous): 53-288
segment 1: 53-288
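
The boundary string in the Merizo example above can be split into domains and segments with a few lines of Python. This is a minimal sketch: the function name parse_chopping is hypothetical, the ',' separator between domains is inferred from the example rather than stated, and taking the fifth whitespace-separated field as the boundary string is an assumption based on that single example line.

```python
def parse_chopping(chopping: str) -> list[list[tuple[int, int]]]:
    """Return one list of (start, stop) segments per domain."""
    domains = []
    for domain in chopping.split(","):        # ',' between domains (inferred)
        segments = []
        for segment in domain.split("_"):     # '_' between segments of one domain
            start, stop = segment.split("-")  # '-' between start and stop
            segments.append((int(start), int(stop)))
        domains.append(segments)
    return domains


line = "AF-A0A000-F1-model_v4 e8872c7a0261b9e88e6ff47eb34e4162 394 2 10-52_289-394,53-288 0.90077"
fields = line.split()
domains = parse_chopping(fields[4])  # assumption: fifth field holds the boundaries

for number, segments in enumerate(domains, start=1):
    kind = "discontinuous" if len(segments) > 1 else "continuous"
    print(f"Domain {number} ({kind}): {segments}")
```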

ted-tools-main.zip - copy of the https://github.com/psipred/ted-tools repository, containing tools and software used to generate TED.

cath-alphaflow-main.zip - copy of CATH-AlphaFlow, used to generate globularity scores for TED domains.

ted-web-master.zip - copy of TED-web, containing code to generate the web interface of TED (https://ted.cathdb.info)

gofocus_data.tar.bz2 - GOFocus model weights
Metabolite BridgeDb ID Mapping Database (20180705)
figshare.com
zip
Updated Jun 5, 2023
Denise Slenter (2023). Metabolite BridgeDb ID Mapping Database (20180705) [Dataset]. http://doi.org/10.6084/m9.figshare.6741491.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.6741491.v1
Dataset updated
Jun 5, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Denise Slenter
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
BridgeDb ID mapping database for metabolites, using HMDB 4.0 (Release of 18 June 2018), ChEBI 165, and Wikidata (07 July 2018) as data sources. Two major changes:
- 120% more mappings to LIPID MAPS IDs (from Wikidata).
- Change in mapping between old (secondary) and new (primary) HMDB IDs.
This work was funded by ELIXIR, the research infrastructure for life-science data.