Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, that provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research and later at the J. Craig Venter Institute (Rockville, MD, US).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
The European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) is international, innovative and interdisciplinary, and a champion of open data in the life sciences. EMBL-EBI captures and presents globally comprehensive sequence data as part of the International Nucleotide Sequence Database Collaboration (INSDC). Data provided to GBIF include geotagged environmental sequences with user-provided taxonomic identifications.
This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: `environmental_sample=True & host=""`
EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) along with the scientific name was used to identify sequence records corresponding to the same individuals (or group of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.
5. The records associated with the same vouchers were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained in a comment on the GitHub issue "Deduplication v2" (gbif/embl-adapter#10).
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
More information is available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
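The ENA query described above can be illustrated with a small URL builder. This is a minimal sketch, assuming the Portal API's `search` endpoint; the `result=sequence` value, the `fields`, `format` and `limit` parameters, and the example field names are assumptions not stated in the description, and the exact query syntax used by the GBIF adapter may differ. The code only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base URL of the public ENA Portal API, as given in the dataset description.
ENA_PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api/"


def build_search_url(environmental_sample: bool, fields: list[str], limit: int = 0) -> str:
    """Build a hypothetical ENA Portal API search URL for sequence records.

    The query mirrors the quoted search parameters: filter on
    environmental_sample and require an empty host field.
    Assumptions: result=sequence, the tsv format and limit=0 meaning "all records".
    """
    query = f'environmental_sample={str(environmental_sample).lower()} AND host=""'
    params = {
        "result": "sequence",         # assumed result type for INSDC sequences
        "query": query,
        "fields": ",".join(fields),   # illustrative field list
        "format": "tsv",
        "limit": limit,
    }
    # urlencode percent-encodes '=', '"' and ',' and turns spaces into '+'.
    return ENA_PORTAL_API + "search?" + urlencode(params)
```

For example, `build_search_url(True, ["accession", "scientific_name"])` targets this environmental-sample dataset; passing `False` mirrors the query of the non-environmental dataset.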
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains INSDC sequence records not associated with environmental sample identifiers or host organisms. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with search parameters: `environmental_sample=False & host=""`
EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) along with the scientific name was used to identify sequence records corresponding to the same individuals (or group of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.
5. The records associated with the same vouchers were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
More information is available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
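Steps 1, 2, 4 and 6 of the processing described above can be sketched in Python. This is a minimal, hypothetical sketch, not the gbif/embl-adapter implementation: it assumes each record is a dict keyed by the field names quoted in step 6, plus hypothetical `is_contig` and `specimen_voucher` fields, and it assumes human sequences are recognised by the scientific name `Homo sapiens`. Step 5 (voucher aggregation) and step 7 (taxonomy enrichment) are omitted.

```python
from collections import defaultdict

# Grouping keys for step 6, taken from the description above.
DEDUP_KEYS = ("scientific_name", "collection_date", "location", "country",
              "identified_by", "collected_by", "sample_accession")
MAX_GROUP_SIZE = 50  # groups with more records than this are excluded (step 6)


def keep_one_per_sample(records):
    """Step 2: keep one non-CONTIG record per (scientific_name, sample_accession).

    Records without a sample accession and contig/WGS records (step 3) pass
    through unchanged. `is_contig` is a hypothetical field name.
    """
    seen = set()
    kept = []
    for rec in records:
        acc = rec.get("sample_accession")
        if rec.get("is_contig") or not acc:
            kept.append(rec)
            continue
        key = (rec.get("scientific_name"), acc)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def has_required_info(rec):
    """Step 4: keep records with a specimen voucher, or with both a location and a date."""
    return bool(rec.get("specimen_voucher")) or (
        bool(rec.get("location")) and bool(rec.get("collection_date")))


def drop_large_groups(records, max_size=MAX_GROUP_SIZE):
    """Step 6: group on DEDUP_KEYS and drop every group with more than max_size records.

    Surviving records are returned group by group, not in input order.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec.get(k) for k in DEDUP_KEYS)].append(rec)
    return [rec for group in groups.values() if len(group) <= max_size for rec in group]


def process(records):
    """Chain the sketched steps in the order described above."""
    records = [r for r in records if r.get("scientific_name") != "Homo sapiens"]  # step 1 (assumed test)
    records = keep_one_per_sample(records)                                        # steps 2-3
    records = [r for r in records if has_required_info(r)]                        # step 4
    return drop_large_groups(records)                                             # step 6
```

Grouping with a dict of lists keeps step 6 to a single pass over the records; an entire group is discarded, matching the description that groups with more than 50 records were excluded rather than collapsed.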
Basal expression profiles of 1,000 human cancer cell lines in the Genomics of Drug Sensitivity in Cancer (GDSC) panel [upcoming version], which were screened with a diverse collection of 265 compounds. We have carried out an extensive computational exploration of the data to determine (1) to what extent the mutational landscape of cancer cell lines recapitulates that seen in primary tumours; (2) what effect the status of these genomic features has on the variation in drug response; (3) whether genomic alterations acting in concert explain more of the variation in drug response; and (4) what the predictive ability of the individual omics data types is, and to what extent it improves when they are combined. [See publication]
Quantitative study of N-terminal acetylome variations in Arabidopsis thaliana, examining the effect of an N-acetyltransferase knockout (KO).
Partial remission (PR) occurs in only half of patients with new-onset type 1 diabetes (T1D) and corresponds to a transient period characterized by low daily insulin needs, low glycemic fluctuations and increased endogenous insulin secretion. While identifying new-onset T1D patients with significant residual β-cell function may foster patient-specific interventions, reliable predictive biomarkers of PR occurrence are currently lacking. We analyzed the plasma of children with new-onset T1D to identify biomarkers present at diagnosis that predicted PR at 3 months post-diagnosis. We first performed an extensive shotgun proteomic analysis using liquid chromatography-tandem mass spectrometry (LC-MS/MS) on the plasma of 16 children with new-onset T1D and quantified nearly 1,500 unique proteins, 98 of which correlated significantly with the insulin-dose-adjusted glycated hemoglobin A1c (IDAA1C) score. We next applied a series of qualitative and statistical filters that led to the selection of 26 protein candidates associated with pathophysiological mechanisms related to T1D. Finally, we translationally validated several of the candidates using single-shot targeted proteomics (parallel reaction monitoring, PRM) on raw plasma. Taken together, we identified plasma biomarkers present at diagnosis that may predict the occurrence of PR in a single mass spectrometry run. We believe that the identification of new predictive biomarkers of PR and β-cell function is key to stratifying patients with new-onset T1D for β-cell preservation therapies.
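The abstract does not define the IDAA1C score. Assuming the commonly used definition, which combines glycemic control with insulin requirement, the score and a worked example read as follows; PR is conventionally taken as IDAA1C ≤ 9, although the cut-off used in this study is not stated here.

```latex
\mathrm{IDAA1C} = \mathrm{HbA1c}\ (\%) + 4 \times \text{insulin dose}\ (\mathrm{U\,kg^{-1}\,per\,24\,h})

% Worked example: HbA1c = 7.0 %, insulin dose = 0.4 U/kg per 24 h
\mathrm{IDAA1C} = 7.0 + 4 \times 0.4 = 8.6 \le 9 \quad \Rightarrow \quad \text{PR under the assumed cut-off}
```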
This data is part of a project assessing transcriptional start site switching and UTR switching at the translational level following hypoxia.
Bacterial meningitis is usually fatal without treatment, and prompt, accurate diagnosis coupled with the timely administration of parenteral antibiotics is necessary to save lives. Diagnosis can sometimes be delayed whilst samples are analysed in a laboratory using traditional methods of microscopy and antigen testing. The objective of our project is to define specific protein signatures in cerebrospinal fluid associated with Streptococcus pneumoniae infection, which could lead to the development of assays or point-of-care devices that improve the speed and accuracy of diagnosis and guide clinicians in the treatment and prognosis of children with bacterial meningitis. The associated research paper is in preparation.
ADP-ribosylation is a widespread post-translational modification (PTM) with crucial functions in many cellular processes. Here, we describe an in-depth ADP-ribosylome obtained with our Af1521-based proteomics methodology for profiling ADP-ribosylation sites, by systematically assessing complementary proteolytic digestions and precursor fragmentation by electron-transfer higher-energy collisional dissociation (EThcD) and electron-transfer dissociation (ETD). While ETD spectra yielded higher identification scores, EThcD generally proved superior to ETD in the identification and localization of ADP-ribosylation sites, regardless of the protease employed. Notwithstanding, the propensities of complementary proteases and fragmentation methods expanded the detectable repertoire of ADP-ribosylation to an unprecedented depth. This system-wide profiling of the ADP-ribosylome in HeLa cells subjected to DNA damage uncovered >11,000 unique ADP-ribosylated peptides mapping to >7,000 ADP-ribosylation sites, in total modifying over one-third of the human nuclear proteome and highlighting the vast scope of this PTM. High-resolution MS/MS spectra enabled the identification of dozens of proteins concomitantly modified by ADP-ribosylation and phosphorylation, revealing a considerable degree of crosstalk on histones. ADP-ribosylation was confidently localized to various amino acid residue types, including less abundantly modified residues, with hundreds of ADP-ribosylation sites pinpointed on histidine, arginine and tyrosine residues. Functional enrichment analysis suggested that modification of these specific residue types is spatially directed, with tyrosine ADP-ribosylation linked to the ribosome, arginine ADP-ribosylation linked to the endoplasmic reticulum, and histidine ADP-ribosylation linked to the mitochondrion.
In this study, we compared the effects of two cytokine treatments on the proteome of human Th1 cells. We used saturating doses of murine single-chain IL-27 (EBI3+p28, 10 nM) and HyperIL-6 (20 nM) and continuously stimulated cells from three donors with each cytokine for 24 h, or left them untreated.
Proteomic analysis of sorted peroxisomes (old, young and middle-aged).
Not available
Hepatocellular carcinoma is the third leading cause of cancer death worldwide. In recent years, research on CREB in hepatocellular carcinoma has become a hotspot, so our research group aims to use mass spectrometry to identify the proteins that bind CREB and then to explore the links between CREB and hepatocellular carcinoma.
Analysis of intact O-linked glycopeptides of the SARS-CoV-2 spike (S) protein and human ACE2 by LC-MS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Putative tumor suppressor gene that may be implicated in the origin and progression of lung cancer