Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.
The European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) is international, innovative and interdisciplinary, and a champion of open data in the life sciences. The EMBL-EBI captures and presents globally comprehensive sequence data as part of the International Nucleotide Sequence Database Collaboration. Data provided to GBIF include geotagged environmental sequences with user-provided taxonomic identifications.
This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: environmental_sample=True & host="". EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) along with the scientific name were used to identify sequence records corresponding to the same individuals (or group of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.
5. The records associated with the same vouchers were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession was missing. To identify those potential duplicates, we grouped all the remaining records by scientific_name, collection_date, location, country, identified_by, collected_by and sample_accession (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: Deduplication v2 gbif/embl-adapter#10 (comment)
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family, and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
More information is available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
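The filtering in step 4 and the group-size exclusion in step 6 can be illustrated with a minimal Python sketch. This is not the embl-adapter implementation: the record layout (one dict of strings per record) and the field name specimen_voucher are assumptions made for illustration, while the grouping fields and the threshold of 50 are taken from the description above.

```python
from collections import defaultdict

# Grouping fields named in step 6. Representing each record as a dict is an assumption.
GROUP_FIELDS = (
    "scientific_name",
    "collection_date",
    "location",
    "country",
    "identified_by",
    "collected_by",
    "sample_accession",
)
MAX_GROUP_SIZE = 50  # threshold stated in step 6


def has_minimum_info(record):
    """Step 4: keep a record with a voucher, or with both a location and a date.

    The key "specimen_voucher" is a hypothetical field name.
    """
    if record.get("specimen_voucher"):
        return True
    return bool(record.get("location")) and bool(record.get("collection_date"))


def drop_oversized_groups(records, max_size=MAX_GROUP_SIZE):
    """Step 6: group records on GROUP_FIELDS and drop groups larger than max_size."""
    groups = defaultdict(list)
    for record in records:
        # Missing values (e.g. no sample accession) are treated as empty strings.
        key = tuple(record.get(field) or "" for field in GROUP_FIELDS)
        groups[key].append(record)
    return [
        record
        for members in groups.values()
        if len(members) <= max_size
        for record in members
    ]
```

A group of exactly 50 records is kept, since the description excludes only groups with more than 50 records.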
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CDD is a protein annotation resource that consists of a collection of annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific score matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domain models, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:
Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031
Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269).
The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1, https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) and the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1).
Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.
The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).
pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451, https://www.ebi.ac.uk/ena/browser/view/ERX11559451). The reads were aligned to the genome with HISAT2 (v2.1.0) and then assembled with StringTie (v2.2.1).
pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).
pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.
pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).
pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436, https://www.ebi.ac.uk/ena/browser/view/ERX11559436) aligned to the primary genome assembly.
pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).
Available variables
For a description of the column headers of the files, please see the following links to the documentation of the different file formats.
The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf
The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
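As an illustration of the GFF3 column layout referenced above, the following minimal Python sketch reads a gzip-compressed GFF3 file and counts features by type. It relies only on the nine tab-separated columns defined in the GFF3 specification; the function names are hypothetical, and the sketch does not decode the attributes column or handle URL-escaped characters.

```python
import gzip
from collections import Counter

# The nine tab-separated columns defined by the GFF3 specification.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Return a dict for one feature line, or None for comments, directives and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        return None  # skip malformed lines rather than guessing
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])  # 1-based, inclusive
    feature["end"] = int(feature["end"])
    return feature


def count_feature_types(path):
    """Count features per value of the 'type' column in a .gff.gz file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an optional embedded sequence section follows; stop here
            feature = parse_gff3_line(line)
            if feature is not None:
                counts[feature["type"]] += 1
    return counts
```

For example, count_feature_types("pmne_ncRNAs.gff.gz") would return a Counter keyed by whatever feature types that file contains.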
Contact
For questions about this dataset, please contact: jacob.hoglund@ebc.uu.se niclas.backstrom@ebc.uu.se
Tissue-engineered small diameter vascular grafts (d < 6 mm) have recently gained significant attention from cardiovascular surgeons as substitutes in vascular reconstruction. Cardiovascular disease (CVD) is among the leading causes of death worldwide. It is estimated that more than 17 million people have been diagnosed with CVD. Autologous vessels such as the saphenous vein and the radial and mammary arteries are currently used as vascular substitutes. Although these vessels are used in an autologous manner, which avoids thrombus formation and immune rejection, suitable grafts can be found in less than 40% of patients. In recent years, synthetic polymer grafts made of either Dacron or expanded polytetrafluoroethylene (ePTFE) have been developed and applied in vascular bypass surgery. These grafts have also received FDA approval for use in CVD. As large diameter vascular grafts (d > 6 mm), the synthetic conduits have shown promising results. However, significant adverse reactions have been reported when they are used as small diameter vascular grafts (SDVGs, d < 6 mm). The most important is the low patency rate (< 70%) observed within the first year of implantation. The lack of an endothelial layer makes these grafts susceptible to platelet aggregation and thrombus formation. In addition, compliance mismatch can result in intimal hyperplasia, activation of the Th1/Th2 response, calcification and overall graft failure. In view of these data, alternative sources of SDVGs must be established. Human umbilical arteries (hUAs) may be significant candidates for SDVG development. hUAs are muscular arteries responsible for transporting non-oxygenated blood from the fetus to the mother. Their inner diameter is approximately 2-4 mm, while their length varies and depends on the length of the human umbilical cord (hUC). The length of a typical hUC is 30-60 cm.
hUAs can be isolated non-invasively from the hUC, a material which is discarded after the gestation. These muscular arteries are characterized by three distinct layers, the tunica intima, media and adventitia, where specific cellular populations such as endothelial cells (ECs) and vascular smooth muscle cells (VSMCs) are located. Decellularization, a tissue engineering method, can be applied to hUAs in order to develop acellular, non-immunogenic SDVGs. The decellularization approach uses a combination of physical, chemical and enzymatic methods to remove the cellular populations while preserving the extracellular matrix (ECM) and the tissue's ultrastructure. These vascular conduits can then be repopulated with the recipient's ECs and VSMCs in order to be fully compatible and functional. The aim of the current project is the proteomic identification and characterization of hUAs before and after the application of the decellularization approach. The preservation of ECM proteins such as collagen, fibronectin and laminin in the decellularized hUAs is of major importance for their key mechanical and functional properties. To date, only a few studies have performed a broad proteomic analysis of decellularized SDVGs such as hUAs. The proteomic data will also be compared with the results of the histological, biochemical and biomechanical analyses. For this project, the hUAs were isolated from human umbilical cords delivered to the Hellenic Cord Blood Bank (HCBB) of the Biomedical Research Foundation Academy of Athens (BRFAA). Human umbilical cords were obtained from end-term gestations (38-40 weeks), with either normal or caesarean delivery. Each hUC was accompanied by informed consent signed by the mother before the gestation. The informed consent was in accordance with the Helsinki declaration and fulfilled the ethical standards of the Greek National Ethical Committee.
After the isolation of hUAs, the decellularization approach was performed. Specifically, the hUAs were incubated in decellularization buffer 1 (DB1), consisting of 8 mM CHAPS, 1 M NaCl and 25 mM EDTA in PBS 1x, for 12 hours (h) at room temperature (RT). The hUAs were then briefly washed with PBS 1x to remove excess DB1. Next, the hUAs were placed in DB2, consisting of 1.8 mM SDS, 1 M NaCl and 25 mM EDTA in PBS 1x, for another 12 h at RT, followed by brief washes with PBS 1x. Finally, the hUAs were incubated at 37 °C for 12 h in α-Minimum Essential Medium (α-MEM) with 40% v/v Fetal Bovine Serum (FBS) to ensure the complete removal of genetic material remnants. To evaluate the impact of the current decellularization protocol on hUAs, histological analysis, biochemical and biomechanical evaluation, and full-spectrum proteomic analysis were performed.
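As a worked example of the buffer concentrations above, the mass of each solute needed for a given buffer volume follows from mass (g) = concentration (mol/L) x volume (L) x molar mass (g/mol). The Python sketch below applies this to DB1 and DB2. The molar masses are standard reference values and are not taken from the protocol; in particular, the EDTA value assumes the free acid, whereas the protocol does not state which EDTA salt was used.

```python
# Molar masses in g/mol. These are standard reference values, NOT from the protocol text.
# EDTA is assumed to be the free acid; a different salt form would change its value.
MOLAR_MASS = {
    "CHAPS": 614.88,
    "SDS": 288.38,
    "NaCl": 58.44,
    "EDTA": 292.24,
}

# Solute concentrations in mol/L, as stated in the protocol.
DB1 = {"CHAPS": 8e-3, "NaCl": 1.0, "EDTA": 25e-3}
DB2 = {"SDS": 1.8e-3, "NaCl": 1.0, "EDTA": 25e-3}


def grams_needed(recipe, volume_l):
    """Return the mass in grams of each solute for volume_l litres of buffer."""
    return {
        name: molarity * volume_l * MOLAR_MASS[name]
        for name, molarity in recipe.items()
    }


if __name__ == "__main__":
    for label, recipe in (("DB1", DB1), ("DB2", DB2)):
        for name, grams in grams_needed(recipe, 1.0).items():
            print(f"{label} {name}: {grams:.3f} g per litre")
```

Under these assumptions, one litre of DB1 requires roughly 4.92 g of CHAPS and one litre of DB2 roughly 0.52 g of SDS.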
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of the type ? from the database All - version N/A
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of the type entry from the database AntiFam - version 8.0
Gene Expression Omnibus. GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. The GEO DataSets database stores original submitter-supplied records (Series, Samples and Platforms) as well as curated DataSets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data item of the type ? from the database reviewed with accession A0A017SE85 and name Transcriptional regulator fogI
This dataset covers the global proteome and global succinylome of human brain tissues.
These data are part of a project assessing transcriptional start site switching and UTR switching at the translational level following hypoxia.
Oral cholera vaccines (OCVs) have become important components of strategies for cholera control, but the demand for OCVs has outstripped the supply [2–4]. There is a need for new cholera control measures for the one billion people living in cholera-endemic regions [5]. The use of orally delivered single-domain antibodies has been proposed as a potential approach for the control of gastrointestinal pathogens [6]. Here, we describe the development of an orally deliverable bivalent VHH construct (BL3.2) that binds to the B-subunit of cholera toxin (CTXB). The epitope of CTXB for binding molecule BL3.2 was mapped by Hydrogen Deuterium Exchange (HDX)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This signature identifies members of the Transmembrane protein 53 family, which have no known function but are predicted to be integral membrane proteins.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data item of the type repeat from the database prints with accession PR00014 and name FNTYPEIII
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two related families of asparaginase are designated type I and type II according to the terminology in Escherichia coli, which has both: L-asparaginase I is a low-affinity enzyme found in the cytoplasm, while L-asparaginase II is a high-affinity secreted enzyme synthesized with a cleavable signal sequence. This family includes L-asparaginases related to type I of E. coli. Archaeal members of this family contain an extra ~80 residues in a conserved N-terminal region. These archaeal homologues are known as GATD (glutamyl-tRNA(Gln) amidotransferase subunit D).