https://www.proteinatlas.org/about/licence
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13534 genes (67% of the human protein-coding genes), as well as predictions for an additional 3491 secreted or membrane proteins, covering a total of 17025 genes (84% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 41 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations. The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:
The subcellular distribution of proteins in human cell lines. The subcellular distribution of proteins in human sperm. The proteomes of different organelles and subcellular structures. Single-cell variability in the expression levels and/or localizations of proteins.
Subcellular localization of proteins from low-throughput or high-throughput protein localization assays
LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.
The Characterized Protein Database, CharProtDB, is designed and being developed as a resource of expertly curated, experimentally characterized proteins described in published literature. Each protein record in CharProtDB supports several data types: functional annotation (several instances of protein names and gene symbols), taxonomic classification, literature links, specific Gene Ontology (GO) terms and GO evidence codes, EC (Enzyme Commission) and TC (Transport Classification) numbers, and protein sequence. Additionally, each protein record is associated with cross links to all public accessions in major protein databases as "synonymous accessions". Each of the above data types can be linked to as many literature references as possible. Every CharProtDB entry requires a minimum set of data types: protein name, GO terms, and supporting reference(s) associated with GO evidence codes. Annotating with the GO system is important for several reasons: the GO system captures defined concepts (the GO terms) with unique ids, which can be attached to specific genes, and the three controlled vocabularies of the GO allow for the capture of much more annotation information than is traditionally captured in protein common names, including, for example, not just the function of the protein, but its location as well. GO evidence codes implemented in CharProtDB directly correlate with the GO consortium definitions of experimental codes. CharProtDB tools link characterization data from multiple input streams through synonymous accessions or direct sequence identity. CharProtDB can represent multiple characterizations of the same protein, with proper attribution and links to database sources. Users can search the database with a variety of search terms, including protein name, gene symbol, EC number, organism name, accessions or any text. Following the search, a display page lists all the proteins that match the search term.
Click on the protein name to view more detailed annotated information for each protein. Additionally, each protein record can be annotated.
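The minimum-content rule described above (protein name, GO terms, and supporting references tied to experimental GO evidence codes) can be illustrated with a small validation sketch. This is not CharProtDB's schema or code: the class names, field names and the `missing_required` helper are hypothetical, and the identifiers in the usage example are placeholders. The evidence codes listed are the GO Consortium's experimental codes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# GO Consortium experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass
class GoAnnotation:
    """One GO term with its evidence code and supporting references (hypothetical layout)."""
    go_id: str
    evidence_code: str
    references: List[str] = field(default_factory=list)


@dataclass
class ProteinRecord:
    """Hypothetical record holding the required and a few optional data types."""
    name: str
    go_annotations: List[GoAnnotation] = field(default_factory=list)
    gene_symbol: Optional[str] = None
    ec_number: Optional[str] = None


def missing_required(record: ProteinRecord) -> List[str]:
    """Return a list of the required items that the record lacks."""
    problems = []
    if not record.name.strip():
        problems.append("protein name")
    if not record.go_annotations:
        problems.append("GO term")
    for ann in record.go_annotations:
        if ann.evidence_code not in EXPERIMENTAL_CODES:
            problems.append(f"experimental evidence code for {ann.go_id}")
        if not ann.references:
            problems.append(f"supporting reference for {ann.go_id}")
    return problems


# Usage with placeholder identifiers (not real database entries).
complete = ProteinRecord(
    name="example protein",
    go_annotations=[GoAnnotation("GO:0000000", "IDA", ["REF:placeholder"])],
)
print(missing_required(complete))
```

An empty list means the record carries the minimum data types; optional fields such as gene symbol or EC number are deliberately not checked.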
https://www.proteinatlas.org/about/licence
This section provides comprehensive spatial profiling of the brain, including an overview of protein expression in the mammalian brain based on integration of data from human, pig and mouse. Transcriptomics data combined with affinity-based protein in situ localization down to single-cell detail is available in this brain-centric sub-atlas of the Human Protein Atlas. The data presented are for human genes and their one-to-one orthologues in pig and mouse. Gene summary pages provide the hierarchical expression landscape, from 13 main regions of the brain to individual nuclei and subfields, for every protein-coding gene. For selected proteins, high-content images are available to explore the cellular and subcellular protein distribution. In addition, the Brain section contains lists of genes with elevated expression in one or a group of regions to help the user identify unique protein expression profiles linked to physiology and function. More information about the specific content and the generation and analysis of the data in this section can be found in the Methods Summary. Learn about:
Expression levels for all human proteins in regions and subregions of the human brain. Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain. Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs. Regional enriched genes with higher expression in a single or few regions of the brain. Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain. Differences in gene expression between mammalian species.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PIRSF protein classification system is a network with multiple levels of sequence diversity from superfamilies to subfamilies that reflects the evolutionary relationship of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of proteins and their molecular weight (M. wt.), isoelectric point (pI), CDS and protein length, hydropathicity and subcellular localization.
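As an illustration of one column in such a table, hydropathicity is commonly reported as a GRAVY score: the mean Kyte-Doolittle hydropathy value over all residues. The sketch below assumes that definition; the source does not say which scale or tool produced the values in this dataset, and the `gravy` function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Toy peptide, not taken from the dataset.
print(round(gravy("MKVLAAGIV"), 3))
```

Positive scores indicate an overall hydrophobic sequence and negative scores an overall hydrophilic one.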
Database that represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.
A database that focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators. The curated data can be analyzed in the context of the high-throughput data and viewed graphically with the MINT Viewer. This collection of molecular interaction databases can be used to search for, analyze and graphically display molecular interaction networks and pathways from a wide variety of species. MINT comprises separate database components. HomoMINT is an inferred human protein interaction database. Domino is a database of domain-peptide interactions. VirusMINT explores the interactions of viral proteins with human proteins. The MINT connect viewer allows you to enter a list of proteins (e.g. proteins in a pathway) to retrieve, display and download a network with all the interactions connecting them.
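The idea behind the connect viewer, retrieving the interactions that link a user-supplied protein list, can be sketched as a simple filter over an interaction table. This is not MINT's API: the `connecting_network` function, the tuple layout and the protein labels are hypothetical, and the sketch assumes that "connecting" means direct interactions between two listed proteins only.

```python
from typing import Iterable, List, Tuple

Interaction = Tuple[str, str]


def connecting_network(interactions: Iterable[Interaction],
                       proteins: Iterable[str]) -> List[Interaction]:
    """Keep only interactions whose two partners are both in the query list."""
    wanted = set(proteins)
    seen = set()
    network = []
    for a, b in interactions:
        key = frozenset((a, b))  # treat A-B and B-A as the same interaction
        if a in wanted and b in wanted and key not in seen:
            seen.add(key)
            network.append((a, b))
    return network


# Toy interaction table with placeholder protein labels.
table = [("P1", "P2"), ("P2", "P3"), ("P2", "P1"), ("P3", "P4")]
print(connecting_network(table, ["P1", "P2", "P3"]))
```

A viewer that also pulls in intermediate proteins would need a path search instead of this filter.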
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protein names were retrieved from Ensembl and NCBI or proposed according to the evolutionary history of the genes. Chromosomal/genomic location was obtained using Ensembl genome browser, JGI databases, or NCBI Entrez Gene when not available on Ensembl or JGI.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 (https://hupo.org/Guidelines), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 “missing proteins” (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
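To show how a PROSITE pattern identifies a site in a sequence, the sketch below translates the pattern syntax into a Python regular expression. It is a minimal converter and not PROSITE's own scanning software; the `prosite_to_regex` name is hypothetical. It handles `x`, `[...]`, `{...}`, repeat counts such as `(3)` or `(2,4)`, and the `<` and `>` terminus anchors at the pattern ends, and it assumes anchors do not occur inside brackets.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, plus optional repeat.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into an equivalent regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        at_start = element.startswith("<")   # N-terminal anchor
        at_end = element.endswith(">")       # C-terminal anchor
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            atom = "."                        # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            atom = core                       # literal residue or [allowed set]
        if low and high:
            atom += "{" + low + "," + high + "}"
        elif low:
            atom += "{" + low + "}"
        parts.append(("^" if at_start else "") + atom + ("$" if at_end else ""))
    return "".join(parts)


# N-{P}-[ST]-{P}: N, then anything but P, then S or T, then anything but P.
regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex, re.search(regex, "AANGSAA").start())
```

PROSITE profiles, which are weight matrices rather than patterns, cannot be expressed this way.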
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Mapping of predicted structure and sequence domains is undertaken using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.
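The Markov clustering step can be illustrated with a toy, pure-Python version of the generic MCL procedure: build a column-stochastic matrix from a similarity graph, then alternate expansion (matrix squaring) and inflation (element-wise powering with renormalization). This is a sketch of the general algorithm only. The inflation value, iteration count and function names are assumptions, Gene3D's actual parameters and implementation are not given in the source, and the subsequent multi-linkage clustering step is not shown.

```python
from typing import List, Sequence


def _normalize_columns(m: List[List[float]]) -> List[List[float]]:
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency: Sequence[Sequence[float]],
                   inflation: float = 2.0,
                   iterations: int = 50) -> List[List[int]]:
    """Cluster the nodes of a similarity graph with a basic MCL loop."""
    n = len(adjacency)
    # Self-loops keep every column non-empty and stabilize the iteration.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(iterations):
        m = _matmul(m, m)                                  # expansion
        m = [[v ** inflation for v in row] for row in m]   # inflation
        m = _normalize_columns(m)
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles of mutually similar sequences (toy data).
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(graph))
```

Higher inflation values generally produce more, smaller clusters; real use would take edge weights from sequence similarity scores and a sparse matrix library rather than nested lists.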
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(a) Abbreviation of cellular role categories of theoretical (http://www.ncbi.nlm.gov/COG/). (b) Abbreviation of cellular location. Protein cellular location was annotated by PSORTb V. 2.0 (http://www.psort.org/). C: Cytoplasmic, P: Periplasmic, U: Unknown, OM: OuterMembrane, CM: CytoplasmicMembrane. (c) Proteins upshifted in the BMΔvirB mutant are marked with “+”, and those downshifted with “−”; unique protein spots in BM are marked with “Y”, and in BMΔvirB with “T”.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Footnote: E- Extracellular; C- Cytoplasm; ER- Endoplasmic reticulum. In column 5 () indicates the earlier report of a protein identified in normal CSF and (+) in normal Plasma. In column 7 () indicates expression of a protein in tissue at protein level (+) at mRNA level. Protein localization and Signal/TM domain information was derived from HPRD [35], and information about presence in normal CSF or plasma was extracted from [36], [37] respectively. Expression of the genes/proteins at tissue level was inferred from published transcriptome dataset (master list; [38]) or protein datasets [39]–[52].
SLIF finds fluorescence microscope images in on-line journal articles, and indexes them according to cell line, proteins visualized, and resolution. Images can be accessed via the SLIF Web database. SLIF takes on-line papers and scans them for figures that contain fluorescence microscope images (FMIs). Figures typically contain multiple FMIs, so SLIF must segment these images into individual FMIs. When the FMIs are extracted, annotations for the images (for instance, names of proteins and cell lines) are also extracted from the accompanying caption text. Protein annotations are also used to link to external databases, such as the Gene Ontology DB. The more detailed process includes: segmentation of images into panels; panel classification, to find FMIs; segmentation of the caption, to find which portions of the caption apply to which panels; text-based entity extraction; matching of extracted entities to database entries; extraction of panel labels from text and figures; and alignment of the text segments to the panels. Extracted FMIs are processed to find subcellular location features (SLFs), and the resulting analyzed, annotated figures are stored in a database, which is accessible via SQL queries.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of proteins along with their molecular weight (M. wt.), isoelectric point (pI), CDS and protein length, subcellular localization and Arabidopsis thaliana orthologs.
An integrated knowledgebase focused on protein termini, their formation by proteases and functional implications. It contains information about the processing and the processing state of proteins and functional implications thereof derived from research literature, contributions by the scientific community and biological databases. It lists more than 120,000 N- and C-termini and almost 10,000 cleavages. TopFIND is a resource for comprehensive coverage of protein N- and C-termini discovered by all available in silico, in vitro as well as in vivo methodologies. It makes use of existing knowledge by seamless integration of data from UniProt and MEROPS and provides access to new data from community submission and manual literature curation. It renders modifications of protein termini, such as acetylation and citrullination, easily accessible and searchable and provides the means to identify and analyse the extent and distribution of terminal modifications across a protein. The data is presented to the user with a strong emphasis on the relation to curated background information and underlying evidence that led to the observation of a terminus, its modification or proteolytic cleavage. In brief, the protein information, its domain structure, protein termini, terminus modifications and proteolytic processing of and by other proteins are listed. All information is accompanied by metadata such as its original source, method of identification, confidence measurement or related publication. A positional cross-correlation evaluation matches termini and cleavage sites with protein features (such as amino acid variants) and domains to highlight potential effects and dependencies in a unique way. Also, a network view of all proteins showing their functional dependency as protease, substrate or protease inhibitor, tied in with protein interactions, is provided for the easy evaluation of network-wide effects.
A powerful yet user-friendly filtering mechanism allows the presented data to be filtered based on parameters such as methodology used, in vivo relevance, confidence or data source (e.g. limited to a single laboratory or publication). This provides a means to assess physiologically relevant data and to deduce functional information and hypotheses relevant to the bench scientist.
TopFIND PROVIDES:
* Integration of protein termini with proteolytic processing and protein features
* Display of proteases and substrates within their protease web, including detailed evidence information
* Full support for the Human Proteome Project through search by chromosome location
CONTRIBUTE:
* Submit your N- or C-termini datasets
* Contribute information on protein cleavages
* Provide detailed experimental description, sample information and raw data
A database of putative membrane proteins of Thale Cress (Arabidopsis thaliana) and Rice (Oryza sativa), and of some 6700 putative membrane proteins of ~300 other seed plants. The database stores data about:
* protein, cDNA and genomic sequences
* exon predictions (A. thaliana and O. sativa)
* different cDNA/protein models of genes (A. thaliana and O. sativa)
* ontology terms according to the Gene Ontology (GO) Consortium
* protein sequence motifs as predictable by using the PFAM database
* transporter classification as predictable by using the TC-system
* bibliographic references
* predictions for transmembrane spanning proteins (transmembrane alpha helices, beta barrels)
* predictions for membrane-anchored proteins (GPI-attachment, prenylation, myristoylation)
* prediction of the subcellular location
* consensus predictions (transmembrane alpha helices, subcellular location)
* isospecific homologs ("paralogs")
* heterospecific homologs ("orthologs")
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.