License: https://www.proteinatlas.org/about/licence
Subcellular methods
The subcellular resource of the Human Protein Atlas provides high-resolution insights into the expression and spatiotemporal distribution of proteins encoded by 13603 genes (67% of the human protein-coding genes), as well as predictions for an additional 3459 secreted- or membrane proteins, covering a total of 17062 genes (85% of the human protein-coding genes). For each gene, the subcellular distribution of the protein has been investigated by immunofluorescence (ICC-IF) and confocal microscopy in up to three different standard cell lines, selected from a panel of 42 cell lines used in the subcellular resource. For some genes, the protein has also been stained in up to three ciliated cell lines, induced pluripotent stem cells (iPSCs) and/or in human sperm cells. Upon image analysis, the subcellular localization of the protein has been classified into one or more of 49 different organelles and subcellular structures. In addition, the resource includes an annotation of genes that display single-cell variation in protein expression levels and/or subcellular distribution, as well as an extended analysis of cell cycle dependency of such variations.
The subcellular resource offers a database for detailed exploration of individual genes and proteins of interest, as well as for systematic analysis of proteomes in a broader context. More information about the content of the resource, as well as the generation and analysis of the data, can be found in the Methods summary. Learn about:
The subcellular distribution of proteins in standard human cell lines, including ciliated cells and iPSCs. The subcellular distribution of proteins in human sperm. The proteomes of different organelles and subcellular structures. Single-cell variability in the expression levels and/or localizations of proteins.

Subcellular localization of proteins by sequence similarity to localization sequences

License: https://www.proteinatlas.org/about/licence
Tissue methods
This resource of the Human Protein Atlas focuses on gene expression profiles in human tissues at both the mRNA and protein level. The protein expression data from 45 normal human tissue types is derived from antibody-based protein profiling using conventional and multiplex immunohistochemistry. All underlying images of immunohistochemistry-stained normal tissues are available together with knowledge-based annotation of protein expression levels. The protein data covers 15312 genes (76%) for which there are available antibodies. The mRNA expression data is derived from deep sequencing of RNA (RNA-seq) from 51 different normal tissue types.
More information about the specific content and the generation and analysis of the data in the resource can be found in the Methods Summary. Learn about:
Protein localization in tissues at a single-cell level. Whether a gene is enriched in a particular tissue (specificity). Which genes have a similar expression profile across tissues (expression cluster).

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationship of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.

License: https://www.proteinatlas.org/about/licence
Brain methods
This resource provides comprehensive spatial profiling of the brain, including an overview of protein expression in the mammalian brain based on integration of data from human, pig and mouse. Transcriptomics data combined with affinity-based protein in situ localization down to single-cell detail is available in this brain-centric sub-atlas of the Human Protein Atlas. The data presented are for human genes and their one-to-one orthologues in pig and mouse. Gene summary pages provide the hierarchical expression landscape, from 13 main regions of the brain to individual nuclei and subfields, for every protein-coding gene. For selected proteins, high-content images are available to explore the cellular and subcellular protein distribution. In addition, the Brain resource contains lists of genes with elevated expression in one or a group of regions to help the user identify unique protein expression profiles linked to physiology and function.
More information about the specific content and the generation and analysis of the data in this resource can be found in the Methods Summary. Learn about:
Expression levels for all human proteins in regions and subregions of the human brain. Expression levels for all proteins with human orthologs in regions and subregions of the pig and mouse brain. Brain enriched genes with higher expression in any of the regions of the brain compared to peripheral organs. Regional enriched genes with higher expression in a single or few regions of the brain. Cell-type and cell-compartment distribution of selected proteins in the human and mouse brain. Differences in gene expression between mammalian species.
Additional information: In addition to the data provided in the brain resource there is also data on human retina and single cell data containing information on protein expression in human neuronal and non-neuronal cell-types in the central nervous system.

The Characterized Protein Database, CharProtDB, is designed and being developed as a resource of expertly curated, experimentally characterized proteins described in published literature. For each protein record in CharProtDB, storage of several data types is supported. It includes functional annotation (several instances of protein names and gene symbols), taxonomic classification, literature links, specific Gene Ontology (GO) terms and GO evidence codes, EC (Enzyme Commission) and TC (Transport Classification) numbers and protein sequence. Additionally, each protein record is associated with cross links to all public accessions in major protein databases as "synonymous accessions". Each of the above data types can be linked to as many literature references as possible. Every CharProtDB entry requires a minimum set of data types: protein name, GO terms and supporting reference(s) associated with GO evidence codes. Annotating using the GO system is of importance for several reasons: the GO system captures defined concepts (the GO terms) with unique ids, which can be attached to specific genes, and the three controlled vocabularies of the GO allow for the capture of much more annotation information than is traditionally captured in protein common names, including, for example, not just the function of the protein, but its location as well. GO evidence codes implemented in CharProtDB directly correlate with the GO consortium definitions of experimental codes. CharProtDB tools link characterization data from multiple input streams through synonymous accessions or direct sequence identity. CharProtDB can represent multiple characterizations of the same protein, with proper attribution and links to database sources. Users can use a variety of search terms including protein name, gene symbol, EC number, organism name, accessions or any text to search the database.
Following the search, a display page lists all the proteins that match the search term. Click on the protein name to view more detailed annotated information for each protein. Additionally, each protein record can be annotated.

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Knowledge of protein subcellular localization assists in the elucidation of protein function and understanding of different biological mechanisms that occur at discrete subcellular niches. Organelle-centric proteomics enables localization of thousands of proteins simultaneously. Although such techniques have successfully produced organelle protein catalogues, they rely on the purification or significant enrichment of the organelle of interest, which is not achievable for many organelles. Incomplete separation of organelles leads to false discoveries, with erroneous assignments. Proteomics methods that measure the distribution patterns of specific organelle markers along density gradients are able to assign proteins of unknown localization based on comigration with known organelle markers, without the need for organelle purification. These methods are greatly enhanced when coupled to sophisticated computational tools. Here we apply and compare multiple approaches to establish a high-confidence data set of Arabidopsis root tissue trans-Golgi network (TGN) proteins. The method employed involves immunoisolations of the TGN, coupled to probability-based organelle proteomics techniques. Specifically, the technique known as LOPIT (localization of organelle protein by isotope tagging) couples density centrifugation with quantitative mass-spectrometry-based proteomics using isobaric labeling and targeted methods with semisupervised machine learning methods. We demonstrate that while the immunoisolation method gives rise to a significant data set, the approach is unable to distinguish cargo proteins and persistent contaminants from full-time residents of the TGN. The LOPIT approach, however, returns information about many subcellular niches simultaneously and the steady-state location of proteins.
Importantly, therefore, it is able to dissect proteins present in more than one organelle and cargo proteins en route to other cellular destinations from proteins whose steady-state location favors the TGN. Using this approach, we present a robust list of Arabidopsis TGN proteins.

The probe column indicates type of probe used: 1, only one probe in database; 2, highest intensity probe covering all transcripts; 3, probe used covers all transcripts but is not highest intensity probe; 4, highest intensity probe as none of the probes covered all transcripts. The ANOVA column shows whether levels in Aging are higher or lower than in Development, after correcting for 49 comparisons (p < 0.001). ns, not significant.
Gene list with protein name, chromosomal location, and ANOVA comparing mean expression levels between Aging and Development.
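
The four probe codes described for this table can be read as a small decision rule. The sketch below is one possible reading, not the authors' procedure: the `Probe` class, its field names and the `probe_code` function are hypothetical, and code 2 is assumed to mean that the probe used both covers all transcripts and has the highest intensity of all probes for the gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str          # hypothetical probe identifier
    intensity: float   # measured signal intensity
    covers_all: bool   # True if the probe covers all transcripts of the gene


def probe_code(probes, used):
    """Return the probe-column code (1-4) for the probe that was used.

    Assumed reading of the table legend; raises ValueError for
    combinations the legend does not describe.
    """
    if used not in probes:
        raise ValueError("the probe used must be one of the gene's probes")
    if len(probes) == 1:
        return 1  # only one probe in the database
    brightest = max(probes, key=lambda p: p.intensity)
    if used.covers_all:
        # 2: covers all transcripts and is the highest-intensity probe
        # 3: covers all transcripts but is not the highest-intensity probe
        return 2 if used == brightest else 3
    if used == brightest and not any(p.covers_all for p in probes):
        return 4  # highest intensity, since no probe covers all transcripts
    raise ValueError("combination not described by codes 1-4")
```

For example, a gene with two probes that both cover all transcripts yields code 2 for the brighter probe and code 3 for the dimmer one.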

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).

List of 17 protein-coding variations, showing their incidence, genomic location, polymorphisms and amino acid (A.A.) alterations in GRCh37 (hg19) and the updated version GRCh38 (hg38).

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Protein Data Bank (PDB) files list the relative spatial location of every atom in a protein structure as the final output of the process of fitting and refining to experimentally determined electron density measurements. Where experimental evidence exists for multiple conformations, all alternate locations of the atoms are listed. Programs reading the information in PDB files commonly ignore these alternate conformations. This has led to underappreciation of their prevalence and under-characterisation of their features, and has limited the accessibility of this high-resolution data representing structural ensembles. We have trawled PDB files to extract structural features of residues with alternately located atoms. The output includes the distance between alternate conformations and identifies the location of these segments within the protein chain and in proximity of all other atoms within a defined radius. As structural biology transitions from a static to a dynamic description of the proteome, this dataset should be of use in efforts to predict multiple structures from a single sequence and support studies investigating protein flexibility and the association with protein function.
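
As an illustration of what reading alternate locations involves, the sketch below scans the fixed-width ATOM/HETATM records of a PDB file, where the alternate location indicator sits in column 17, and reports the distance between the first two alternate positions of each atom. It is a minimal sketch and not the pipeline used to build this dataset: the function name `altloc_distances` is hypothetical, only the first two alternate labels are compared, and multi-model files are not handled.

```python
import math
from collections import defaultdict


def altloc_distances(pdb_lines):
    """Map (chain, residue number, atom name) to the distance in angstroms
    between the first two alternate locations of that atom."""
    coords = defaultdict(dict)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 54:
            continue
        alt = line[16]  # column 17: alternate location indicator
        if alt == " ":
            continue    # atom has a single modelled position
        key = (line[21], int(line[22:26]), line[12:16].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords[key][alt] = xyz

    distances = {}
    for key, by_alt in coords.items():
        labels = sorted(by_alt)
        if len(labels) >= 2:
            distances[key] = math.dist(by_alt[labels[0]], by_alt[labels[1]])
    return distances
```

Calling `altloc_distances(open("structure.pdb"))` on a local file (the file name is a placeholder) returns a dictionary that is empty when no atom has an alternate conformation.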

SLIF finds fluorescence microscope images in on-line journal articles, and indexes them according to cell line, proteins visualized, and resolution. Images can be accessed via the SLIF Web database. SLIF takes on-line papers and scans them for figures that contain fluorescence microscope images (FMIs). Figures typically contain multiple FMIs, so SLIF must segment these images into individual FMIs. When the FMI images are extracted, annotations for the images (for instance, names of proteins and cell-lines) are also extracted from the accompanying caption text. Protein annotations are also used to link to external databases, such as the Gene Ontology DB. The more detailed process includes: segmentation of images into panels; panel classification, to find FMIs; segmentation of the caption, to find which portions of the caption apply to which panels; text-based entity extraction; matching of extracted entities to database entries; extraction of panel labels from text and figures; and alignment of the text segments to the panels. Extracted FMIs are processed to find subcellular location features (SLFs), and the resulting analyzed, annotated figures are stored in a database, which is accessible via SQL queries.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

Database that represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Human Proteome Organization (HUPO) Human Proteome Project (HPP) continues to make progress on its two overall goals: (1) completing the protein parts list, with an annual update of the HUPO draft human proteome, and (2) making proteomics an integrated complement to genomics and transcriptomics throughout biomedical and life sciences research. neXtProt version 2017-01-23 has 17 008 confident protein identifications (Protein Existence [PE] level 1) that are compliant with the HPP Guidelines v2.1 (https://hupo.org/Guidelines), up from 13 664 in 2012-12 and 16 518 in 2016-04. Remaining to be found by mass spectrometry and other methods are 2579 “missing proteins” (PE2+3+4), down from 2949 in 2016. PeptideAtlas 2017-01 has 15 173 canonical proteins, accounting for nearly all of the 15 290 PE1 proteins based on MS data. These resources have extensive data on PTMs, single amino acid variants, and splice isoforms. The Human Protein Atlas v16 has 10 492 highly curated protein entries with tissue and subcellular spatial localization of proteins and transcript expression. Organ-specific popular protein lists have been generated for broad use in quantitative targeted proteomics using SRM-MS or DIA-SWATH-MS studies of biology and disease.

Columns B and C: predicted function and e-value based on BLASTp algorithm against the NCBI non-redundant protein "nr" database (https://www.ncbi.nlm.nih.gov/against) with a parameter to exclude kinetoplastids. Columns D to G: statistically significant enrichment of a given protein against other bait protein datasets. “-” marks cases where enrichment could not be calculated (e.g. when a given protein was measured only for a bait protein). Column M: the probability that a protein has the mitochondrial import signal detected with the Mitofates online prediction tool. Columns N and O display whether the protein was previously experimentally localized in the mitochondrion (TrypTag) or was present in the Tom40-based depletome. Columns P to AD display the base-2 logarithm of measured intensities for a given protein in a specific dataset. Lower case letters a, b, and c represent each replicate. Proteins enriched on average less than three times are highlighted in blue. (XLSX)

LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

A database that focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators. The curated data can be analyzed in the context of the high throughput data and viewed graphically with the MINT Viewer. This collection of molecular interaction databases can be used to search for, analyze and graphically display molecular interaction networks and pathways from a wide variety of species. MINT comprises separate database components. HomoMINT is an inferred human protein interaction database. Domino is a database of domain peptide interactions. VirusMINT explores the interactions of viral proteins with human proteins. The MINT connect viewer allows you to enter a list of proteins (e.g. proteins in a pathway) to retrieve, display and download a network with all the interactions connecting them.

Background
Although MP20 is the second most highly expressed membrane protein in the lens, its function remains an enigma. Putative functions for MP20 have recently been inferred from its assignment to the tetraspanin superfamily of integral membrane proteins. Members of this family have been shown to be involved in cellular proliferation, differentiation, migration, and adhesion. In this study, we show that MP20 associates with galectin-3, a known adhesion modulator.
Results
MP20 and galectin-3 co-localized in selected areas of the lens fiber cell plasma membrane. Individually, these proteins purified with apparent molecular masses of 60 kDa and 22 kDa, respectively. A 104 kDa complex was formed in vitro upon mixing the purified proteins. A 102 kDa complex of MP20 and galectin-3 could also be isolated from detergent-solubilized native fiber cell membranes. Binding between MP20 and galectin-3 was disrupted by lactose suggesting the lectin site was involved in the interaction.
Conclusions
MP20 adds to a growing list of ligands of galectin-3 and appears to be the first representative of the tetraspanin superfamily identified to possess this specificity.