Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data collection contains the human (NCBI taxonomy ID 9606) data sets that were previously deposited as separate datasets for STRING ver. 10.5, before the download file structure changed with the release of ver. 11.0.
Database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations and are derived from four sources: Genomic Context, High-throughput experiments, (Conserved) Coexpression, and previous knowledge. STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. As of 2013, the database covers 5,214,234 proteins from 1,133 organisms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An extensive dataset of binary physical protein-protein interactions extracted from STRING 12.0 (>12,000 organisms), with artificially generated negatives. The dataset includes 72M positive pairs with STRING confidence scores > 0.9 and 720M negative pairs. The corresponding protein sequences are located in the .fasta files. The method for generating the negatives was derived from https://doi.org/10.1016/j.isci.2024.110371
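The 1:10 ratio of positives to negatives described above can be illustrated with a minimal sketch. This is not the negative-generation procedure of the cited paper: it draws random non-positive pairs uniformly, and the function name `sample_negatives`, the toy scores, and the protein identifiers are all hypothetical.

```python
import random

def sample_negatives(positives, ratio=10, seed=0):
    """Draw random protein pairs that are not in the positive set.

    Assumption: uniform random sampling, which is only an illustration
    and NOT the method of the paper cited by the dataset. Self-pairs are
    assumed to be absent from `positives`.
    """
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    proteins = sorted({x for p in positives for x in p})
    # Cap the target so the loop terminates on very small inputs.
    n_possible = len(proteins) * (len(proteins) - 1) // 2 - len(pos)
    target = min(ratio * len(pos), n_possible)
    negatives = set()
    while len(negatives) < target:
        a, b = rng.sample(proteins, 2)
        pair = frozenset((a, b))
        if pair not in pos:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

# Hypothetical scored pairs; only scores above 0.9 count as positives.
scored = [("P1", "P2", 0.95), ("P3", "P4", 0.99),
          ("P1", "P3", 0.50), ("P5", "P6", 0.92)]
positives = [(a, b) for a, b, s in scored if s > 0.9]
negatives = sample_negatives(positives, ratio=10)
print(len(positives), len(negatives))
```

Note that in this sketch a low-scoring pair such as ("P1", "P3") can be drawn as a negative, which a careful pipeline would probably want to avoid.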
https://spdx.org/licenses/CC0-1.0.html
The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
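The connectivity constraint in this abstract can be made concrete with a small sketch. It implements neither ILPMinPPI nor GreedyMinPPI. It only computes two simple bounds on the number of additional PPIs: each complex whose induced subgraph has c connected components needs at least c - 1 new edges, so the maximum of these values is a lower bound and their sum is an upper bound (an edge may serve several overlapping complexes). All names and the toy data are hypothetical.

```python
def count_components(members, edges):
    """Number of connected components of the subgraph induced by `members`."""
    parent = {p: p for p in members}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        # Only edges with both endpoints inside the complex count.
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    return len({find(p) for p in members})

def additional_ppi_bounds(complexes, edges):
    """Return (lower_bound, upper_bound) on the number of PPIs to add."""
    missing = [count_components(set(c), edges) - 1 for c in complexes]
    return max(missing, default=0), sum(missing)

# Toy data: two known PPIs and three (overlapping) complexes.
ppis = [("A", "B"), ("C", "D")]
complexes = [["A", "B", "C"], ["C", "D", "E"], ["A", "B"]]
lower, upper = additional_ppi_bounds(complexes, ppis)
print(lower, upper)
```

The gap between the two bounds is where the actual optimization problem lies, which is why the abstract resorts to integer linear programming and a greedy approximation.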
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of proteins and known PPIs per species in BioGRID (version 3.5.171).
Combined-scores: PPIs that have combined scores are considered positive cases. Experimental-700: PPIs that have experimental scores over 700 are considered positive cases. Direct comparison: the results of embeddings using the same method (CBOW) and the same hyperparameters. Different embedding methods: the results of BioConceptVec (skip-gram), BioConceptVec (GloVe) and BioConceptVec (fastText). The highest results in each section are marked in bold.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Validation against the STRING database of all newly predicted links, and of the newly predicted links associated with the 10 proteins, for the 14317_PPI data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the genes in the protein interaction network constructed based on the STRING database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Homophily/heterophily evaluation, expressed as z-scores, for the human protein-protein interaction (PPI) network obtained from the STRING v11.5 database (https://string-db.org) with the standard threshold on the edge score (T=700). Each protein in the PPI network was assigned to a class corresponding to the chromosome on which its gene is located.
A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered (excluding the class corresponding to chromosome Y because of the small number of genes occurring in the network).
The homophilic/heterophilic nature of the network, with respect to chromosome classes, was evaluated with the HONTO tool (https://github.com/cumbof/honto).
In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.
Values for intra-chromosomal interactions (on the diagonal) and inter-chromosomal interactions (off the diagonal) are also reported as a heatmap.
The values on the diagonal are clearly higher than those off the diagonal, which indicates that the network is homophilic and confirms the link between shared chromosome and interaction in the PPI network.
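As an illustration of what a homophily z-score measures, the sketch below compares the observed number of intra-class edges with a null distribution obtained by shuffling class labels across nodes. This is a generic permutation test under assumed settings, not HONTO's actual procedure, whose null model may differ. The function names and the toy network are hypothetical.

```python
import random
import statistics

def intra_class_edges(edges, label):
    """Count edges whose two endpoints carry the same class label."""
    return sum(1 for u, v in edges if label[u] == label[v])

def homophily_z_score(edges, label, n_perm=1000, seed=0):
    """z-score of the observed intra-class edge count against a
    label-permutation null model (an assumption of this sketch)."""
    rng = random.Random(seed)
    observed = intra_class_edges(edges, label)
    nodes = list(label)
    classes = [label[n] for n in nodes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(classes)  # keeps class sizes, breaks node-class link
        null.append(intra_class_edges(edges, dict(zip(nodes, classes))))
    mu = statistics.mean(null)
    sigma = statistics.pstdev(null)
    return (observed - mu) / sigma if sigma > 0 else 0.0

# Toy network: two 4-node cliques, one per "chromosome", plus one cross edge.
chr1 = ["a1", "a2", "a3", "a4"]
chr2 = ["b1", "b2", "b3", "b4"]
label = {**{n: "chr1" for n in chr1}, **{n: "chr2" for n in chr2}}
edges = [(u, v) for grp in (chr1, chr2)
         for i, u in enumerate(grp) for v in grp[i + 1:]]
edges.append(("a1", "b1"))
z = homophily_z_score(edges, label)
print(z > 0)
```

A positive z-score means more same-class edges than expected by chance (homophily); a negative one would point to heterophily.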
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Snapshot of 9606.protein.links.full.v10.5.experiments.abc.txt from https://string-db.org/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aimed to analyze metabolite abundances and proteome differences between Binglangjiang buffalo milk (BBM) and Dehong buffalo milk (DBM). Untargeted ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS), label-free quantitative proteomics approaches, and bioinformatics analyses, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interaction (PPI) analyses, were performed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The optimal dimensions of raw network embedding representations and the corresponding 3rd hidden layer outputs (a.k.a. the STRING2GO-learnt functional representations) with their corresponding predictive power for biological process terms prediction, and the main characteristics of different STRING networks.
Despite being one of the most important human fungal pathogens, Candida albicans has not been studied extensively at the level of protein-protein interactions (PPIs), and data on PPIs are not readily available in online databases. In January 2018, the Biological General Repository for Interaction Datasets (BioGRID), the database that contains the most PPIs for C. albicans, documented only 188 physical or direct PPIs (release 3.4.156), while several more can be found in the literature. Other databases, such as the STRING database, the Molecular INTeraction Database (MINT), and the Database of Interacting Proteins (DIP), contain even fewer interactions or do not even include C. albicans as a searchable term. Because of the non-canonical codon usage of C. albicans, where CUG is translated as serine rather than leucine, it is often problematic to use the yeast two-hybrid system in Saccharomyces cerevisiae to study C. albicans PPIs. However, studying PPIs is crucial to gain a thorough understanding of the function of proteins, biological processes and pathways. PPIs can also be potential drug targets. To aid in creating PPI networks and updating BioGRID, we performed an exhaustive literature search in order to provide, in an accessible format, a more extensive list of known PPIs in C. albicans.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated by literature mining of the PubMed database.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The underlying mechanism of obesity and hyperuricemia. (A) Protein-protein interaction (PPI) network obtained from the STRING database and constructed with Cytoscape. The size and color depth of each node are proportional to its node degree. Edge width is proportional to the edge betweenness. (B) Grouping of the KEGG enrichment analysis of the 235 intersected targets of obesity leading to hyperuricemia. Functionally related groups partially overlap. Each KEGG pathway is represented as a node. The nodes of a group are labeled in the same color. Nodes shared by two groups are shown in two colors.
Database to explore known and predicted interactions of chemicals and proteins. It integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug-target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. The database contains interaction information for over 68,000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cytoscape session file containing STRING protein-protein interaction data with yeast heat shock data from the GEO series GSE18, platform GPL51 (Gasch et al., 2000). All results from the BMC Bioinformatics publication were derived from this session file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of GO term-centric results obtained by different network embedding representations and corresponding functional representations based on Combinedscore, Textmining, Experimental, Database and Coexpression networks working with different classification algorithms during hold-out evaluation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary statistics for protein-protein interaction networks identified with STRING amongst genes corresponding to significant SNPs or k-mers (inside or adjacent to genes). The PPI enrichment p-value corresponds to the likelihood that the nodes and edges would be selected from the S. aureus database by chance.
Orbitrap Fusion (Thermo Fisher Scientific) LC-MS/MS analyses were performed on an Easy-nLC 1000 liquid chromatography system (Thermo Fisher Scientific) coupled to an Orbitrap Fusion via a nano-electrospray ion source. Tryptic peptides were dissolved in a loading buffer (acetonitrile and 0.1% formic acid) and eluted at a flow rate of 350 nL/min. Survey scans were acquired after an accumulation of 5×10⁵ ions in the Orbitrap for m/z 300-1,400 using a resolution of 120,000 at m/z. The top speed data-dependent mode was selected for fragmentation in the cell at a normalized collision energy of 32%, and fragment ions were then transferred into the ion trap analyzer with the AGC target at 5×10³ and maximum injection time at 35 ms. The dynamic exclusion of previously acquired precursor ions was enabled at 18 s. Proteome Discoverer 1.4.1.14 was used for analysis of the protein spectrum. Oxidation (methionine) and acetylation (protein N-terminus) were chosen as variable modifications, and cysteine carbamidomethylation was chosen as a fixed modification. Two missed cleavage sites for trypsin were allowed. The intensity-based absolute quantification (iBAQ)-based protein quantification was performed with in-house software. The interaction of SNT-related differentially expressed proteins was investigated with STRING 11.0 (https://string-db.org). The differentially expressed protein interaction network (high reliability, interaction score > 0.4, PPI enrichment P-value < 1.0×10⁻¹⁶) was selected for the analysis.
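The iBAQ step mentioned above can be sketched from the commonly used definition: the summed intensity of a protein's identified peptides divided by its number of theoretically observable tryptic peptides. The in-house software used for this record is not described, so everything below is an assumption, including the 6 to 30 residue window for observable peptides, the cleavage rule (after K or R, not before P), and the toy sequence and intensities.

```python
def tryptic_peptides(sequence):
    """Fully cleaved in-silico trypsin digest: cut after K or R,
    except when the next residue is P (assumed cleavage rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def ibaq(peptide_intensities, sequence, min_len=6, max_len=30):
    """Summed peptide intensity divided by the number of theoretically
    observable peptides (length window is an assumption of this sketch)."""
    observable = [p for p in tryptic_peptides(sequence)
                  if min_len <= len(p) <= max_len]
    if not observable:
        raise ValueError("no theoretically observable peptides")
    return sum(peptide_intensities) / len(observable)

# Hypothetical protein sequence and measured peptide intensities.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
value = ibaq([3.0e6, 1.5e6, 4.5e6], seq)
print(tryptic_peptides(seq), value)
```

Dividing by the number of observable peptides corrects for the fact that longer proteins yield more peptides, which makes the values roughly comparable across proteins.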