Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the protein-protein interaction analysis dataset that was used in the unpublished manuscript and was further analyzed with the STRING online software. Significantly upregulated mRNAs (2,777 genes; p < 0.05) identified by bulk RNA-seq were analyzed using the STRING module in Cytoscape v.2.2.0 (Institute for Systems Biology; WA; USA). A cluster network was constructed using the MCL algorithm with a granularity parameter of 4, followed by filtering nodes with mcl.cluster > 10. The resulting 1,848 nodes were processed through STRING v12.0 (Swiss Institute of Bioinformatics; Lausanne; Switzerland) to generate a protein–protein interaction (PPI) network, incorporating evidence from text mining, genomic neighborhood, experimental data, curated databases, co-expression, gene fusion, and co-occurrence, with a minimum confidence score threshold of 0.40. Network modules were defined using the DBSCAN clustering algorithm with an ε parameter of 2. Cluster 1, representing the largest gene set (101 genes), was further analyzed by sorting the top 20 nodes with the highest node degree, resulting in a network comprising 101 nodes and 756 edges. Global network metrics indicated an average node degree of 15, a local clustering coefficient of 0.600, and a PPI enrichment p-value of < 1 × 10⁻¹⁶. The average values of the co-expression, experimentally determined interaction, automated text mining, and combined scores were calculated.
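The reported average node degree can be checked from the node and edge counts alone: in an undirected network every edge adds one to the degree of two nodes, so the average degree is 2E/N. The sketch below does that check and also shows one way to rank nodes by degree after applying the 0.40 confidence threshold. The edge list and node names are hypothetical, and this is not the STRING or Cytoscape implementation.

```python
from collections import Counter

def average_degree(n_nodes, n_edges):
    """Average node degree of an undirected network: 2E / N."""
    return 2 * n_edges / n_nodes

def top_by_degree(edges, min_score=0.40, k=20):
    """Rank nodes by degree, counting only edges at or above min_score.

    edges: iterable of (node_a, node_b, score) tuples (hypothetical layout).
    """
    degree = Counter()
    for a, b, score in edges:
        if score >= min_score:
            degree[a] += 1
            degree[b] += 1
    return degree.most_common(k)

# Check against the figures reported for Cluster 1 (101 nodes, 756 edges).
cluster1_avg = average_degree(101, 756)
print(round(cluster1_avg, 2))  # 14.97, consistent with the reported value of 15

# Hypothetical toy edge list; the B-C edge falls below the threshold.
toy_edges = [("A", "B", 0.9), ("A", "C", 0.5), ("B", "C", 0.3), ("A", "D", 0.7)]
toy_top = top_by_degree(toy_edges, k=2)
```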
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset representing a protein-protein interaction (PPI) network of human proteins. The data were generated and scored using the STRING database. The dataset focuses on functional and physical associations between proteins and includes confidence scores (e.g., text-mining, experimental) for each interaction. It is a foundational resource for systems biology and for identifying molecular hubs in disease pathways.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An extensive dataset of binary physical protein-protein interactions extracted from STRING 12.0 (>12,000 organisms), with artificially generated negatives. The dataset includes 72M positive pairs with STRING confidence scores > 0.9 and 720M negative pairs. The corresponding protein sequences are located in the .fasta files. The procedure for generating the negatives was derived from https://doi.org/10.1016/j.isci.2024.110371
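As an illustration of the positive-pair selection, the sketch below filters lines in the three-column layout of STRING's bulk link files (`protein1 protein2 combined_score`), where scores are stored as integers scaled by 1000, so a confidence of 0.9 corresponds to 900. It is a minimal sketch and not the pipeline used to build this dataset; the protein identifiers are hypothetical.

```python
def high_confidence_pairs(lines, min_score=900):
    """Return unordered protein pairs whose combined score exceeds min_score.

    lines: iterable of text lines, 'protein1 protein2 combined_score'.
    Each pair is stored once, regardless of the order it appears in.
    """
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] == "protein1":
            continue  # skip the header line and malformed lines
        p1, p2, score = fields[0], fields[1], int(fields[2])
        if score > min_score:
            pairs.add(tuple(sorted((p1, p2))))
    return pairs

# Hypothetical example lines (identifiers are placeholders).
sample = [
    "protein1 protein2 combined_score",
    "9606.P1 9606.P2 950",
    "9606.P2 9606.P1 950",   # same pair, reverse direction
    "9606.P1 9606.P3 900",   # not strictly above the threshold
    "9606.P3 9606.P4 400",
]
positives = high_confidence_pairs(sample)
print(positives)
```

The stated dataset sizes imply ten negative pairs per positive pair (720M / 72M).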
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data collection contains the human (9606) data sets that were previously deposited as separate datasets for STRING ver. 10.5, before the structure of the download files changed with the release of ver. 11.0.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All human protein interactions were obtained from STRING (https://string-db.org/, version 11.0). Interactions were then filtered to those involving only BM zone proteins. Related to Fig. S6B.
Combined-scores: PPIs that have combined scores are considered positive cases. Experimental-700: PPIs that have experimental scores over 700 are considered positive cases. Direct comparison: the results of embeddings using the same method (CBOW) and the same hyperparameters. Different embedding methods: the results of BioConceptVec (skip-gram), BioConceptVec (GloVe) and BioConceptVec (fastText). The highest results in each section are marked in bold.
Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning. These datasets are in a format that RAPPPID is ready to read.
Comparatives Dataset
These datasets were derived from the STRING v11 H. sapiens dataset, according to the C1, C2, and C3 procedures outlined by Park and Marcotte, 2012. Negative samples are sampled randomly from the space of proteins not known to interact. See Szymborski & Emad for details.
Repeatability Datasets
The following datasets are all derived from STRING in the same manner as the comparatives dataset, but three different random seeds are used for drawing proteins.
References
Park, Y. and Marcotte, E.M. (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.
Szklarczyk, D., Gable, A.L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N.T., Morris, J.H., Bork, P., Jensen, L.J. and Mering, C. (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.
Szymborski, J. and Emad, A. (2021) RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv https://doi.org/10.1101/2021.08.13.456309
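In Park and Marcotte's scheme, a test pair belongs to C1 when both of its proteins also occur in the training set, to C2 when exactly one does, and to C3 when neither does. The sketch below labels test pairs accordingly. It only illustrates the definitions and is not the RAPPPID dataset-construction code; the function names and protein identifiers are hypothetical.

```python
def proteins_in(pairs):
    """Collect every protein that appears in a list of (a, b) pairs."""
    return {protein for pair in pairs for protein in pair}

def park_marcotte_class(pair, train_proteins):
    """Label a test pair as C1, C2 or C3 by how many members were seen in training."""
    seen = sum(protein in train_proteins for protein in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[seen]

# Hypothetical training pairs and test pairs.
train_pairs = [("A", "B"), ("B", "C")]
train_proteins = proteins_in(train_pairs)

test_pairs = [("A", "C"), ("A", "X"), ("X", "Y")]
labels = [park_marcotte_class(pair, train_proteins) for pair in test_pairs]
print(labels)  # ['C1', 'C2', 'C3']
```

C3 is the strictest setting because a model cannot rely on having seen either protein during training.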
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Validation against the STRING database of all newly predicted links, and of the newly predicted links associated with the 10 proteins, for the 14317_PPI data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Homophily/heterophily evaluation, expressed as z-scores, for the human protein-protein interaction (PPI) network obtained from the STRING v11.5 database (https://string-db.org) using the standard edge-score threshold (T=700). Each protein in the network was assigned to a class corresponding to the chromosome on which its gene is located.
A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered; the class for chromosome Y was excluded because few of its genes occur in the network.
The homophily/heterophily of the network with respect to chromosome classes was evaluated with the HONTO tool (https://github.com/cumbof/honto).
In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.
Values for intra-chromosomal interactions (on the diagonal) and inter-chromosomal interactions (off the diagonal) are also reported as a heatmap.
The values on the diagonal are clearly higher than those off the diagonal, indicating that the network is homophilic and supporting a link between shared chromosome and interaction in the PPI network.
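One common way to express homophily as a z-score is to compare the observed number of same-class edges with the distribution obtained after randomly shuffling the class labels across nodes. The sketch below follows that generic permutation approach; it is an assumption for illustration and not necessarily how HONTO computes its z-scores. The toy network and all function names are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

def same_class_edges(edges, label):
    """Count edges whose two endpoints carry the same class label."""
    return sum(label[a] == label[b] for a, b in edges)

def homophily_zscore(edges, label, n_perm=200, seed=0):
    """z-score of the observed same-class edge count against a label-shuffling null."""
    rng = random.Random(seed)
    observed = same_class_edges(edges, label)
    nodes = list(label)
    classes = [label[node] for node in nodes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(classes)  # keeps class sizes and network topology fixed
        null.append(same_class_edges(edges, dict(zip(nodes, classes))))
    spread = pstdev(null)
    return (observed - mean(null)) / spread if spread > 0 else 0.0

# Hypothetical toy network: two densely connected groups joined by one edge.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
toy_edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2)) + [("a1", "b1")]
toy_label = {node: "chr1" for node in group_a}
toy_label.update({node: "chr2" for node in group_b})

z = homophily_zscore(toy_edges, toy_label)
print(z)  # positive values indicate more same-class edges than expected by chance
```

Applying the same count to every pair of classes, rather than only to same-class edges, would give the full matrix shown in a heatmap, with the diagonal holding the intra-class values.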
Selection of 30 central genes from the PPI network, including 17 upregulated and 13 downregulated genes, using the STRING and Cytoscape software.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Snapshot of 9606.protein.links.full.v10.5.experiments.abc.txt from https://string-db.org/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PPI network was constructed from the genes regulated by the SNPs associated with 18 AiDs. STRING PPI data were used to build the network. The list of proteins present in the Ai-PPIN and the edge list of the network are provided here.
Clustered PPI datasets (BIOGRID + STRING) with sequence-disjoint splits
This dataset repo contains multiple dataset variants of protein–protein interactions (PPIs), built by clustering proteins by sequence similarity and then constructing train/valid/test splits that are intended to be disjoint at the protein level (and thus hard to memorize via near-identical sequences). Artifacts are stored as compressed pickles (*.pkl.gz). A helper downloader exists in this repo:… See the full description on the dataset page: https://huggingface.co/datasets/Synthyra/clustered_ppi_string_dedup.
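Compressed pickles of this kind can be read with Python's standard `gzip` and `pickle` modules. The sketch below shows a round trip plus a check that splits share no proteins. The internal layout of the repository's artifacts is not described here, so the assumption that each split is a list of `(protein_a, protein_b)` pairs, as well as the file name, is hypothetical. Unpickling executes code, so only load files from a trusted source.

```python
import gzip
import os
import pickle
import tempfile

def save_pickle_gz(obj, path):
    """Write an object as a gzip-compressed pickle."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)

def load_pickle_gz(path):
    """Read a gzip-compressed pickle (trusted sources only)."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)

def proteins_in(pairs):
    """Collect every protein that appears in a list of (a, b) pairs."""
    return {protein for pair in pairs for protein in pair}

def splits_are_disjoint(*splits):
    """True if no protein appears in more than one split."""
    seen = set()
    for pairs in splits:
        proteins = proteins_in(pairs)
        if proteins & seen:
            return False
        seen |= proteins
    return True

# Hypothetical splits, written to and read back from a temporary file.
splits = {"train": [("A", "B")], "valid": [("C", "D")], "test": [("E", "F")]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_splits.pkl.gz")  # hypothetical file name
    save_pickle_gz(splits, path)
    loaded = load_pickle_gz(path)

print(splits_are_disjoint(loaded["train"], loaded["valid"], loaded["test"]))  # True
```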
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aimed to analyze metabolite abundances and proteome differences between Binglangjiang buffalo milk (BBM) and Dehong buffalo milk (DBM). Untargeted ultraperformance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS), label-free quantitative proteomics approaches, and bioinformatics analyses including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein-protein interaction (PPI) were performed.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Transcription factor-protein-protein interaction (TF-PPI) networks of key pathway modulators in diabetes. A network of significantly modulated TF-PPIs for intact vessels (A) and for injured vessels at different timepoints: 20 hours (B), 2 weeks (C), and 6 weeks (D). Significantly up- and down-regulated genes from each timepoint, comparing Goto-Kakizaki (GK) vs Wistar rats, were used to obtain TF-PPIs, and this information was fed into the STRING database to generate the network. The top 10 up-regulated and top 10 down-regulated TFs are shown in the network above. Up- and down-regulated TFs are shown as green and red nodes, respectively. Node size indicates the P-value level. All the interactions were predicted with an adjusted P-value < .05.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the pre-processed data required to reproduce the results of the paper "Identification of transcription factor co-binding patterns with non-negative matrix factorization". The repository with the code can be found here: https://bitbucket.org/CBGR/cobind_manuscript/src/master/. The data include transcription factor binding sites (TFBSs) for 7 species from the UniBind 2021 database, a joint motif collection from the CIS-BP and JASPAR 2022 databases, and the corresponding physical protein-protein interaction (PPI) data from the STRING database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the file “Gene_Ontology_de_novo_PPI.zip”, I present data extracted from the database used to write the article "Using the Gene Ontology tool to produce de novo protein-protein interaction networks with IS_A relationship". There are protein-protein interaction (PPI) networks for all ten organisms mentioned in the article, as well as their respective plasmids. However, the PPIs available for download differ from those published, since I did not restrict them to true positives according to the STRING database. Instead, I considered as candidate relationships all protein pairs that have terms in common in all three Gene Ontology categories. The edge weight reflects a logarithmic distance between protein pairs, measured over the gene positions within a chromosome. According to the methodology applied in the paper, a pair of genes separated by five loci has the weight=(1+(MAX-log(pos(locus_00006)-pos(locus_00001)))). In this formula, pos extracts the locus_tag index; one is added so that no edge weight falls below one, because smaller weights may not be accepted by some visualization tools such as Gephi; and MAX creates thicker lines for closer pairs. Figure 1 depicts data from the file "Escherichia_coli_S88_p1.dot".
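The weight formula can be sketched as follows. The base of the logarithm and the value of MAX are not given here, so the natural logarithm and `max_value=10.0` are assumptions for illustration, and the function names are hypothetical.

```python
import math

def pos(locus_tag):
    """Extract the numeric index from a locus tag such as 'locus_00006'."""
    return int(locus_tag.rsplit("_", 1)[1])

def edge_weight(tag_a, tag_b, max_value):
    """weight = 1 + (MAX - log(|pos(a) - pos(b)|)); natural log is an assumption."""
    distance = abs(pos(tag_a) - pos(tag_b))
    if distance == 0:
        raise ValueError("the two loci must differ")
    return 1 + (max_value - math.log(distance))

# A pair separated by five loci, with an assumed MAX of 10.0.
far = edge_weight("locus_00006", "locus_00001", max_value=10.0)
# Adjacent loci get a larger weight, that is, a thicker line.
near = edge_weight("locus_00002", "locus_00001", max_value=10.0)
print(far, near)
```

Under this reading, the weight stays at or above one only when MAX is at least the logarithm of the largest distance between two loci.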
Orbitrap Fusion (Thermo Fisher Scientific) LC-MS/MS analyses were performed on an Easy-nLC 1000 liquid chromatography system (Thermo Fisher Scientific) coupled to an Orbitrap Fusion via a nano-electrospray ion source. Tryptic peptides were dissolved in a loading buffer (acetonitrile and 0.1% formic acid) and eluted at a flow rate of 350 nL/min. Survey scans were acquired after an accumulation of 5×10⁵ ions in the Orbitrap for m/z 300-1,400 using a resolution of 120,000 at m/z. The top speed data-dependent mode was selected for fragmentation in the cell at a normalized collision energy of 32%, and fragment ions were then transferred into the ion trap analyzer with the AGC target at 5×10³ and maximum injection time at 35 ms. The dynamic exclusion of previously acquired precursor ions was enabled at 18 s. Proteome Discoverer 1.4.1.14 was used for analysis of the protein spectrum. Oxidation (methionine) and acetylation (protein N-term) were chosen as variable modifications, and cysteine carbamidomethylation was chosen as a fixed modification. Two missed cleavage sites for trypsin were allowed. Intensity-based absolute quantification (iBAQ) of proteins was performed using in-house software. The interaction of SNT-related differentially expressed proteins was investigated with STRING 11.0 (https://string-db.org). The differentially expressed protein interaction network (high reliability, interaction score > 0.4, PPI enrichment P-value < 1.0×10⁻¹⁶) was selected for the analysis.
Despite being one of the most important human fungal pathogens, Candida albicans has not been studied extensively at the level of protein-protein interactions (PPIs), and data on PPIs are not readily available in online databases. In January 2018, the Biological General Repository for Interaction Datasets (BioGRID), the database that contains the most PPIs for C. albicans, documented only 188 physical or direct PPIs (release 3.4.156), while several more can be found in the literature. Other databases, such as the STRING database, the Molecular INTeraction Database (MINT), and the Database of Interacting Proteins (DIP), contain even fewer interactions or do not even include C. albicans as a searchable term. Because of the non-canonical codon usage of C. albicans, where CUG is translated as serine rather than leucine, it is often problematic to use the yeast two-hybrid system in Saccharomyces cerevisiae to study C. albicans PPIs. However, studying PPIs is crucial to gain a thorough understanding of the function of proteins, biological processes and pathways. PPIs can also be potential drug targets. To aid in creating PPI networks and updating BioGRID, we performed an exhaustive literature search in order to provide, in an accessible format, a more extensive list of known PPIs in C. albicans.
Significant gene sets found using iHS scores and Daub et al. [19] approach. Threshold is set to <0.09.
Table S2. Significant gene sets found using iHS scores and Gowinda approach. Threshold is set to <0.09.
Table S3. Significant gene sets found using XPCLR scores and Daub et al. [19] approach. Threshold is set to <0.09. For all the population pairs, the first population is the objective one and the second, the reference.
Table S4. Significant gene sets found using XPCLR scores and Gowinda approach. Threshold is set to 0.09. For all the population pairs, the first population is the objective one and the second, the reference.
Table S5. Immunity related gene sets detected with the GSEA approaches. Q-value threshold is set to <0.09.
Table S6. 17 significant genes related to obesity, diabetes and metabolic syndrome that were found to be under positive selection with XPCLR and iHS analysis using the list of genes derived from Bio4j. Some of them (indicated with *) have been detected in previous studies to be under positive selection, too. The threshold was calculated based on the 1 % cut-off level. Genes are categorized in groups of potential risk factors, potential protective and indirect associations.
Table S7. Significant genes related to obesity, diabetes and metabolic syndrome that we found to be under positive selection from XPCLR and iHS analysis using the Protein-Protein Interaction (PPI) networks from the STRING database. Five of them (indicated with *) have been detected in previous studies to be under positive selection. (XLSX 29 kb)