Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp.
STRING protein-protein interaction networks for WT-C vs. WT-D.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The optimal dimensions of raw network embedding representations and the corresponding 3rd hidden layer outputs (a.k.a. the STRING2GO-learnt functional representations) with their corresponding predictive power for biological process terms prediction, and the main characteristics of different STRING networks.
Column ‘Gene’ contains the T. reesei gene ID. ‘In STRING’ tells if the gene has interactions in STRING. Columns ‘Btw’ and ‘Deg’ denote the betweenness and degree network statistics of the corresponding gene. Columns ‘Class’ and ‘Putative secretion pathway component’ are author assigned classifications. ‘Taxon specificity’ gives the largest taxonomic group the gene was found in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the genes in the protein interaction network constructed based on the STRING database.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Pathway model based on hub miRNAs and their putative targets from network analysis.
* From a set of differentially expressed genes in both chronic HCV (hepatitis C virus) and HCC (hepatocellular carcinoma) samples, a protein-protein network was constructed using STRING and GeneMANIA.
* After topological analysis and network visualization in Cytoscape, the top hub genes were identified.
* miRNAs related to hub genes were identified using the miRTarBase server and combined with the PPI network to construct a miRNA-Hubgene network.
Based on Figure 4 from Poortahmasebi et al, How Hepatitis C Virus Leads to Hepatocellular Carcinoma: A Network-Based Study. Proteins on this pathway have targeted assays available via the CPTAC Assay Portal.
http://string-db.org/newstring_cgi/show_download_page.pl
STRING is a database of known and predicted protein interactions, including both physical and functional interactions. It contains data derived from four sources: genomic context, high-throughput experiments, coexpression, and previous knowledge. The database quantitatively integrates interaction data from these sources for a large number of organisms and transfers information between organisms where applicable. It performs iterative searches and visualizes the results in their genomic context. Much of the data is downloadable, including protein sequences, protein networks, interaction types for protein links, orthologous groups, and full database dumps (license required).
Basic information of the four original networks (HIPPIE, HumanNet, FunCoup and STRING) and the GO network.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unclassified proteins in the STRING network analysis.
Selection of 30 central genes from the PPI network, including 17 upregulated and 13 downregulated genes, using the STRING and Cytoscape software.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Protein–protein interaction network of the top differentially expressed genes between the patient’s samples and the Ctrl cohort. Edges represent protein–protein associations. Confidence ≥0.700; maximum number of interactors ≤20. Edge confidence: high (0.700) and highest (0.900) (see https://string-db.org/cgi/network).
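The confidence cutoffs named in this caption can be applied to any scored edge list. The following is a minimal sketch, assuming edges are held as (protein_a, protein_b, score) tuples with scores on the 0 to 1 scale. The protein names and scores are made up for illustration, and `filter_edges` is a hypothetical helper, not part of STRING.

```python
# Hypothetical scored edges: (protein_a, protein_b, confidence in [0, 1]).
edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.72),
    ("P2", "P3", 0.40),
    ("P3", "P4", 0.69),
]


def filter_edges(edges, min_score):
    """Keep only edges whose confidence is at least min_score."""
    return [(a, b, s) for a, b, s in edges if s >= min_score]


high = filter_edges(edges, 0.700)     # "high" confidence and above
highest = filter_edges(edges, 0.900)  # "highest" confidence only
print(len(high), len(highest))
```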
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary statistics for protein-protein interaction networks identified with STRING amongst genes corresponding to significant SNPs or k-mers (inside or adjacent to genes). PPI enrichment p-value corresponds to the likelihood nodes and edges would be selected from the S. aureus database by chance.
https://creativecommons.org/publicdomain/zero/1.0/
The European Power Grid Network dataset contains anonymized data that sheds light on the intricate connections between nodes within Europe’s electricity grid. Researchers and policymakers can leverage this dataset to gain valuable insights into energy trading patterns, nodal prices, and the stability of energy supply.
1. Network Structure and Insights:
o The dataset provides detailed information about the interconnections between nodes across the European power grid. Researchers can analyze these links to understand how electricity flows between different regions.
o By examining nodal prices, researchers can uncover pricing dynamics. This includes variations based on geographical location, demand, and supply.
o Geospatial analysis facilitated by this dataset allows researchers to identify patterns in power market behavior, congestion points, and reliability challenges.
2. Critical Energy Supplies and Stability:
o Identifying critical energy supplies is essential for maintaining grid stability. Policymakers can use this dataset to inform decisions related to energy security and resilience.
o Additionally, the dataset enables cross-state comparisons of power price competitiveness, aiding policymakers in designing effective energy policies.
This dataset contains anonymized information about the European power grid network, providing insights on the connections between nodes and their pricing. To use this dataset, one must identify the source and destination nodes of the power grid along with associated features such as prices and country information.
Firstly, it is important to understand the meaning of each column in order to navigate the data effectively:
from: The source node of the power grid. (Integer)
to: The destination node of the power grid. (Integer)
name: Name of the node in European Power Grid Network. (String)
price: Price of electricity at each node. (Float)
country: Country in which a particular node is located. (String).
Secondly, it is helpful to visualize and explore the dataset with plots before drawing conclusions. Examples include geospatial exploration by plotting node locations on maps, comparison of electricity prices between countries or regions, and assessment of economic relationships through trade flows or supply-chain networks in the energy market. All of these are possible with simple analyses of the european_power_grid dataset.
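As a starting point for such analyses, the sketch below reads rows in the column layout described above, computes the mean price per country, and builds an undirected adjacency list. It assumes each row is one link whose `name`, `price`, and `country` describe the source node. The sample rows are invented for illustration and stand in for the real file.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the documented column layout (not real data).
SAMPLE = """from,to,name,price,country
1,2,NodeA,40.0,DE
2,3,NodeB,50.0,DE
3,1,NodeC,30.0,FR
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Mean electricity price per country.
prices = defaultdict(list)
for row in rows:
    prices[row["country"]].append(float(row["price"]))
mean_price = {country: sum(v) / len(v) for country, v in prices.items()}

# Undirected adjacency list built from the from/to columns.
neighbors = defaultdict(set)
for row in rows:
    src, dst = int(row["from"]), int(row["to"])
    neighbors[src].add(dst)
    neighbors[dst].add(src)

print(mean_price)
print({node: sorted(adj) for node, adj in neighbors.items()})
```

To run this on the actual dataset, replace `io.StringIO(SAMPLE)` with an open file handle for the downloaded CSV.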
Acknowledgements
If you use this dataset in your research, please credit the original authors.
https://zenodo.org/records/7037956#.Y9Y6yNJBwUE
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
https://creativecommons.org/publicdomain/zero/1.0/
Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins
import os.path as osp

import datatable as dt  # provides dt.fread, used for the CSV split files below
import pandas as pd
import torch
import torch_geometric.transforms as T
from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
from ogb.nodeproppred import PygNodePropPredDataset


class PygOgbnProteins(PygNodePropPredDataset):
    """ogbn-proteins read from the local Kaggle input directory."""

    def __init__(self, meta_csv = None):
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
            meta_csv = osp.join(root, name, 'ogbn-master.csv')
        # The master CSV has one column of metadata per dataset.
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)

    def get_idx_split(self, split_type = None):
        if split_type is None:
            split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Prefer a pre-saved split dictionary if one exists.
        if osp.isfile(osp.join(path, 'split_dict.pt')):
            return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
            train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
            for nodetype in train_idx_dict.keys():
                train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
                valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
                test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
            return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
            # Each CSV holds a single column of node indices.
            train_idx = dt.fread(osp.join(path, 'train.csv'), header = None).to_numpy().T[0]
            train_idx = torch.from_numpy(train_idx).to(torch.long)
            valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = None).to_numpy().T[0]
            valid_idx = torch.from_numpy(valid_idx).to(torch.long)
            test_idx = dt.fread(osp.join(path, 'test.csv'), header = None).to_numpy().T[0]
            test_idx = torch.from_numpy(test_idx).to(torch.long)
            return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}


dataset = PygOgbnProteins()
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph = dataset[0]  # PyG Graph object
Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.
Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.
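To make the metric concrete, the sketch below averages per-label ROC-AUC scores over a toy problem with two labels instead of 112. It is a plain-Python illustration of the averaging, not the OGB Evaluator, and the function names `roc_auc` and `mean_roc_auc` are hypothetical; reported results should come from the official evaluator.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one binary task by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(y_true, y_score):
    """Average ROC-AUC over label columns; each row is one protein."""
    n_tasks = len(y_true[0])
    aucs = [
        roc_auc([row[t] for row in y_true], [row[t] for row in y_score])
        for t in range(n_tasks)
    ]
    return sum(aucs) / n_tasks


# Toy example: 4 proteins, 2 binary labels each.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
print(mean_roc_auc(y_true, y_score))  # 0.875 (1.0 for label 0, 0.75 for label 1)
```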
Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species which the proteins come from. This enables the evaluation of the generalization performance of the model across different species.
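The idea of a species-based split can be sketched as follows. The species names, node assignments, and the mapping of species to splits are all hypothetical; the actual assignment used for ogbn-proteins is defined by the dataset's own split files.

```python
from collections import defaultdict

# Hypothetical node -> species assignment and species -> split mapping.
node_species = {0: "sp_a", 1: "sp_a", 2: "sp_b", 3: "sp_c", 4: "sp_b", 5: "sp_c"}
split_of_species = {"sp_a": "train", "sp_b": "valid", "sp_c": "test"}


def split_by_species(node_species, split_of_species):
    """Assign every node to the split that its species belongs to."""
    split_idx = defaultdict(list)
    for node, species in sorted(node_species.items()):
        split_idx[split_of_species[species]].append(node)
    return dict(split_idx)


species_split = split_by_species(node_species, split_of_species)
print(species_split)
```

Because whole species are held out, no protein from a validation or test species is seen during training, which is what makes the evaluation a test of cross-species generalization.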
Note: For undirected graphs, the loaded graphs will have double the number of edges because the bidirectional edges are added automatically.
| Package | #Nodes | #Edges | Split Type | Task Type | Metric |
|---|---|---|---|---|---|
| ogb>=1.1.1 | 132,534 | 39,561,252 | Species | Multi-label binary classification | ROC-AUC |
Website: https://ogb.stanford.edu
The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.
[1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
[2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
[3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchm...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Homophily/heterophily evaluation, expressed in terms of z-score values, for the human Protein-Protein Interaction Network (PPI) obtained from the STRING v11.5 database (https://string-db.org) with the standard threshold on edge score (T=700). Each protein occurring in the PPI was assigned to a class corresponding to the chromosome the related gene belongs to.
A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered (excluding the class corresponding to chromosome Y because of the small number of genes occurring in the network).
The homophily/heterophily nature of the network, with respect to chromosome classes, was evaluated with the HONTO tool (https://github.com/cumbof/honto).
In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.
Values for intra-chromosomal interactions (along the diagonal) and inter-chromosomal interactions (off the diagonal) are also reported as a heatmap.
The values on the diagonal are clearly higher than those off the diagonal, which indicates that the network is homophilic and confirms the link between shared chromosome and interaction in the PPI.
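One common way to obtain such a z-score is to compare the observed number of same-class edges with the number found after randomly shuffling the class labels. The sketch below illustrates that generic label-permutation approach on a toy graph. It is an assumption for illustration only and is not necessarily the procedure HONTO implements; the function names, the toy graph, and the parameters `n_perm` and `seed` are hypothetical.

```python
import random
import statistics


def intra_class_edges(edges, label):
    """Count edges whose two endpoints carry the same class label."""
    return sum(1 for u, v in edges if label[u] == label[v])


def homophily_z_score(edges, label, n_perm=500, seed=0):
    """z-score of the observed same-class edge count against shuffled labels."""
    rng = random.Random(seed)
    observed = intra_class_edges(edges, label)
    nodes = list(label)
    classes = [label[n] for n in nodes]
    null = []
    for _ in range(n_perm):
        rng.shuffle(classes)  # keeps class sizes, breaks the node-class link
        null.append(intra_class_edges(edges, dict(zip(nodes, classes))))
    return (observed - statistics.mean(null)) / statistics.pstdev(null)


# Toy graph: two 4-node cliques joined by one bridge edge.
clique_a, clique_b = [0, 1, 2, 3], [4, 5, 6, 7]
edges = [
    (u, v)
    for clique in (clique_a, clique_b)
    for i, u in enumerate(clique)
    for v in clique[i + 1:]
]
edges.append((3, 4))

# Class labels follow the cliques, so the toy graph is strongly homophilic.
label = {n: "chr1" for n in clique_a}
label.update({n: "chr2" for n in clique_b})

z = homophily_z_score(edges, label)
print(intra_class_edges(edges, label), z > 0)
```

A clearly positive z-score means more same-class edges than expected by chance (homophily); a clearly negative one would point to heterophily.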
Given the limitations of current technologies, the subcellular localizations of proteins are difficult to identify. Predicting the subcellular localization and the intercellular distribution patterns of proteins in accordance with their specific biological roles, including validated functions, relationships with other proteins, and even their specific sequence characteristics, is necessary. The computational prediction of protein subcellular localizations can be performed on the basis of the sequence and the functional characteristics. In this study, the protein–protein interaction network, functional annotation of proteins and a group of direct proteins with known subcellular localization were used to construct models. To build efficient models, several powerful machine learning algorithms, including two feature selection methods and four classification algorithms, were employed. Some key proteins and functional terms were discovered, which may provide important contributions for determining protein subcellular locations. Furthermore, some quantitative rules were established to identify the potential subcellular localizations of proteins. As the first prediction model that uses direct protein annotation information (i.e., functional features) and STRING-based protein–protein interaction network (i.e., network features), our computational model can help promote the development of predictive technologies on subcellular localizations and provide a new approach for exploring the protein subcellular localization patterns and their potential biological importance.
(A and B) Phosphorylation-specific networks based on integration of our phosphoproteomic data with protein-protein interactions in the STRING database. Note that both networks show a constellation of hubs characterized by the function of the interacting proteins. Zooms into the signaling hub of LAT efficient (LAT+/+) or deficient (LAT−/−) cell lines show the first neighbors (pistachio green or orange circles) of CD3ζ, LCK and ZAP-70 (red or blue circles). (C) The number of edges and (D) degree distribution for experimental and random networks. Orange and pistachio-green triangles correspond to the degree distribution in networks based on data from two replicates in the Jurkat CL20 cell line. Blue triangles and purple crosses correspond to networks based on the data in the LAT-deficient cell line and randomly selected proteins, respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Validation of the total new predicted links and the new predicted links associated with the 10 proteins by STRING database for the 14317_PPI data.
Bacterial pathogens continue to threaten public health worldwide. Identification of bacterial virulence factors can help to find novel drug/vaccine targets against pathogenicity. It can also help to reveal the mechanisms of the related diseases at the molecular level. With the explosive growth in protein sequences generated in the postgenomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying virulence factors according to their sequence information alone. In this study, based on the protein-protein interaction networks from the STRING database, a novel network-based method was proposed for identifying the virulence factors in the proteomes of UPEC 536, UPEC CFT073, P. aeruginosa PAO1, L. pneumophila Philadelphia 1, C. jejuni NCTC 11168 and M. tuberculosis H37Rv. Evaluated on the same benchmark datasets derived from the aforementioned species, the identification accuracies achieved by the network-based method were around 0.9, significantly higher than those of sequence-based methods such as BLAST, feature selection and VirulentPred. Further analysis showed that functional associations such as gene neighborhood and co-occurrence were the primary associations between these virulence factors in the STRING database. The high success rates indicate that the network-based method is quite promising. The approach holds high potential for identifying virulence factors in other organisms as well, because it can easily be extended to other bacterial species as long as the relevant significant statistical data are available for them.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Transcription factor-protein-protein interaction networks (TF-PPI): key pathway modulators in diabetes. A network of significantly modulated TF-PPIs for intact (A) and injured vessels at different timepoints: 20 hours (B), 2 weeks (C), and 6 weeks (D). Significantly up- and down-regulated genes from each timepoint, comparing Goto-Kakizaki (GK) vs Wistar rats, were used to obtain TF-PPIs, and this information was fed into the STRING database to generate the network. The top 10 up- and 10 down-regulated TFs are shown in the network above. Up- and down-regulated TFs are indicated by green and red nodes, respectively. The size of each node indicates the level of the P-value. All the interactions were predicted with an adjusted P-value < .05.