100+ datasets found
  1. STRING-Protein-Protein-Interactions-Network

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Dr. Nagendra (2025). STRING-Protein-Protein-Interactions-Network [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/string-protein-protein-interactions-network
    Available download formats: zip (6,368,384 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset representing a protein-protein interaction (PPI) network of human proteins, generated and scored using the STRING database. It focuses on functional and physical associations between proteins and includes confidence scores (e.g., text mining, experimental) for each interaction. A foundational resource for systems biology and for identifying molecular hubs in disease pathways.

  2. STRING Network Analysis

    • figshare.com
    Updated May 22, 2025
    Dain Lee (2025). STRING Network Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.29126396.v2
    Dataset updated
    May 22, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Dain Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the protein-protein interaction analysis dataset used in an unpublished manuscript and further analyzed with the STRING online software. Significantly upregulated mRNAs (2,777 genes; p < 0.05) identified by bulk RNA-seq were analyzed using the STRING module in Cytoscape v.2.2.0 (Institute for Systems Biology; WA; USA). A cluster network was constructed using the MCL algorithm with a granularity parameter of 4, followed by filtering nodes with mcl.cluster > 10. The resulting 1,848 nodes were processed through STRING v12.0 (Swiss Institute of Bioinformatics; Lausanne; Switzerland) to generate a protein–protein interaction (PPI) network, incorporating evidence from text mining, genomic neighborhood, experimental data, curated databases, co-expression, gene fusion, and co-occurrence, with a minimum confidence score threshold of 0.40. Network modules were defined using the DBSCAN clustering algorithm with an ε parameter of 2. Cluster 1, representing the largest gene set (101 genes), was further analyzed by sorting the top 20 nodes with the highest node degree, resulting in a network comprising 101 nodes and 756 edges. Global network metrics indicated an average node degree of 15, a local clustering coefficient of 0.600, and a PPI enrichment p-value of < 1 × 10⁻¹⁶. The average values of coexpression, experimentally determined interactions, automated text mining, and combined scores were calculated.
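    The global metrics above follow standard graph definitions. The sketch below is illustrative only and is not the authors' Cytoscape/STRING workflow: it assumes an edge list of (protein_a, protein_b, combined_score) tuples with scores on STRING's 0 to 1 scale, and all function names are hypothetical. It applies the 0.40 confidence cut-off, ranks hub nodes by degree, and computes average degree (2E/N) and the mean local clustering coefficient.

```python
from collections import defaultdict

def filter_edges(edges, min_score=0.40):
    """Keep (a, b) pairs whose combined score is at least min_score (0-1 scale)."""
    return [(a, b) for a, b, score in edges if score >= min_score]

def average_degree(num_nodes, num_edges):
    """Mean degree of an undirected graph: every edge adds 1 to two node degrees."""
    return 2 * num_edges / num_nodes

def neighbours(edges):
    """Adjacency sets for an undirected edge list."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs

def top_hubs(edges, k=20):
    """The k highest-degree nodes, ties broken alphabetically."""
    nbrs = neighbours(edges)
    return sorted(nbrs, key=lambda n: (-len(nbrs[n]), n))[:k]

def average_clustering(edges):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    nbrs = neighbours(edges)
    total = 0.0
    for node, ns in nbrs.items():
        k = len(ns)
        if k < 2:
            continue
        # Count edges among this node's neighbours (each pair once).
        links = sum(1 for u in ns for v in ns if u < v and v in nbrs[u])
        total += 2 * links / (k * (k - 1))
    return total / len(nbrs)

# The cluster reported above: 101 nodes and 756 edges.
print(round(average_degree(101, 756), 2))  # 14.97, reported as 15
```

    The reported average degree of 15 is consistent with 2 × 756 / 101 ≈ 14.97.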

  3. STRING protein-protein interaction networks for WT-C vs. WT-D.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Parisa Sooshtari; Biao Feng; Saumik Biswas; Michael Levy; Hanxin Lin; Zhaoliang Su; Subrata Chakrabarti (2023). STRING protein-protein interaction networks for WT-C vs. WT-D. [Dataset]. http://doi.org/10.1371/journal.pone.0270287.t001
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Parisa Sooshtari; Biao Feng; Saumik Biswas; Michael Levy; Hanxin Lin; Zhaoliang Su; Subrata Chakrabarti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    STRING protein-protein interaction networks for WT-C vs. WT-D.

  4. Protein interaction data for 222 BM zone components

    • figshare.com
    xlsx
    Updated Feb 6, 2022
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless (2022). Protein interaction data for 222 BM zone components [Dataset]. http://doi.org/10.6084/m9.figshare.19127504.v1
    Available download formats: xlsx
    Dataset updated
    Feb 6, 2022
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All human protein interactions were obtained from STRING (https://string-db.org/, version 11.0). Interactions were then filtered to those involving only BM zone proteins. Related to Fig. S6B.
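    The filtering step can be sketched as keeping only links whose two endpoints are both in the protein set of interest. This assumes STRING's whitespace-separated `protein.links` layout with a `protein1 protein2 combined_score` header and integer scores from 0 to 1000; the protein IDs and the function name below are hypothetical placeholders, not the authors' code.

```python
import io

def filter_links(lines, keep):
    """Yield (protein1, protein2, combined_score) rows where both proteins are in keep."""
    rows = iter(lines)
    next(rows)  # skip the header line
    for line in rows:
        if not line.strip():
            continue  # ignore blank lines
        p1, p2, score = line.split()
        if p1 in keep and p2 in keep:
            yield p1, p2, int(score)

# Hypothetical sample in the assumed protein.links layout.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.P1 9606.P2 900\n"
    "9606.P1 9606.P3 450\n"
    "9606.P2 9606.P4 700\n"
)
bm_zone = {"9606.P1", "9606.P2", "9606.P3"}
print(list(filter_links(sample, bm_zone)))
```

    For a real download, pass an open file handle instead of the in-memory sample.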

  5. Citation network of the knowledge co-production literature. Supplementary...

    • zenodo.org
    csv
    Updated Dec 8, 2021
    Justyna Bandola-Gill; Megan Arthur; Rhodri Ivor Leng (2021). Citation network of the knowledge co-production literature. Supplementary data. [Dataset]. http://doi.org/10.5281/zenodo.5762451
    Available download formats: csv
    Dataset updated
    Dec 8, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Justyna Bandola-Gill; Megan Arthur; Rhodri Ivor Leng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data description

    This data note describes the final citation network dataset analysed in the manuscript "What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives" [1].

    The data collection strategy used to construct this dataset is described in the associated manuscript [1]. The data were originally downloaded from the Web of Science (WoS) Core Collection through the University of Edinburgh's library subscription, using a systematic search methodology designed to capture literature relevant to ‘knowledge co-production’. The dataset consists of 1,893 unique document reference strings (nodes) linked by 9,759 citation links (edges). It describes a directed citation network of papers relevant to ‘knowledge co-production’ and is split into two files: (i) ‘KnowCo_node_attribute_list.csv’ contains attributes of the 1,893 documents (nodes); and (ii) ‘KnowCo_edge_list.csv’ records the citation links (edges) between pairs of documents.

    1. ‘KnowCo_node_attribute_list.csv’ consists of attributes of the 1,893 nodes (documents) of the citation network. Due to the approach used to collect data, there are two types of node: (i) 525 nodes represent documents retrieved from WoS via the systematic search strategy, and these have full attribute data including their reference lists; and (ii) 1,368 documents that were cited >2 times by our 525 fully retrieved papers (see manuscript for full description [1]). The columns refer to:

    Id, the unique identifier. Fully retrieved documents are identified via a unique identifier that begins with ‘f’ followed by an integer (e.g. f1, f2, etc.). Non-retrieved documents are identified via a unique identifier beginning with ‘n’ followed by an integer (e.g. n1, n2, etc.).

    Label, the unique reference string of the document to which the row's attribute data correspond. Reference strings contain the last name of the first author, publication year, journal, volume, start page, and DOI (if available).

    authors, all author names, listed in the order in which they appear in the document's authorship list. These data are only available for fully retrieved documents.

    title, document title. These data are only available for fully retrieved documents.

    journal, journal of publication. These data are only available for fully retrieved documents. For those interested in journal data for the remaining papers, this can be extracted from the reference string in the ‘Label’ column.

    year, year of publication. These data are available for all nodes.

    type, document type (e.g. article, review). Available only for fully retrieved documents.

    wos_total_citations, total citation count as recorded by Web of Science Core Collection as of May 2020. Available only for fully retrieved documents.

    wos_id, Web of Science accession number. Available only for fully retrieved documents; for non-retrieved documents the cell contains ‘CitedReference’.

    cluster, the cluster membership number discussed in the manuscript, obtained by modularity maximisation with the Leiden algorithm (resolution 0.8; Q = 0.53; 5 clusters). Available for all nodes.

    indegree, total count of within-network citations to a given document. Because of how the network was built, this is the number of citations each of the 1,893 documents receives from the 525 fully retrieved documents. Available for all nodes.

    outdegree, total count of within-network references from a given document. Because only fully retrieved documents have reference-list data, only they can have a value > 0. Available for all nodes.

    2. ‘KnowCo_edge_list.csv’ is an edge list containing 9,759 citation links between the 1,893 documents. The columns refer to:

    Source, the citing document’s unique identifier.

    Target, the cited document’s unique identifier.
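    The indegree and outdegree columns can in principle be recomputed from the edge list. A minimal sketch, assuming the CSV has the `Source` and `Target` headers described above; the sample rows are hypothetical, and for the real file you would pass `open("KnowCo_edge_list.csv", newline="")` instead.

```python
import csv
import io
from collections import Counter

def degrees(edge_csv):
    """Count in-degree (citations received) and out-degree (references made) per node."""
    indegree, outdegree = Counter(), Counter()
    for row in csv.DictReader(edge_csv):
        outdegree[row["Source"]] += 1  # the citing document
        indegree[row["Target"]] += 1   # the cited document
    return indegree, outdegree

# Hypothetical rows: f-prefixed IDs are fully retrieved documents, n-prefixed are not.
sample = io.StringIO("Source,Target\nf1,n1\nf1,f2\nf2,n1\n")
indeg, outdeg = degrees(sample)
print(indeg["n1"], outdeg["f1"], outdeg["n1"])  # 2 2 0
```

    Since only fully retrieved documents have reference lists, every node with a non-zero outdegree should carry an `f` identifier, which makes a quick consistency check on the files.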

    Notes

    [1] Bandola-Gill, J., Arthur, M., & Leng, R. I. (Under review). What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives. Evidence & Policy

  6. Statistics of the genes in the protein interaction network constructed based...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu (2023). Statistics of the genes in the protein interaction network constructed based on the STRING database. [Dataset]. http://doi.org/10.1371/journal.pone.0116505.t003
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of the genes in the protein interaction network constructed based on the STRING database.

  7. Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with...

    • explore.openaire.eu
    Updated Jun 23, 2022
    Joseph Szymborski; Amin Emad (2022). Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks [Dataset]. http://doi.org/10.5281/zenodo.6709789
    Dataset updated
    Jun 23, 2022
    Authors
    Joseph Szymborski; Amin Emad
    Description

    Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning. These datasets are in a format that RAPPPID can read directly.

    Comparatives Dataset: These datasets were derived from the STRING v11 H. sapiens dataset, following the C1, C2, and C3 procedures outlined by Park and Marcotte (2012). Negative samples are drawn randomly from the space of protein pairs not known to interact. See Szymborski & Emad for details.

    Repeatability Datasets: These datasets are derived from STRING in the same manner as the comparatives dataset, but three different random seeds are used for drawing proteins.

    References

    Park, Y. and Marcotte, E.M. (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.
    Szklarczyk, D., Gable, A. L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N. T., Morris, J. H., Bork, P., Jensen, L. J., and von Mering, C. (2019). STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.
    Szymborski, J. and Emad, A. (2021) RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv https://doi.org/10.1101/2021.08.13.456309
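    The C1/C2/C3 split and random negative sampling can be illustrated with a short sketch. This follows the general Park and Marcotte scheme (C1: both proteins of a test pair appear in training; C2: exactly one does; C3: neither does) and is not RAPPPID's own code; the function names are hypothetical, and the `seed` argument only mirrors the idea of repeatability datasets drawn with different seeds.

```python
import random

def classify_test_pair(pair, train_proteins):
    """Park and Marcotte (2012) classes: C1 both proteins seen in training, C2 one, C3 none."""
    seen = sum(p in train_proteins for p in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[seen]

def sample_negatives(proteins, positives, n, seed=0):
    """Draw n distinct random protein pairs that are not known to interact."""
    rng = random.Random(seed)
    proteins = sorted(set(proteins))
    protein_set = set(proteins)
    known = {frozenset(p) for p in positives}
    # Guard against asking for more negatives than exist.
    known_inside = sum(1 for k in known if len(k) == 2 and k <= protein_set)
    available = len(proteins) * (len(proteins) - 1) // 2 - known_inside
    if n > available:
        raise ValueError("not enough non-interacting pairs to sample from")
    negatives = set()
    while len(negatives) < n:
        a, b = rng.sample(proteins, 2)
        pair = frozenset((a, b))
        if pair not in known:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

train = {"A", "B", "C"}
print([classify_test_pair(p, train) for p in [("A", "B"), ("A", "X"), ("X", "Y")]])
# ['C1', 'C2', 'C3']
```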

  8. European Power Grid Network Dataset

    • kaggle.com
    zip
    Updated Mar 2, 2024
    Afroz (2024). European Power Grid Network Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/european-power-grid-network-dataset
    Available download formats: zip (92,071 bytes)
    Dataset updated
    Mar 2, 2024
    Authors
    Afroz
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description


    The European Power Grid Network dataset contains anonymized data describing the connections between nodes in Europe’s electricity grid. Researchers and policymakers can use it to study energy trading patterns, nodal prices, and the stability of energy supply.

    1. Network Structure and Insights:

    o The dataset provides detailed information about the interconnections between nodes across the European power grid. Researchers can analyze these links to understand how electricity flows between different regions.
    o By examining nodal prices, researchers can uncover pricing dynamics, including variations based on geographical location, demand, and supply.
    o Geospatial analysis facilitated by this dataset allows researchers to identify patterns in power market behavior, congestion points, and reliability challenges.

    2. Critical Energy Supplies and Stability:

    o Identifying critical energy supplies is essential for maintaining grid stability. Policymakers can use this dataset to inform decisions related to energy security and resilience.
    o Additionally, the dataset enables cross-state comparisons of power price competitiveness, aiding policymakers in designing effective energy policies.

    This dataset contains anonymized information about the European power grid network, providing insights on the connections between nodes and their pricing. To use this dataset, one must identify the source and destination nodes of the power grid along with associated features such as prices and country information.

    First, it is important to understand what each column contains in order to navigate the data effectively:

    1. from: The source node of the power grid. (Integer)

    2. to: The destination node of the power grid. (Integer)

    3. name: Name of the node in European Power Grid Network. (String)

    4. price: Price of electricity at each node. (Float)

    5. country: Country in which a particular node is located. (String)

    Second, it helps to visualize and explore the dataset with various plots, for example: mapping node locations for geospatial exploration; comparing electricity prices between countries or regions; and assessing economic relationships through trade flows or supply-chain networks related to energy market developments. All of these are possible with simple analyses of the european_power_grid dataset.
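    As an illustration of the country-level price comparison, here is a minimal sketch. It assumes the node attributes (name, price, country) have been loaded into a dict keyed by node ID and the links into a list of (from, to) pairs; that layout, the function names, and the sample values are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_country(nodes):
    """nodes: dict node_id -> {"name", "price", "country"}; returns mean price per country."""
    prices = defaultdict(list)
    for attrs in nodes.values():
        prices[attrs["country"]].append(attrs["price"])
    return {country: mean(p) for country, p in prices.items()}

def cross_border_spreads(nodes, edges):
    """Absolute price difference on every link whose endpoints lie in different countries."""
    spreads = []
    for u, v in edges:
        a, b = nodes[u], nodes[v]
        if a["country"] != b["country"]:
            spreads.append((u, v, abs(a["price"] - b["price"])))
    return spreads

# Made-up values for illustration only.
nodes = {
    1: {"name": "node_a", "price": 50.0, "country": "DE"},
    2: {"name": "node_b", "price": 54.0, "country": "DE"},
    3: {"name": "node_c", "price": 61.5, "country": "FR"},
}
edges = [(1, 2), (2, 3)]
print(mean_price_by_country(nodes))        # {'DE': 52.0, 'FR': 61.5}
print(cross_border_spreads(nodes, edges))  # [(2, 3, 7.5)]
```

    Large spreads on cross-border links are one simple way to flag candidate congestion points for closer study.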

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    https://zenodo.org/records/7037956#.Y9Y6yNJBwUE

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    https://creativecommons.org/publicdomain/zero/1.0/

  9. Selection of 30 central genes from PPI network, including 17 upregulated and...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 11, 2021
    Li, Ping; Wang, Xiaoming; Chen, Xuewei; Dong, Lijin; Fan, Rong (2021). Selection of 30 central genes from PPI network, including 17 upregulated and 13 downregulated genes, by using the STRING and Cytoscape software. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000899333
    Dataset updated
    Jun 11, 2021
    Authors
    Li, Ping; Wang, Xiaoming; Chen, Xuewei; Dong, Lijin; Fan, Rong
    Description

    Selection of 30 central genes from PPI network, including 17 upregulated and 13 downregulated genes, by using the STRING and Cytoscape software.

  10. STRING

    • integbio.jp
    Updated Jun 17, 2013
    STRING Consortium (2013). STRING [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00690?jtpl=56
    Dataset updated
    Jun 17, 2013
    Dataset provided by
    STRING Consortium
    License

    http://string-db.org/newstring_cgi/show_download_page.pl

    Description

    STRING is a database of known and predicted protein interactions, including both physical and functional interactions. Its data are derived from four sources: genomic context, high-throughput experiments, coexpression, and previous knowledge. The database quantitatively integrates interaction data from these sources for a large number of organisms and transfers information between organisms where applicable. It performs iterative searches and visualizes the results in their genomic context. Many data, including protein sequences, protein networks, interaction types for protein links, orthologous groups, and full database dumps (license required), are downloadable.
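    One common way to integrate several evidence channels quantitatively is to treat them as independent and combine them as 1 − ∏(1 − sᵢ). The sketch below does this with a prior correction; the prior value of 0.041 is an assumption based on our understanding of STRING's documented scoring, and real STRING scores apply further channel-specific corrections, so treat this as a simplified illustration rather than STRING's implementation.

```python
PRIOR = 0.041  # assumed prior probability that two random proteins are linked

def combine_scores(channel_scores, prior=PRIOR):
    """Combine per-channel probabilities (0-1) as independent evidence, with prior correction."""
    no_link = 1.0
    for s in channel_scores:
        # Remove the prior from each channel, clamping at zero.
        s_corr = max(0.0, (s - prior) / (1 - prior))
        no_link *= 1 - s_corr
    combined = 1 - no_link
    # Add the prior back once for the final score.
    return combined + prior * (1 - combined)

print(round(combine_scores([0.9]), 3))       # 0.9
print(round(combine_scores([0.5, 0.5]), 3))  # 0.739
```

    Two moderate, independent channels thus yield a combined score higher than either alone, while a single channel passes through unchanged.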

  11. Median values of the proportions of non-disease essential proteins among n...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu (2023). Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, 3, 4, 5, 6, 7}) neighbors of nonessential disease proteins (D−) and other proteins (O) in the protein interaction network constructed based on the STRING database. [Dataset]. http://doi.org/10.1371/journal.pone.0116505.t005
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Median values of the proportions of non-disease essential proteins among n (n ∈ {1, 2, 3, 4, 5, 6, 7}) neighbors of nonessential disease proteins (D−) and other proteins (O) in the protein interaction network constructed based on the STRING database.

  12. Evaluating homophily of human PPI with respect to chromosomes

    • data.niaid.nih.gov
    Updated Jul 30, 2022
    Apollonio, Nicola; Blankenberg, Daniel; Cumbo, Fabio; Franciosa, Paolo Giulio; Santoni, Daniele (2022). Evaluating homophily of human PPI with respect to chromosomes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6941314
    Dataset updated
    Jul 30, 2022
    Dataset provided by
    Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
    Department of Statistical Science, University of Rome "La Sapienza", Piazzale Aldo Moro 5, 00185 Rome, Italy
    Institute for applied mathematics "Mauro Picone", National Research Council of Italy, Via dei Taurini 19, 00185 Rome, Italy
    Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
    Authors
    Apollonio, Nicola; Blankenberg, Daniel; Cumbo, Fabio; Franciosa, Paolo Giulio; Santoni, Daniele
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Homophily/heterophily evaluation, expressed as z-score values, of the human protein-protein interaction (PPI) network obtained from the STRING v11.5 database (https://string-db.org) using the standard edge-score threshold (T = 700). Each protein in the PPI was assigned to a class corresponding to the chromosome on which its gene is located.

    A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered (excluding the class corresponding to chromosome Y because of the small number of genes occurring in the network).

    The homophily/heterophily of the network with respect to chromosome classes was evaluated with the HONTO tool (https://github.com/cumbof/honto).

    In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.

    Values for intra-chromosomal (diagonal) and inter-chromosomal (off-diagonal) interactions are also reported as a heatmap.

    Values on the diagonal are clearly higher than those off the diagonal, indicating that the network is homophilic and supporting a link between sharing a chromosome and interacting in the PPI.
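    The idea of scoring homophily as a z-score can be illustrated with a generic label-permutation test: count edges whose endpoints share a class, then compare that count with the distribution obtained by shuffling class labels. This is a swapped-in technique for illustration and may differ from HONTO's exact statistic (which also reports per class-pair values); the function names and toy graph are hypothetical.

```python
import random
from statistics import mean, pstdev

def same_class_edges(edges, label):
    """Number of edges whose two endpoints carry the same class label."""
    return sum(label[u] == label[v] for u, v in edges)

def homophily_z(edges, label, n_perm=1000, seed=0):
    """z-score of the observed intra-class edge count against label permutations."""
    rng = random.Random(seed)
    nodes = list(label)
    observed = same_class_edges(edges, label)
    null = []
    for _ in range(n_perm):
        shuffled = [label[n] for n in nodes]
        rng.shuffle(shuffled)
        null.append(same_class_edges(edges, dict(zip(nodes, shuffled))))
    sd = pstdev(null)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null)) / sd

# Toy graph: two triangles (classes X and Y) joined by one bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
label = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
print(homophily_z(edges, label) > 0)  # True: more same-class edges than chance
```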

  13. Zaied et al. supplementary data 1 - 10

    • catalogue.data.govt.nz
    Updated Aug 11, 2025
    (2025). Zaied et al. supplementary data 1 - 10 [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-24911157
    Dataset updated
    Aug 11, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Data 1. STRING protein-protein interaction network in lung.
    Supplementary Data 2. PROPER protein-protein interaction network in lung.
    Supplementary Data 3. STRING protein-protein interaction network in whole blood.
    Supplementary Data 4. PROPER protein-protein interaction network in whole blood.
    Supplementary Data 5. Genes causal for asthma in the lung GRN identified using Mendelian randomisation (Wald ratio method).
    Supplementary Data 6. Genes causal for asthma in the blood GRN identified using Mendelian randomisation (Wald ratio method and inverse variance weighted).
    Supplementary Data 7. Significantly enriched (hypergeometric test, FDR ≤ 0.05 and 500 sets of Monte Carlo simulations) asthma-trait interactions.
    Supplementary Data 8. Curated gene-disease associations from DisGeNet for the identified level 0-4 genes (hypergeometric test, FDR ≤ 0.05).
    Supplementary Data 9. Comorbidity analysis using health records of 2,051,661 hospitalized patients, 26,781 of whom had asthma (ICD10-AM code J459).
    Supplementary Data 10. List of level 0-4 genes (hypergeometric test, FDR ≤ 0.05 and 500 sets of Monte Carlo simulations) that are part of the druggable genome and/or have known drug targets and/or have been causally associated with asthma through Mendelian randomisation.
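    The two Mendelian randomisation estimators named above have standard closed forms: the Wald ratio for a single instrument is β_outcome / β_exposure, and the fixed-effect inverse-variance weighted (IVW) estimate combines several instruments as Σ(β_Y·β_X/se_Y²) / Σ(β_X²/se_Y²). A minimal sketch with made-up effect sizes (not values from this study):

```python
def wald_ratio(beta_outcome, beta_exposure):
    """Causal effect estimate from a single instrument: beta_Y / beta_X."""
    return beta_outcome / beta_exposure

def ivw(beta_outcome, beta_exposure, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across several instruments."""
    num = sum(by * bx / se ** 2 for by, bx, se in zip(beta_outcome, beta_exposure, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den

print(wald_ratio(0.1, 0.2))                            # 0.5
print(round(ivw([0.1, 0.3], [0.2, 0.3], [0.05, 0.1]), 2))  # 0.68
```

    With a single instrument, the IVW estimate reduces to the Wald ratio.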

  14. STRING v9.1 analysis of the 27 proteins deregulated only in glia exposed to...

    • datasetcatalog.nlm.nih.gov
    Updated Mar 18, 2015
    Brodaty, Henry; Smythe, George; Jayasena, Tharusha; Poljak, Anne; Kochan, Nicole; Sachdev, Perminder; Trollor, Julian; Hill, Mark; Raftery, Mark; Braidy, Nady (2015). STRING v9.1 analysis of the 27 proteins deregulated only in glia exposed to AD plasma (shown in Table 4). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001899459
    Dataset updated
    Mar 18, 2015
    Authors
    Brodaty, Henry; Smythe, George; Jayasena, Tharusha; Poljak, Anne; Kochan, Nicole; Sachdev, Perminder; Trollor, Julian; Hill, Mark; Raftery, Mark; Braidy, Nady
    Description

    for enrichment in gene ontology biological processes. Glucose metabolism was found to be the most significant biological process, and is also highlighted in the STRING network map (Fig. 5).

  15. CreativeWork

    • pfocr.wikipathways.org
    Updated Jun 9, 2023
    WikiPathways (2023). CreativeWork [Dataset]. https://pfocr.wikipathways.org/figures/PMC10242111_fcell-11-1165308-g004.html
    Dataset updated
    Jun 9, 2023
    Dataset authored and provided by
    WikiPathways
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Protein–protein interaction network of the top differentially expressed genes between the patient’s samples and the Ctrl cohort. Edges represent protein–protein associations. Confidence ≥0.700; maximum number of interactors ≤20. Edge confidence: high (0.700) and highest (0.900) (see https://string-db.org/cgi/network).
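    The confidence cut-offs above correspond to STRING's named score tiers. A small sketch of tiering and filtering edges, assuming scores on the 0 to 1 scale and the conventional STRING thresholds (low 0.150, medium 0.400, high 0.700, highest 0.900); the function names are hypothetical.

```python
# Assumed STRING confidence tiers, checked from the strictest down.
TIERS = [(0.900, "highest"), (0.700, "high"), (0.400, "medium"), (0.150, "low")]

def confidence_tier(score):
    """Name of the strictest tier the score reaches, or None below 'low'."""
    for cutoff, name in TIERS:
        if score >= cutoff:
            return name
    return None

def keep_high_confidence(edges, min_score=0.700):
    """Keep (a, b, score) edges at or above the cut-off used for this figure."""
    return [(a, b, s) for a, b, s in edges if s >= min_score]

edges = [("GENE1", "GENE2", 0.95), ("GENE1", "GENE3", 0.72), ("GENE2", "GENE3", 0.45)]
print([confidence_tier(s) for _, _, s in edges])  # ['highest', 'high', 'medium']
print(keep_high_confidence(edges))
```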

  16. Data from: Determining the minimum number of protein-protein interactions...

    • dataone.org
    Updated Apr 16, 2025
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu (2025). Determining the minimum number of protein-protein interactions required to support known protein complexes [Dataset]. http://doi.org/10.5061/dryad.8s3682g
    Dataset updated
    Apr 16, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu
    Time period covered
    Apr 30, 2018
    Description

    The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results...
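    For intuition about the graph-theoretic problem: a single complex whose induced subgraph has c connected components needs exactly c − 1 added interactions to become connected. With overlapping complexes, one added edge can serve several complexes, which is what makes the general problem hard. The sketch below (not ILPMinPPI or GreedyMinPPI; names are hypothetical) computes the per-complex counts and the simple lower bound (maximum) and upper bound (sum) they imply.

```python
def components(members, edges):
    """Number of connected components of the subgraph induced by members (union-find)."""
    parent = {m: m for m in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:  # only edges inside the complex count
            parent[find(u)] = find(v)
    return len({find(m) for m in members})

def min_added_edges_single(members, edges):
    """Exact minimum for one complex: components minus one."""
    return max(0, components(members, edges) - 1)

def bounds(complexes, edges):
    """(lower, upper) bounds on the total number of edges to add for all complexes."""
    needs = [min_added_edges_single(c, edges) for c in complexes]
    return max(needs, default=0), sum(needs)

ppi = [("A", "B"), ("C", "D")]
print(min_added_edges_single({"A", "B", "C", "D"}, ppi))  # 1
print(bounds([{"A", "B", "C", "D"}, {"C", "D", "E"}], ppi))  # (1, 2)
```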

  17. OGBN-Proteins (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    + more versions
    Redao da Taupl (2021). OGBN-Proteins (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-proteins
    Available download formats: zip (677,947,148 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    OGBN-Proteins

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins

    Usage in Python

    import os.path as osp
    import pandas as pd
    import torch
    import torch_geometric.transforms as T
    from ogb.nodeproppred import PygNodePropPredDataset
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    
    class PygOgbnProteins(PygNodePropPredDataset):
      """ogbn-proteins loaded from the Kaggle input directory instead of being downloaded."""
      def __init__(self, meta_csv = None):
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbn-master.csv')
        # The master CSV holds one column of OGB metadata per dataset.
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name].to_dict()
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)
    
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Use a cached split dictionary if one is present.
        if osp.isfile(osp.join(path, 'split_dict.pt')):
          return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
          train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
          for nodetype in train_idx_dict.keys():
            train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
            valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
            test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
          return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        # Homogeneous graphs such as ogbn-proteins: one node index per CSV row.
        split_idx = {}
        for key in ('train', 'valid', 'test'):
          idx = pd.read_csv(osp.join(path, f'{key}.csv'), header = None).to_numpy().T[0]
          split_idx[key] = torch.from_numpy(idx).to(torch.long)
        return split_idx
    
    dataset = PygOgbnProteins()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.

    Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup with 112 labels in total. Performance is measured by the ROC-AUC score averaged over the 112 tasks.
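In practice the OGB Evaluator computes this metric; the sketch below only illustrates what is being averaged. It computes ROC-AUC per label column with the Mann-Whitney rank statistic and then takes the mean over columns. Skipping a column that contains only one class (where ROC-AUC is undefined) is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """ROC-AUC via the Mann-Whitney rank statistic (tied scores get average ranks).

    Returns None when only one class is present, since ROC-AUC is undefined then.
    """
    pairs = sorted(zip(y_score, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_rocauc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Average ROC-AUC over label columns (one row per protein, one column per task)."""
    scores = []
    for t in range(len(y_true[0])):
        auc = roc_auc([row[t] for row in y_true], [row[t] for row in y_score])
        if auc is not None:  # assumption: single-class tasks are skipped
            scores.append(auc)
    if not scores:
        raise ValueError("no task has both positive and negative labels")
    return sum(scores) / len(scores)


# Toy data: 4 proteins x 2 tasks (the real setup has 112 label columns).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.6], [0.3, 0.5]]
print(mean_rocauc(y_true, y_score))  # 0.875
```

Here the first task is separated perfectly (AUC 1.0) and the second ranks 3 of its 4 positive/negative pairs correctly (AUC 0.75), giving a mean of 0.875.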

    Dataset splitting: The authors split the protein nodes into training, validation, and test sets according to the species each protein comes from. This makes it possible to evaluate how well a model generalizes across species.
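A species-based split assigns whole species, rather than individual proteins, to each set. The sketch below shows the idea with a hypothetical per-node species list and a hypothetical species-to-split mapping; the actual assignment used for ogbn-proteins is not given here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def split_by_species(
    node_species: Sequence[str], species_to_split: Dict[str, str]
) -> Dict[str, List[int]]:
    """Assign every node index to the split chosen for its species.

    Because the mapping is per species, all proteins of one species land in the
    same split, so validation and test contain only species absent from training.
    """
    splits: Dict[str, List[int]] = defaultdict(list)
    for node, species in enumerate(node_species):
        splits[species_to_split[species]].append(node)
    return dict(splits)


# Hypothetical species labels and assignment (not the real ogbn-proteins split).
species = ["A", "B", "A", "C", "B"]
assignment = {"A": "train", "B": "valid", "C": "test"}
print(split_by_species(species, assignment))
# {'train': [0, 2], 'valid': [1, 4], 'test': [3]}
```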

    Note: For undirected graphs, the loaded graph has twice as many edges as listed, because each undirected edge is stored in both directions.
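The doubling can be illustrated on a small edge list: each undirected edge (u, v) becomes the directed pair u->v and v->u, and both directions carry the same features. A minimal sketch with a hypothetical function name, assuming each undirected edge is listed once and there are no self-loops:

```python
from typing import List, Sequence, Tuple


def to_bidirectional(
    edges: Sequence[Tuple[int, int]], edge_attr: Sequence[Sequence[float]]
) -> Tuple[List[Tuple[int, int]], List[List[float]]]:
    """Store each undirected edge (u, v) as u->v and v->u with identical features.

    Assumes each undirected edge appears once and there are no self-loops,
    so the output has exactly twice as many edges as the input.
    """
    out_edges: List[Tuple[int, int]] = []
    out_attr: List[List[float]] = []
    for (u, v), feats in zip(edges, edge_attr):
        out_edges += [(u, v), (v, u)]
        out_attr += [list(feats), list(feats)]
    return out_edges, out_attr


directed, attrs = to_bidirectional([(0, 1), (1, 2)], [[0.5], [0.9]])
print(directed)  # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(len(directed))  # 4
```

On tensors, PyG's `torch_geometric.utils.to_undirected` performs a similar conversion and also coalesces duplicate edges.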

    Summary

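    | Package    | #Nodes  | #Edges     | Split Type | Task Type                         | Metric  |
    |------------|---------|------------|------------|-----------------------------------|---------|
    | ogb>=1.1.1 | 132,534 | 39,561,252 | Species    | Multi-label binary classification | ROC-AUC |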
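Assuming the #Edges column counts each undirected edge once (consistent with the note about doubled edges), the table gives the size of the loaded edge list and the mean node degree directly:

```python
# Counts from the summary table; assumes #Edges counts each undirected edge once.
NUM_NODES = 132_534
NUM_UNDIRECTED_EDGES = 39_561_252

# After loading, every undirected edge is stored in both directions.
num_directed_edges = 2 * NUM_UNDIRECTED_EDGES
# Mean degree of an undirected graph is 2|E| / |V|.
average_degree = num_directed_edges / NUM_NODES

print(num_directed_edges)        # 79122504
print(round(average_degree, 1))  # 597.0
```

An average of roughly 597 partners per protein is why full-batch training on this graph is memory-hungry.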

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
    [2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchm...

  18. Data from: Plasma proteomics in epilepsy: network-based identification of...

    • ebi.ac.uk
    Updated Apr 25, 2025
    Liisa Arike (2025). Plasma proteomics in epilepsy: network-based identification of proteins associated with seizures [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD057292
    Dataset updated
    Apr 25, 2025
    Authors
    Liisa Arike
    Variables measured
    Proteomics
    Description

    Purpose: Identification of potential biomarkers of seizures.

    Methods: In this exploratory study, we quantified plasma protein intensities in 15 patients with recent seizures and compared them with 15 patients with long-standing seizure freedom. Using TMT-based proteomics, we found 51 differentially expressed proteins.

    Results: Network analyses, including co-expression networks and protein-protein interaction networks built with the STRING database, followed by network centrality and modularity analyses, revealed 22 protein modules, one of which was significantly associated with seizures. The protein-protein interaction network centered on this module identified a subnetwork of 125 proteins grouped into four clusters. Notably, one cluster, enriched mainly in inflammatory pathways and Gene Ontology terms, showed the highest enrichment of known epilepsy-related genes.

    Conclusion: Our network-based approach identified a protein module linked to seizures. The module contained known markers of epilepsy and inflammation. The results also demonstrate the potential of network analysis for discovering new biomarkers to improve epilepsy management.
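The abstract does not say which centrality measure was used. Degree centrality, the number of distinct interaction partners, is one simple choice for spotting hub proteins in a PPI subnetwork; the sketch below assumes it, and the function name and toy protein names are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple


def top_hubs(edges: Iterable[Tuple[str, str]], k: int) -> List[Tuple[str, int]]:
    """Return the k proteins with the most distinct interaction partners.

    Duplicate edges (A-B listed again as B-A) and self-loops are ignored;
    ties are broken alphabetically so the result is deterministic.
    """
    partners: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    degree = {protein: len(nbrs) for protein, nbrs in partners.items()}
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical toy subnetwork; protein names are placeholders.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(top_hubs(edges, 2))  # [('A', 3), ('B', 2)]
```

Counting distinct partners through sets, rather than counting edge rows, keeps the result correct when an interaction is listed in both directions, as in STRING-style exports.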

  19. Basic information of the four original networks (HIPPIE, HumanNet, FunCoup...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 22, 2017
    Yang, Jian; Lin, Limei; Yang, Fan; Wu, Duzhi; Yang, Tinghong; Zhao, Jing (2017). Basic information of the four original networks (HIPPIE, HumanNet, FunCoup and STRING) and the GO network. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001798563
    Dataset updated
    Dec 22, 2017
    Authors
    Yang, Jian; Lin, Limei; Yang, Fan; Wu, Duzhi; Yang, Tinghong; Zhao, Jing
    Description

    Basic information of the four original networks (HIPPIE, HumanNet, FunCoup and STRING) and the GO network.

  20. Protein Expression Profiles Characterize Distinct Features of Mouse Cerebral...

    • plos.figshare.com
    tiff
    Updated May 30, 2023
    Haijun Zhang; Yoko Kawase-Koga; Tao Sun (2023). Protein Expression Profiles Characterize Distinct Features of Mouse Cerebral Cortices at Different Developmental Stages [Dataset]. http://doi.org/10.1371/journal.pone.0125608
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Haijun Zhang; Yoko Kawase-Koga; Tao Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The proper development of the mammalian cerebral cortex requires precise protein synthesis and accurate regulation of protein expression levels. To reveal signatures of protein expression in developing mouse cortices, we generated proteomic profiles of cortices at embryonic and postnatal stages using tandem mass spectrometry (MS/MS). We found that the protein expression profiles are mostly consistent with the biological features of the developing cortex. Gene Ontology (GO) and KEGG pathway analyses reveal conserved molecules that maintain cortical development, such as proteins involved in metabolism. GO and KEGG pathway analyses further identify differentially expressed proteins that function at specific stages, for example proteins regulating the cell cycle in the embryonic cortex and proteins controlling axon guidance in the postnatal cortex, suggesting that distinct protein expression profiles determine biological events in the developing cortex. Furthermore, STRING network analysis revealed that many proteins control a single biological event, such as cell cycle regulation, through cohesive interactions, indicating complex network regulation in the cortex. Our study has identified protein networks that control cortical development and provides a protein reference for further investigation of protein interactions in the cortex.

Dr. Nagendra (2025). STRING-Protein-Protein-Interactions-Network [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/string-protein-protein-interactions-network

STRING-Protein-Protein-Interactions-Network

Protein-Protein Interaction Network Data from the STRING Database.

25 scholarly articles cite this dataset
Available download formats: zip (6,368,384 bytes)
Dataset updated
Nov 29, 2025
Authors
Dr. Nagendra
License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

A dataset representing a protein-protein interaction (PPI) network of human proteins, generated and scored using the STRING database. It covers functional and physical associations between proteins and includes confidence scores (e.g., text-mining, experimental) for each interaction. It is intended as a resource for systems biology and for identifying molecular hubs in disease pathways.
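The description does not state the file layout. Assuming it follows STRING's own `protein.links` downloads (whitespace-separated columns `protein1`, `protein2`, `combined_score`, with scores scaled 0 to 1000), a minimal sketch for keeping only interactions above a confidence cutoff might look like this. The default of 400 corresponds to STRING's "medium confidence" level of 0.4; the function name and protein IDs are hypothetical.

```python
import io
from typing import List, TextIO, Tuple


def read_confident_links(
    handle: TextIO, min_score: int = 400
) -> List[Tuple[str, str, int]]:
    """Return (protein1, protein2, combined_score) rows with score >= min_score.

    Assumes a STRING-style, whitespace-separated file whose header names the
    columns protein1, protein2 and combined_score, with scores scaled 0-1000.
    """
    header = handle.readline().split()
    col_a = header.index("protein1")
    col_b = header.index("protein2")
    col_score = header.index("combined_score")
    kept = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        score = int(fields[col_score])
        if score >= min_score:
            kept.append((fields[col_a], fields[col_b], score))
    return kept


# Hypothetical file contents with placeholder protein IDs.
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "P1 P2 950\n"
    "P1 P3 250\n"
    "P2 P3 400\n"
)
links = read_confident_links(sample)
print(links)  # [('P1', 'P2', 950), ('P2', 'P3', 400)]
```

Looking columns up by header name rather than position keeps the sketch working if the file carries extra per-channel score columns (e.g. experimental or text-mining scores).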
