100+ datasets found
  1. STRING Network Analysis

    • figshare.com
    Updated May 22, 2025
    Dain Lee (2025). STRING Network Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.29126396.v2
    Dataset updated
    May 22, 2025
    Dataset provided by
    figshare
    Authors
    Dain Lee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the protein–protein interaction analysis dataset used in an unpublished manuscript and further analyzed with the STRING online software. Significantly upregulated mRNAs (2,777 genes; p < 0.05) identified by bulk RNA-seq were analyzed using the STRING module in Cytoscape v.2.2.0 (Institute for Systems Biology; WA; USA). A cluster network was constructed using the MCL algorithm with a granularity parameter of 4, followed by filtering nodes with mcl.cluster > 10. The resulting 1,848 nodes were processed through STRING v12.0 (Swiss Institute of Bioinformatics; Lausanne; Switzerland) to generate a protein–protein interaction (PPI) network incorporating evidence from text mining, genomic neighborhood, experimental data, curated databases, co-expression, gene fusion, and co-occurrence, with a minimum confidence score of 0.40. Network modules were defined using the DBSCAN clustering algorithm with ε = 2. Cluster 1, the largest gene set (101 genes), was further analyzed by ranking its nodes by degree and selecting the top 20; the Cluster 1 network comprised 101 nodes and 756 edges. Global network metrics indicated an average node degree of 15, an average local clustering coefficient of 0.600, and a PPI enrichment p-value < 1 × 10⁻¹⁶. Average co-expression, experimentally determined interaction, automated text-mining, and combined scores were also calculated.
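    The global metrics above follow standard graph definitions; for example, with 101 nodes and 756 edges the mean degree is 2 × 756 / 101 ≈ 14.97, consistent with the reported value of 15. The minimal pure-Python sketch below shows how such metrics could be computed from an undirected edge list. It assumes edge scores on STRING's 0–1 scale (the 0.40 cutoff above), counts only nodes that keep at least one edge, and uses hypothetical function names; it is not STRING's own implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges, min_score=0.40):
    """Undirected adjacency map keeping edges whose score is >= min_score."""
    adj = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= min_score:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    # Each undirected edge adds 2 to the degree total, so the mean is 2E / N
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj) if adj else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Edges among the neighbours = triangles through this node
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def top_by_degree(adj, n=20):
    # Highest degree first; ties broken by node name for a stable order
    return sorted(adj, key=lambda v: (-len(adj[v]), v))[:n]


if __name__ == "__main__":
    # Toy network with made-up node names and scores
    toy = [("A", "B", 0.9), ("B", "C", 0.7), ("A", "C", 0.5), ("C", "D", 0.3)]
    g = build_graph(toy)
    print(average_degree(g), average_clustering(g), top_by_degree(g, 2))
```

    In practice a library such as NetworkX (`nx.average_clustering`, `G.degree`) would normally be used; the sketch only makes the definitions explicit.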

  2. Protein interaction data for 222 BM zone components

    • figshare.com
    xlsx
    Updated Feb 6, 2022
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless (2022). Protein interaction data for 222 BM zone components [Dataset]. http://doi.org/10.6084/m9.figshare.19127504.v1
    Available download formats: xlsx
    Dataset updated
    Feb 6, 2022
    Dataset provided by
    figshare
    Authors
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All human protein interactions were obtained from STRING (https://string-db.org/, version 11.0). Interactions were then filtered to those involving only BM zone proteins. Related to Fig. S6B.
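    The filtering step can be sketched as follows, assuming the whitespace-separated `protein1 protein2 combined_score` layout of STRING's downloadable `protein.links` files, where scores are integers from 0 to 1000. "Involving only BM zone proteins" is read here as both partners belonging to the set; the identifiers in the example are placeholders, not the 222 BM zone components.

```python
import io


def filter_links(lines, keep, min_score=0, unique=True):
    """Yield (protein1, protein2, score) rows whose partners are both in `keep`."""
    rows = iter(lines)
    header = next(rows).split()
    if header[:2] != ["protein1", "protein2"]:
        raise ValueError(f"unexpected header: {header}")
    seen = set()
    for line in rows:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        p1, p2, score = fields[0], fields[1], int(fields[2])
        if p1 not in keep or p2 not in keep or score < min_score:
            continue
        pair = tuple(sorted((p1, p2)))
        if unique and pair in seen:
            continue  # same pair already emitted in the other orientation
        seen.add(pair)
        yield p1, p2, score


if __name__ == "__main__":
    # Placeholder identifiers; a real run would open the STRING v11.0 links file
    demo = io.StringIO(
        "protein1 protein2 combined_score\n"
        "9606.P1 9606.P2 900\n"
        "9606.P2 9606.P1 900\n"
        "9606.P1 9606.P3 450\n"
        "9606.P2 9606.P4 800\n"
    )
    print(list(filter_links(demo, {"9606.P1", "9606.P2", "9606.P3"})))
```

    Because STRING link files list each pair in both orientations, `unique=True` keeps a single copy of each pair.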

  3. STRING Dataset

    • paperswithcode.com
    Updated Oct 11, 2021
    Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi Tsafou; Michael Kuhn; Peer Bork; Lars Juhl Jensen; Christian von Mering (2021). STRING Dataset [Dataset]. https://paperswithcode.com/dataset/string
    Dataset updated
    Oct 11, 2021
    Authors
    Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi Tsafou; Michael Kuhn; Peer Bork; Lars Juhl Jensen; Christian von Mering
    Description

    STRING is a collection of protein-protein interaction (PPI) networks.

  4. Cytoscape session of networks generated with the stringApp

    • figshare.com
    zip
    Updated Jun 1, 2023
    Nadezhda T. Doncheva; John H. Morris; Jan Gorodkin; Lars Juhl Jensen (2023). Cytoscape session of networks generated with the stringApp [Dataset]. http://doi.org/10.6084/m9.figshare.7258235.v1
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nadezhda T. Doncheva; John H. Morris; Jan Gorodkin; Lars Juhl Jensen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supporting Information for the paper "Cytoscape stringApp: Network analysis and visualization of proteomics data" (preprint available at bioRxiv). The Cytoscape session contains networks generated with the stringApp for the analysis of a phosphoproteomics dataset of ovarian cancer by Francavilla et al. (Cell Rep. 2017).

  5. Data from: Cytoscape StringApp: Network Analysis and Visualization of...

    • acs.figshare.com
    xlsx
    Updated May 31, 2023
    Nadezhda T. Doncheva; John H. Morris; Jan Gorodkin; Lars J. Jensen (2023). Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data [Dataset]. http://doi.org/10.1021/acs.jproteome.8b00702.s002
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Nadezhda T. Doncheva; John H. Morris; Jan Gorodkin; Lars J. Jensen
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp.

  6. Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with...

    • zenodo.org
    zip
    Updated Jun 24, 2022
    Joseph Szymborski; Amin Emad (2022). Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks [Dataset]. http://doi.org/10.5281/zenodo.6709790
    Available download formats: zip
    Dataset updated
    Jun 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph Szymborski; Amin Emad
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning.

    These datasets are in a format that RAPPPID can read directly.

    Comparatives Dataset
    These datasets were derived from the STRING v11 H. sapiens dataset according to the C1, C2, and C3 procedures outlined by Park and Marcotte (2012). Negative samples were drawn randomly from pairs of proteins not known to interact. See Szymborski & Emad for details.

    Repeatability Datasets
    The following datasets are all derived from STRING in the same manner as the comparatives dataset, but three different random seeds are used for drawing proteins.
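    The C1/C2/C3 scheme of Park and Marcotte (2012) classifies each test pair by how many of its two proteins also appear in the training pairs: both (C1), one (C2), or neither (C3). The sketch below shows that classification together with seeded negative sampling, so that different seeds give different but reproducible draws, as in the repeatability datasets. The function names are hypothetical, and RAPPPID's actual sampling code may differ.

```python
import random


def pair_class(pair, train_proteins):
    """C1: both proteins seen in training, C2: exactly one, C3: neither."""
    seen = sum(p in train_proteins for p in pair)
    return {2: "C1", 1: "C2", 0: "C3"}[seen]


def sample_negatives(proteins, positives, n, seed):
    """Draw n distinct unordered pairs not among the positives, reproducibly per seed."""
    rng = random.Random(seed)
    proteins = sorted(set(proteins))
    universe = set(proteins)
    known = {frozenset(p) for p in positives}
    # Positive pairs that actually fall inside the candidate space
    blocked = {p for p in known if len(p) == 2 and p <= universe}
    available = len(proteins) * (len(proteins) - 1) // 2 - len(blocked)
    if n > available:
        raise ValueError(f"only {available} negative pairs available")
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(proteins, 2))  # rejection sampling
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


if __name__ == "__main__":
    train = {"A", "B", "C"}
    print([pair_class(p, train) for p in [("A", "B"), ("A", "X"), ("X", "Y")]])
    for seed in (0, 1, 2):  # three seeds, as in the repeatability datasets
        print(seed, sample_negatives("ABCDEF", [("A", "B"), ("B", "C")], 3, seed))
```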

    References
    Park, Y. and Marcotte, E.M. (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.

    Szklarczyk, D., Gable, A.L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N.T., Morris, J.H., Bork, P., Jensen, L.J. and von Mering, C. (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.

    Szymborski, J. and Emad, A. (2021) RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv, https://doi.org/10.1101/2021.08.13.456309

  7. STRING

    • integbio.jp
    • opendatalab.com
    Updated Jun 17, 2013
    STRING Consortium (2013). STRING [Dataset]. https://integbio.jp/dbcatalog/en/record/nbdc00690?jtpl=56
    Dataset updated
    Jun 17, 2013
    Dataset provided by
    STRING Consortium
    License

    http://string-db.org/newstring_cgi/show_download_page.pl

    Description

    STRING is a database of known and predicted protein interactions, including both physical and functional interactions. Its data are derived from four sources: genomic context, high-throughput experiments, co-expression, and previous knowledge. The database quantitatively integrates interaction data from these sources for a large number of organisms and transfers information between these organisms where applicable. It performs iterative searches and visualizes the results in their genomic context. Downloadable data include protein sequences, protein networks, interaction types for protein links, orthologous groups, and full database dumps (license required).
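    STRING integrates its evidence channels by treating them as independent and combining them probabilistically after removing a background prior. The sketch below is a simplified version of that scheme, assuming per-channel scores on the 0–1 scale and the prior of 0.041 quoted in STRING's scoring documentation; the real pipeline handles some channels specially, which is omitted here, and the channel scores in the example are hypothetical.

```python
from math import prod

PRIOR = 0.041  # background prior, as quoted in STRING's scoring documentation (assumed)


def remove_prior(score, prior=PRIOR):
    # Clamp at the prior, then rescale so that the prior maps to 0
    score = max(score, prior)
    return (score - prior) / (1 - prior)


def combine_scores(channel_scores, prior=PRIOR):
    """Combine per-channel scores (0-1) as independent evidence, then add the prior back."""
    without_prior = [remove_prior(s, prior) for s in channel_scores]
    combined = 1 - prod(1 - s for s in without_prior)
    return combined + prior * (1 - combined)


if __name__ == "__main__":
    # Hypothetical channel scores, e.g. co-expression 0.3, experiments 0.5, text mining 0.6
    print(round(combine_scores([0.3, 0.5, 0.6]), 3))
```

    A useful check is that a single channel round-trips unchanged: `combine_scores([s])` returns `s` for any `s` at or above the prior, while additional supporting channels can only raise the combined score.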

  8. OGBN-Proteins (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Redao da Taupl (2021). OGBN-Proteins (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-proteins
    Available download formats: zip (677,947,148 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    OGBN-Proteins

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins

    Usage in Python

    import os.path as osp

    import pandas as pd
    import torch
    import torch_geometric.transforms as T
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    from ogb.nodeproppred import PygNodePropPredDataset


    class PygOgbnProteins(PygNodePropPredDataset):
        """ogbn-proteins loaded from the read-only Kaggle input directory."""

        def __init__(self, meta_csv=None):
            root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
            if meta_csv is None:
                meta_csv = osp.join(root, name, 'ogbn-master.csv')
            # keep_default_na=False keeps literal 'None' entries as strings, which OGB compares against
            master = pd.read_csv(meta_csv, index_col=0, keep_default_na=False)
            meta_dict = master[name].copy()
            meta_dict['dir_path'] = osp.join(root, name)
            super().__init__(name=name, root=root, transform=transform, meta_dict=meta_dict)

        def get_idx_split(self, split_type=None):
            if split_type is None:
                split_type = self.meta_info['split']
            path = osp.join(self.root, 'split', split_type)

            # Use a pre-saved split dictionary when one is available
            if osp.isfile(osp.join(path, 'split_dict.pt')):
                return torch.load(osp.join(path, 'split_dict.pt'))

            if self.is_hetero:
                train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
                for nodetype in train_idx_dict.keys():
                    train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
                    valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
                    test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
                # Return only after every node type has been converted
                return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}

            # Homogeneous graph: each split file is a single column of node indices
            split = {}
            for key in ('train', 'valid', 'test'):
                idx = pd.read_csv(osp.join(path, f'{key}.csv'), header=None).values.T[0]
                split[key] = torch.from_numpy(idx).to(torch.long)
            return split


    dataset = PygOgbnProteins()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0]  # PyG Data object
    

    Description

    Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression, or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (larger values indicate stronger associations). The proteins come from 8 species.

    Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.
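    The evaluation metric can be illustrated with a small pure-Python sketch: per-task ROC-AUC computed from the Mann-Whitney statistic, then averaged over tasks. Skipping tasks that lack either positive or negative labels is an assumption here (ROC-AUC is undefined for them); for reported numbers, OGB's own `Evaluator(name='ogbn-proteins')` from `ogb.nodeproppred` should be used.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_rocauc(y_true_rows, y_score_rows):
    """Average ROC-AUC over label columns (tasks) that contain both classes."""
    aucs = []
    for task in range(len(y_true_rows[0])):
        auc = roc_auc([row[task] for row in y_true_rows],
                      [row[task] for row in y_score_rows])
        if auc is not None:
            aucs.append(auc)
    if not aucs:
        raise ValueError("no task has both positive and negative labels")
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # 4 nodes x 3 tasks; the third task has no positives and is skipped
    y_true = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
    y_score = [[0.9, 0.9, 0.5], [0.1, 0.1, 0.5], [0.8, 0.8, 0.5], [0.2, 0.2, 0.5]]
    print(mean_rocauc(y_true, y_score))
```

    The pairwise comparison is quadratic in the number of nodes, which is fine for illustration but too slow for the full 132,534-node graph.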

    Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species from which the proteins come. This makes it possible to evaluate how well a model generalizes across species.
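    A species-based split amounts to grouping node indices by a species label. The sketch below assumes a per-node species array and caller-chosen, disjoint species sets; which species OGB assigns to each split is not stated here, so the labels in the example are hypothetical.

```python
def split_by_species(node_species, train_species, valid_species, test_species):
    """Group node indices into train/valid/test by each node's species label."""
    groups = {
        "train": set(train_species),
        "valid": set(valid_species),
        "test": set(test_species),
    }
    if (groups["train"] & groups["valid"] or groups["train"] & groups["test"]
            or groups["valid"] & groups["test"]):
        raise ValueError("a species may belong to only one split")
    split = {name: [] for name in groups}
    for idx, species in enumerate(node_species):
        for name, members in groups.items():
            if species in members:
                split[name].append(idx)
                break  # nodes from unlisted species are left out
    return split


if __name__ == "__main__":
    # Hypothetical species labels; not OGB's actual assignment
    labels = ["s1", "s2", "s2", "s3", "s4"]
    print(split_by_species(labels, ["s1", "s4"], ["s3"], ["s2"]))
```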

    Note: For undirected graphs, the loaded graphs have twice as many edges, because both directions of each edge are added automatically.

    Summary

    Package    | #Nodes  | #Edges     | Split Type | Task Type                         | Metric
    ogb>=1.1.1 | 132,534 | 39,561,252 | Species    | Multi-label binary classification | ROC-AUC

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
    [2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors through their website or GitHub repository.

  9. Data from: Plasma proteomics in epilepsy: network-based identification of...

    • ebi.ac.uk
    Updated Nov 19, 2024
    Liisa Arike (2024). Plasma proteomics in epilepsy: network-based identification of proteins associated with seizures [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD057292
    Dataset updated
    Nov 19, 2024
    Authors
    Liisa Arike
    Variables measured
    Proteomics
    Description

    Purpose: Identification of potential biomarkers of seizures.

    Methods: In this exploratory study, we quantified plasma protein intensities in 15 patients with recent seizures and compared them with 15 patients with long-standing seizure freedom. Using TMT-based proteomics, we found fifty-one differentially expressed proteins.

    Results: Network analyses, including co-expression networks and protein-protein interaction networks built with the STRING database, followed by network centrality and modularity analyses, revealed 22 protein modules, one of which showed a significant association with seizures. The protein-protein interaction network centered on this module identified a subnetwork of 125 proteins, grouped into four clusters. Notably, one cluster (mainly enriched in inflammatory pathways and Gene Ontology terms) showed the highest enrichment of known epilepsy-related genes.

    Conclusion: Overall, our network-based approach identified a protein module linked with seizures. The module contained known markers of epilepsy and inflammation. The results also demonstrate the potential of network analysis for discovering new biomarkers to improve epilepsy management.
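    The abstract does not say how enrichment of known epilepsy-related genes was tested. One common choice, assumed here, is a one-sided hypergeometric test of the overlap between a cluster and a known-gene list against a background gene set. The helper names are hypothetical; `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability for large inputs.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M population, n marked, N drawn)."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total


def cluster_enrichment(cluster, known_genes, background):
    """Overlap count and one-sided over-representation p-value for one cluster."""
    background = set(background)
    cluster = set(cluster) & background
    known = set(known_genes) & background
    hits = len(cluster & known)
    return hits, hypergeom_sf(hits, len(background), len(known), len(cluster))


if __name__ == "__main__":
    # Toy gene names, not the study's data
    genes = [f"G{i}" for i in range(10)]
    print(cluster_enrichment(["G0", "G1", "G2"], ["G0", "G1", "G2"], genes))
```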

  10. CreativeWork

    • pfocr.wikipathways.org
    Updated Jun 9, 2023
    WikiPathways (2023). CreativeWork [Dataset]. https://pfocr.wikipathways.org/figures/PMC10242111_fcell-11-1165308-g004.html
    Dataset updated
    Jun 9, 2023
    Dataset authored and provided by
    WikiPathways (http://wikipathways.org/)
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Protein–protein interaction network of the top differentially expressed genes between the patient’s samples and the Ctrl cohort. Edges represent protein–protein associations. Confidence ≥0.700; maximum number of interactors ≤20. Edge confidence: high (0.700) and highest (0.900) (see https://string-db.org/cgi/network).
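    STRING's standard confidence presets are low (0.150), medium (0.400), high (0.700), and highest (0.900). The minimal sketch below applies the ≥0.700 cutoff above and tags each retained edge with its preset. The `(a, b, score)` edge format is an assumption, and the interactor limit is not modeled.

```python
# STRING's standard confidence presets, highest first (minimum combined score)
PRESETS = [(0.900, "highest"), (0.700, "high"), (0.400, "medium"), (0.150, "low")]


def confidence_level(score):
    """Name of the strictest STRING preset that a combined score (0-1) satisfies."""
    for cutoff, label in PRESETS:
        if score >= cutoff:
            return label
    return None  # below the lowest preset


def label_edges(edges, min_score=0.700):
    """Keep (a, b, score) edges at or above min_score and tag each with its preset."""
    return [(a, b, s, confidence_level(s)) for a, b, s in edges if s >= min_score]


if __name__ == "__main__":
    # Made-up gene pairs and scores
    demo = [("GENE1", "GENE2", 0.95), ("GENE1", "GENE3", 0.75), ("GENE2", "GENE3", 0.5)]
    print(label_edges(demo))
```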

  11. The optimal dimensions of raw network embedding representations and the...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cen Wan; Domenico Cozzetto; Rui Fa; David T. Jones (2023). The optimal dimensions of raw network embedding representations and the corresponding 3rd hidden layer outputs (a.k.a. the STRING2GO-learnt functional representations) with their corresponding predictive power for biological process terms prediction, and the main characteristics of different STRING networks. [Dataset]. http://doi.org/10.1371/journal.pone.0209958.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Cen Wan; Domenico Cozzetto; Rui Fa; David T. Jones
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The optimal dimensions of raw network embedding representations and the corresponding 3rd hidden layer outputs (a.k.a. the STRING2GO-learnt functional representations) with their corresponding predictive power for biological process terms prediction, and the main characteristics of different STRING networks.

  12. Statistics of the genes in the protein interaction network constructed based...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu (2023). Statistics of the genes in the protein interaction network constructed based on the STRING database. [Dataset]. http://doi.org/10.1371/journal.pone.0116505.t003
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shunyao Wu; Fengjing Shao; Jun Ji; Rencheng Sun; Rizhuang Dong; Yuanke Zhou; Shaojie Xu; Yi Sui; Jianlong Hu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of the genes in the protein interaction network constructed based on the STRING database.

  13. Evaluating homophily of human PPI with respect to chromosomes

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Jul 30, 2022
    Nicola Apollonio; Daniel Blankenberg; Fabio Cumbo; Paolo Giulio Franciosa; Daniele Santoni (2022). Evaluating homophily of human PPI with respect to chromosomes [Dataset]. http://doi.org/10.5281/zenodo.6941315
    Available download formats: txt, bin
    Dataset updated
    Jul 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicola Apollonio; Daniel Blankenberg; Fabio Cumbo; Paolo Giulio Franciosa; Daniele Santoni
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset reports the homophily/heterophily, expressed as z-scores, of the human protein–protein interaction (PPI) network obtained from the STRING v11.5 database (https://string-db.org) using the standard edge-score threshold (T = 700). Each protein in the PPI network was assigned to a class corresponding to the chromosome on which its gene is located.

    A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered (excluding the class corresponding to chromosome Y because of the small number of genes occurring in the network).

    The homophily/heterophily of the network with respect to chromosome classes was evaluated with the HONTO tool (https://github.com/cumbof/honto).

    In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.

    Z-scores for intra-chromosomal (diagonal) and inter-chromosomal (off-diagonal) interactions are also reported as a heatmap.

    Diagonal values are clearly higher than off-diagonal values, indicating that the network is homophilic and confirming a link between sharing a chromosome and interacting in the PPI network.
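    The sketch below illustrates one way to obtain such z-scores with a label-permutation null model: class labels (here, chromosomes) are shuffled across proteins, and the observed edge count for each pair of classes is compared with the shuffled counts. This is an assumed, simplified method and not necessarily how HONTO computes its values; the toy labels and function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def class_pair_counts(edges, label):
    """Count edges per unordered pair of classes, e.g. ('chr1', 'chr2')."""
    return Counter(tuple(sorted((label[a], label[b]))) for a, b in edges)


def pair_z_scores(edges, label, n_perm=1000, seed=0):
    """Z-score of each class-pair edge count against a label-shuffling null model."""
    rng = random.Random(seed)
    nodes = sorted(label)
    classes = [label[v] for v in nodes]
    kinds = sorted(set(classes))
    pairs = [(x, y) for i, x in enumerate(kinds) for y in kinds[i:]]
    observed = class_pair_counts(edges, label)
    samples = {p: [] for p in pairs}
    for _ in range(n_perm):
        rng.shuffle(classes)  # keeps class sizes, randomises membership
        counts = class_pair_counts(edges, dict(zip(nodes, classes)))
        for p in pairs:
            samples[p].append(counts[p])
    z = {}
    for p in pairs:
        sd = pstdev(samples[p])
        z[p] = (observed[p] - mean(samples[p])) / sd if sd > 0 else float("nan")
    return z


if __name__ == "__main__":
    # Two toy "chromosomes", each forming a triangle of intra-class edges
    lab = {"a1": "X", "a2": "X", "a3": "X", "b1": "Y", "b2": "Y", "b3": "Y"}
    tri = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
    print(pair_z_scores(tri, lab, n_perm=500))
```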

  14. Data from: Determining the minimum number of protein-protein interactions...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Apr 30, 2018
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu (2018). Determining the minimum number of protein-protein interactions required to support known protein complexes [Dataset]. http://doi.org/10.5061/dryad.8s3682g
    Available download formats: zip
    Dataset updated
    Apr 30, 2018
    Dataset provided by
    The University of Tokyo
    Kyushu University
    Hong Kong Polytechnic University
    National Institute of Technology
    Kyoto University
    Authors
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, currently available PPI data are not sufficient to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph-theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one based on integer linear programming (ILPMinPPI) and one based on an existing greedy approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to small datasets, we apply the latter to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
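    A simple baseline for this problem, assumed here for illustration and distinct from ILPMinPPI and GreedyMinPPI: if a complex's proteins form c connected components in the PPI network, c − 1 added interactions suffice to connect it. Summing this over complexes gives an upper bound on the minimum, since it ignores added interactions that could serve several overlapping complexes at once. The function names and toy data are hypothetical.

```python
def count_components(nodes, edges):
    """Connected components of the subgraph induced by `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        if a in parent and b in parent:  # only edges inside the complex count
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return len({find(v) for v in parent})


def naive_additional_ppis(complexes, ppis):
    """Upper bound: connect each complex separately with (components - 1) new PPIs."""
    return sum(max(count_components(set(c), ppis) - 1, 0) for c in complexes)


if __name__ == "__main__":
    # Toy complexes and interactions, not CYC2008 or STRING data
    complexes = [{"A", "B", "C"}, {"C", "D"}]
    print(naive_additional_ppis(complexes, [("A", "B")]))
```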

  15. Data from: Citation network of the knowledge co-production literature....

    • data.niaid.nih.gov
    Updated Dec 8, 2021
    Megan Arthur (2021). Citation network of the knowledge co-production literature. Supplementary data. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5762450
    Dataset updated
    Dec 8, 2021
    Dataset provided by
    Megan Arthur
    Rhodri Ivor Leng
    Justyna Bandola-Gill
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data description

    This data note describes the final citation network dataset analysed in the manuscript "What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives" [1].

    The data collection strategy used to construct the following dataset is described in the associated manuscript [1]. These data were originally downloaded from the Web of Science (WoS) Core Collection through the University of Edinburgh's library subscription, using a systematic search methodology designed to capture literature relevant to ‘knowledge co-production’. The dataset consists of 1,893 unique document reference strings (nodes) linked by 9,759 citation links (edges). The network dataset describes a directed citation network composed of papers relevant to ‘knowledge co-production’ and is split into two files: (i) ‘KnowCo_node_attribute_list.csv’ contains attributes of the 1,893 documents (nodes); and (ii) ‘KnowCo_edge_list.csv’ records the citation links (edges) between pairs of documents.

    1. ‘KnowCo_node_attribute_list.csv’ consists of attributes of the 1,893 nodes (documents) of the citation network. Due to the approach used to collect data, there are two types of node: (i) 525 nodes represent documents retrieved from WoS via the systematic search strategy, and these have full attribute data including their reference lists; and (ii) 1,368 documents that were cited >2 times by our 525 fully retrieved papers (see manuscript for full description [1]). The columns refer to:

    Id, the unique identifier. Fully retrieved documents are identified via a unique identifier that begins with ‘f’ followed by an integer (e.g. f1, f2, etc.). Non-retrieved documents are identified via a unique identifier beginning with ‘n’ followed by an integer (e.g. n1, n2, etc.).

    Label, the unique reference string of the document to which the row's attribute data correspond. Reference strings contain the last name of the first author, publication year, journal, volume, start page, and DOI (if available).

    authors, all author names. These are in the order that these names appear in the authorship list of the corresponding document. These data are only available for fully retrieved documents.

    title, document title. These data are only available for fully retrieved documents.

    journal, journal of publication. These data are only available for fully retrieved documents. For those interested in journal data for the remaining papers, this can be extracted from the reference string in the ‘Label’ column.

    year, year of publication. These data are available for all nodes.

    type, document type (e.g. article, review). Available only for fully retrieved documents.

    wos_total_citations, total citation count as recorded by Web of Science Core Collection as of May 2020. Available only for fully retrieved documents.

    wos_id, Web of Science accession number. Available only for fully retrieved documents; for non-retrieved documents, the cell contains ‘CitedReference’.

    cluster, the cluster membership number discussed in the manuscript, obtained by modularity maximisation with the Leiden algorithm (resolution 0.8; Q = 0.53; 5 clusters). Available for all nodes.

    indegree, total count of within-network citations to a given document. Because of how the network was built, this is the number of citations from the 525 fully retrieved documents to each of the 1,893 documents within the network. Available for all nodes.

    outdegree, total count of within-network references from a given document. Because of how the network was built, only fully retrieved documents can have a value > 0, since only these documents have reference-list data. Available for all nodes.

    2. ‘KnowCo_edge_list.csv’ is an edge list containing 9,759 citation links between the 1,893 documents. The columns refer to:

    Source, the citing document’s unique identifier.

    Target, the cited document’s unique identifier.
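    The indegree and outdegree columns can be recomputed from the edge list. The minimal sketch below assumes a comma-separated file with the `Source` and `Target` header names described above; as a sanity check, both totals should equal the number of edges (9,759 for this dataset).

```python
import csv
import io
from collections import Counter


def degree_table(edge_csv):
    """Indegree (citations received) and outdegree (references made) per document Id."""
    indeg, outdeg = Counter(), Counter()
    for row in csv.DictReader(edge_csv):
        outdeg[row["Source"]] += 1  # citing document
        indeg[row["Target"]] += 1   # cited document
    return indeg, outdeg


if __name__ == "__main__":
    # Tiny stand-in for KnowCo_edge_list.csv using the Id scheme described above
    demo = io.StringIO("Source,Target\nf1,n1\nf1,f2\nf2,n1\n")
    indeg, outdeg = degree_table(demo)
    print(dict(indeg), dict(outdeg))
    # Every edge adds one to each total
    assert sum(indeg.values()) == sum(outdeg.values())
```

    For the real file, pass an open handle such as `open('KnowCo_edge_list.csv', newline='')`; the comma delimiter and exact header spelling are assumptions.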

    Notes

    [1] Bandola-Gill, J., Arthur, M., & Leng, R. I. (Under review). What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives. Evidence & Policy

  16. Data from: VERY HIGH-SPEED DRILL STRING COMMUNICATIONS NETWORK

    • data.wu.ac.at
    Updated Sep 29, 2016
    (2016). VERY HIGH-SPEED DRILL STRING COMMUNICATIONS NETWORK [Dataset]. https://data.wu.ac.at/schema/edx_netl_doe_gov/NDNjNzUzN2MtZGNjOS00NDFmLTliNmUtZWJhN2E0M2I3MDZh
    Dataset updated
    Sep 29, 2016
    Description

    A history and project summary of the development of a very high-speed drill string communications network are given. The summary includes laboratory and field test results, including recent successes of the system in wells in Oklahoma. A brief explanation of commercialization plans is included. The primary conclusion for this work is that a high data rate communications system can be made sufficiently robust, reliable, and transparent to the end user to be successfully deployed in a down-hole drilling environment. A secondary conclusion is that a networking system with user data bandwidth of at least 1 million bits per second can be built to service any practical depth of well using multiple repeaters (Links), with spacing between the Links of at least 1000 ft.

  17.

    ProtChemSI

    • neuinfo.org
    • scicrunch.org
    Updated May 7, 2011
    (2011). ProtChemSI [Dataset]. http://identifiers.org/RRID:SCR_006115
    Dataset updated
    May 7, 2011
    Description

    The database of protein-chemical structural interactions includes all existing 3D structures of complexes of proteins with low-molecular-weight ligands. When proteins and chemicals are treated as the vertices of a graph, these interactions form a network. Biological networks are powerful tools for predicting undocumented relationships between molecules: the underlying principle is that existing interactions can be used to predict new ones. For pairs of proteins sharing a common ligand, we use protein and chemical superimpositions combined with fast structural compatibility screens to predict whether additional compounds bound by one protein would bind the other. The current version includes data from the Protein Data Bank as of August 2011. The database is updated monthly.
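The network-based prediction idea can be illustrated with a small sketch: treat proteins and ligands as two kinds of vertices, find protein pairs that share at least one ligand, and list each compound bound by one protein but not yet recorded for the other as a candidate. This covers only candidate generation; the structural superimposition and compatibility screens that ProtChemSI uses to accept or reject a candidate are not modelled, and the data and function name below are hypothetical.

```python
from itertools import combinations


def candidate_transfers(binding):
    """Propose (protein, compound) pairs to test, based on shared ligands.

    binding maps each protein to the set of ligands observed bound to it.
    For every pair of proteins sharing at least one ligand, each compound
    bound by one protein but not the other becomes a candidate for the other.
    A real pipeline would then filter these with structural checks, which
    this sketch does not attempt.
    """
    candidates = set()
    for p, q in combinations(sorted(binding), 2):
        if not binding[p] & binding[q]:
            continue  # no shared ligand, so no basis for a prediction
        for compound in binding[p] - binding[q]:
            candidates.add((q, compound))
        for compound in binding[q] - binding[p]:
            candidates.add((p, compound))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy data: P1 and P2 share ligand L1; P3 shares nothing.
    toy = {"P1": {"L1", "L2"}, "P2": {"L1"}, "P3": {"L9"}}
    print(sorted(candidate_transfers(toy)))  # [('P2', 'L2')]
```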

  18.

    Summary of GO term-centric results obtained by different network embedding...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cen Wan; Domenico Cozzetto; Rui Fa; David T. Jones (2023). Summary of GO term-centric results obtained by different network embedding representations and corresponding functional representations based on Combinedscore, Textmining, Experimental, Database and Coexpression networks working with different classification algorithms during hold-out evaluation. [Dataset]. http://doi.org/10.1371/journal.pone.0209958.t002
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Cen Wan; Domenico Cozzetto; Rui Fa; David T. Jones
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of GO term-centric results obtained by different network embedding representations and corresponding functional representations based on Combinedscore, Textmining, Experimental, Database and Coexpression networks working with different classification algorithms during hold-out evaluation.

  19.

    Data from: A global network of biomedical relationships derived from text

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Jan 24, 2020
    Altman, Russ B. (2020). A global network of biomedical relationships derived from text [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1035252
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Percha, Bethany
    Altman, Russ B.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.

    PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path followed by its score for each theme, and indicators of whether or not the path is part of the flagship path set for each theme (meaning that it was manually reviewed and determined to reflect that theme). The themes themselves are listed below and are in our paper (reference below).

    PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:

    PubMed ID

    Sentence number (0 = title)

    First entity name, formatted

    First entity name, location (characters from start of abstract)

    Second entity name, formatted

    Second entity name, location

    First entity name, raw string

    Second entity name, raw string

    First entity name, database ID(s)

    Second entity name, database ID(s)

    First entity type (Chemical, Gene, Disease)

    Second entity type (Chemical, Gene, Disease)

    Dependency path

    Sentence, tokenized

    The "with-themes.txt" files only contain dependency paths with corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.
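The PART II field list maps naturally onto a record type. The sketch below assumes each record is one tab-separated line with the 14 fields in the order listed; the delimiter, the field names, and the choice to keep locations and database IDs as raw strings are assumptions for illustration, not a documented file specification.

```python
from dataclasses import dataclass

# Assumed field order, following the PART II description (hypothetical names).
FIELDS = [
    "pubmed_id", "sentence_number",
    "entity1_formatted", "entity1_location",
    "entity2_formatted", "entity2_location",
    "entity1_raw", "entity2_raw",
    "entity1_db_ids", "entity2_db_ids",
    "entity1_type", "entity2_type",
    "dependency_path", "sentence_tokenized",
]


@dataclass
class SentenceRecord:
    pubmed_id: int
    sentence_number: int  # 0 = title
    entity1_formatted: str
    entity1_location: str  # raw string; exact location format not assumed
    entity2_formatted: str
    entity2_location: str
    entity1_raw: str
    entity2_raw: str
    entity1_db_ids: str  # may hold several IDs; left unsplit here
    entity2_db_ids: str
    entity1_type: str  # Chemical, Gene, or Disease
    entity2_type: str
    dependency_path: str
    sentence_tokenized: str


def parse_part_ii_line(line):
    """Parse one (assumed tab-separated) PART II record into a SentenceRecord."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    record["pubmed_id"] = int(record["pubmed_id"])
    record["sentence_number"] = int(record["sentence_number"])
    return SentenceRecord(**record)
```

Checking the field count before building the record makes a wrong delimiter assumption fail loudly instead of silently shifting columns.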

    This release contains the annotated network for the September 15, 2019 version of PubTator. The version discussed in our paper (below) is an older one, from April 30, 2016; if you're interested in that network, it can be found in Version 1 of this repository. We will release updated networks periodically, as the PubTator community continues to release new versions of named entity annotations for Medline roughly every month.

    REFERENCES

    Percha B, Altman RBA (2017) A global network of biomedical relationships derived from text. Bioinformatics, 34(15): 2614-2624.

    Percha B, Altman RBA (2015) Learning the structure of biomedical relationships from unstructured text. PLoS Computational Biology, 11(7): e1004216.

    This project depends on named entity annotations from the PubTator project: https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

    Reference: Wei CH et al., PubTator: a Web-based text mining tool for assisting Biocuration, Nucleic Acids Research, 2013, 41 (W1): W518-W522.

    Dependency parsing was provided by the Stanford CoreNLP toolkit (version 3.9.1): https://stanfordnlp.github.io/CoreNLP/index.html

    Reference: Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

    THEMES

    chemical-gene: (A+) agonism, activation; (A-) antagonism, blocking; (B) binding, ligand (esp. receptors); (E+) increases expression/production; (E-) decreases expression/production; (E) affects expression/production (neutral); (N) inhibits

    gene-chemical: (O) transport, channels; (K) metabolism, pharmacokinetics; (Z) enzyme activity

    chemical-disease: (T) treatment/therapy (including investigatory); (C) inhibits cell growth (esp. cancers); (Sa) side effect/adverse event; (Pr) prevents, suppresses; (Pa) alleviates, reduces; (J) role in disease pathogenesis

    disease-chemical: (Mp) biomarkers (of disease progression)

    gene-disease: (U) causal mutations; (Ud) mutations affecting disease course; (D) drug targets; (J) role in pathogenesis; (Te) possible therapeutic effect; (Y) polymorphisms alter risk; (G) promotes progression

    disease-gene: (Md) biomarkers (diagnostic); (X) overexpression in disease; (L) improper regulation linked to disease

    gene-gene: (B) binding, ligand (esp. receptors); (W) enhances response; (V+) activates, stimulates; (E+) increases expression/production; (E) affects expression/production (neutral); (I) signaling pathway; (H) same protein or complex; (Rg) regulation; (Q) production by cell population

    FORMATTING NOTE

    A few users have mentioned that the dependency paths in the "part-i" files are all lowercase text, whereas those in the "part-ii" files maintain the case of the original sentence. This complicates mapping between the two sets of files.

    We kept the part-ii files in the same case as the original sentence to facilitate downstream debugging: it's easier to tell which words in a sentence contribute to the dependency path if their original case is maintained. When working with the part-ii "with-themes" files, converting the dependency path to lowercase is guaranteed to match one of the paths in the corresponding part-i file, which gives you the theme scores.
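The lowercase mapping can be sketched as a dictionary lookup. This assumes the part-i rows have already been read into (path, scores) pairs and that each part-ii record exposes its dependency path as a string; the layout of the theme-score columns in the actual files is not assumed, and the function names and toy paths are hypothetical.

```python
def build_theme_index(part_i_rows):
    """Index part-i theme scores by dependency path.

    part_i_rows: iterable of (path, scores) pairs, where scores is any object
    (e.g. a dict of theme -> score). Part-i paths are already lowercase per
    the formatting note; lowercasing again here is purely defensive.
    """
    return {path.lower(): scores for path, scores in part_i_rows}


def theme_scores_for(part_ii_path, index):
    """Look up scores for a mixed-case part-ii path; None if absent."""
    return index.get(part_ii_path.lower())


if __name__ == "__main__":
    # Hypothetical toy rows; real paths and theme columns come from the files.
    index = build_theme_index([("start_entity|nsubj|inhibits", {"N": 1.0})])
    print(theme_scores_for("START_ENTITY|nsubj|Inhibits", index))  # {'N': 1.0}
```

Because the note guarantees a match for every path in the part-ii "with-themes" files, a `None` result there would point to a parsing problem rather than a missing theme.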

    Apologies for the additional complexity, and please reach out to us if you have any questions (see correspondence information in the Bioinformatics manuscript, above).

  20.

    Data from: Osteosarcoma-enriched transcripts paradoxically generate...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Apr 3, 2023
    Kexin Li; Qingji Huo; Nathan H. Dimmitt; Guofan Qu; Junjie Bao; Pankita H. Pandya; M. Reza Saadatzadeh; Khadijeh Bijangi-Vishehsaraei; Melissa A. Kacena; Karen E. Pollok; Chien-Chi Lin; Bai-Yan Li; Hiroki Yokota (2023). Osteosarcoma-enriched transcripts paradoxically generate osteosarcoma-suppressing extracellular proteins [Dataset]. http://doi.org/10.5061/dryad.m905qfv4w
    Available download formats: zip
    Dataset updated
    Apr 3, 2023
    Dataset provided by
    Indiana University School of Medicine
    Indiana University – Purdue University Indianapolis
    Third Affiliated Hospital of Harbin Medical University
    Harbin Medical University
    Authors
    Kexin Li; Qingji Huo; Nathan H. Dimmitt; Guofan Qu; Junjie Bao; Pankita H. Pandya; M. Reza Saadatzadeh; Khadijeh Bijangi-Vishehsaraei; Melissa A. Kacena; Karen E. Pollok; Chien-Chi Lin; Bai-Yan Li; Hiroki Yokota
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Osteosarcoma (OS) is a common primary bone cancer that mostly affects children and young adults. To augment standard-of-care chemotherapy, we examined the possibility of protein-based therapy using proteomes derived from mesenchymal stem cells (MSCs) and osteosarcoma-elevated proteins. While a conditioned medium (CM) collected from MSCs did not show tumor-suppressing ability, activation of PKA converted MSCs into induced tumor-suppressing cells (iTSCs). In a mouse model, direct and hydrogel-assisted administration of CM inhibited tumor-induced bone destruction, and its effect was additive with cisplatin. CM was enriched with proteins such as calreticulin, which acted as an extracellular tumor suppressor by interacting with CD47. Notably, the level of Calr transcripts was elevated in OS tissues, together with transcripts of other tumor-suppressing proteins, including histone H4 and PCOLCE. PCOLCE acted as an extracellular tumor-suppressing protein by interacting with amyloid precursor protein (APP), a prognostic OS marker associated with poor survival. The results support the possibility of a paradoxical strategy: using OS transcriptomes to treat OS.

    Methods

    Here are the procedures for collecting and analyzing in vitro and in vivo data and for conducting bioinformatics analysis.

    In vitro assays

    MTT-based metabolic activity (Figs. 1A, 1D, 2A, 2B, 2D, 2E, 2F, 4G, 4H, 5H, 6B, 7E, and Figure 1-figure supplement 1A, 1C, Figure 5-figure supplement 4): Metabolic activity was evaluated using three osteosarcoma cell lines. Optical density was measured at 562 nm with a multi-well spectrophotometer (EL808, BioTek, VT, USA). Data were analyzed in Excel.

    Transwell invasion (Figs. 1C, 1F, 6D, 7C, and Figure 1-figure supplement 1A, 1D): Images were taken with an inverted optical microscope (magnification 100x; Nikon, Tokyo, Japan). The average number of stained cells was determined with ImageJ (National Institutes of Health, Bethesda, MD, USA). Data were analyzed in Excel.

    Two-dimensional motility (Figs. 1B, 1E, 6E, 7D): Images were taken with an inverted optical microscope, and the scratch areas were quantified with ImageJ. Data were analyzed in Excel.

    Western blot analysis (Figs. 2C, 4A, 5D, 5E, 5G, 6C, 7A, 7B, 7F, and Figure 5-figure supplement 2B, Figure 5-figure supplement 3, Figure 5-figure supplement 5): Signal intensities of Western blot images were determined with a luminescent image analyzer (LAS-3000, Fuji Film, Tokyo, Japan). The relevant bands are labeled in PowerPoint.

    ELISA assay (Figs. 2B, 2C, 2D, 2E, 2F): Protein levels in CW008-treated CM were determined with ELISA kits (MyBioSource, San Diego, CA, USA) according to the manufacturer's procedure. The absorbance of each well was measured at 450 nm with a multi-well spectrophotometer (EL808). Data were analyzed in Excel.

    Alizarin Red assay (Figure 5-figure supplement 2): Alizarin Red staining was used to visualize calcium deposits. Optical density was measured at 562 nm with a multi-well spectrophotometer (EL808). Data were analyzed in Excel.

    In vivo assays (mouse model)

    X-ray images (Fig. 5A): X-ray imaging was performed with a Faxitron radiographic system (Faxitron X-ray Co., Tucson, AZ, USA). Data were analyzed in PowerPoint.

    microCT images (Figs. 3, 5B, 5C): Micro-computed tomography was performed with a Skyscan 1172 (Bruker-MicroCT, Kontich, Belgium). Scans were acquired at a pixel size of 8.99 μm, and the images were reconstructed and processed with two software tools (nRecon v1.6.9.18 and CTan v1.13). Data were analyzed in Excel.

    Histology (Figure 3-figure supplement 1 and Figure 3-figure supplement 2): H&E staining was performed on sagittal sections, and images were taken with a microscope (U-TV0.63XG, OLYMPUS, Tokyo, Japan). The distribution of tumor cells in the tibial bone cavity was quantified with ImageJ. Data were analyzed in Excel.

    Immunohistochemistry (Figure 3-figure supplement 3 and Figure 5-figure supplement 1): Immunohistochemical staining was performed on sagittal sections, and images were taken with a microscope (U-TV0.63XG, OLYMPUS, Tokyo, Japan). The immunostained area within the tumor-invaded region was quantified with ImageJ. Data were analyzed in Excel.

    Bioinformatics

    Survival rate (Figs. 5F, 6F, 6H): Patient survival analyses were obtained from the web-based tool GEPIA (Gene Expression Profiling Interactive Analysis).

    Transcript levels (Fig. 6A and Suppl. File 1): TCGA (The Cancer Genome Atlas) data, accessed via GEPIA, were used to predict tumor-suppressing protein candidates.

    Protein interactions (Fig. 6G): The target protein-protein interaction network was visualized with STRING (STRING Consortium; string-db.org/network/) via UniProt (Universal Protein Resource; uniprot.org).
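For readers who want to reproduce a STRING network programmatically rather than through the string-db.org web interface, the sketch below builds a request against STRING's public REST API (the `network` endpoint with `identifiers`, `species`, and `required_score` parameters). The protein names, the human taxon ID 9606, and the score threshold of 400 are illustrative assumptions, not settings reported for Fig. 6G; check the current STRING API documentation before relying on the parameter names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

STRING_API = "https://string-db.org/api"


def string_network_url(identifiers, species=9606, required_score=400, fmt="tsv"):
    """Build a STRING API URL for the interaction network of the given proteins.

    identifiers: gene/protein names or UniProt accessions.
    species: NCBI taxonomy ID (9606 = Homo sapiens; an assumption here).
    required_score: minimum combined score on STRING's 0-1000 scale.
    STRING expects multiple identifiers separated by carriage returns (%0D).
    """
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


def fetch_network_tsv(identifiers, **kwargs):
    """Download the network as TSV text (requires network access)."""
    with urlopen(string_network_url(identifiers, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical query using two proteins named in the abstract.
    print(string_network_url(["PCOLCE", "APP"]))
```

Keeping URL construction separate from the download makes the query easy to inspect, and to paste into a browser, before any request is sent.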

