100+ datasets found
  1. STRING Network Analysis

    • figshare.com
    Updated May 22, 2025
    Cite
    Dain Lee (2025). STRING Network Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.29126396.v2
    Dataset updated
    May 22, 2025
    Dataset provided by
    figshare
    Authors
    Dain Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the protein-protein interaction analysis dataset that was used in the unpublished manuscript and was further analyzed with the STRING online software. Significantly upregulated mRNAs (2,777 genes; p < 0.05) identified by bulk RNA-seq were analyzed using the STRING module in Cytoscape v.2.2.0 (Institute for Systems Biology; WA; USA). A cluster network was constructed using the MCL algorithm with a granularity parameter of 4, followed by filtering nodes with mcl.cluster > 10. The resulting 1,848 nodes were processed through STRING v12.0 (Swiss Institute of Bioinformatics; Lausanne; Switzerland) to generate a protein–protein interaction (PPI) network, incorporating evidence from text mining, genomic neighborhood, experimental data, curated databases, co-expression, gene fusion, and co-occurrence, with a minimum confidence score threshold of 0.40. Network modules were defined using the DBSCAN clustering algorithm with an ε parameter of 2. Cluster 1, representing the largest gene set (101 genes), was further analyzed by sorting the top 20 nodes with the highest node degree, resulting in a network comprising 101 nodes and 756 edges. Global network metrics indicated an average node degree of 15, a local clustering coefficient of 0.600, and a PPI enrichment p-value of < 1 × 10⁻¹⁶. The average values of coexpression, experimentally determined interactions, automated text mining, and combined scores were calculated.
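    The reported average node degree can be checked from the node and edge counts given in the description. The sketch below is only an illustration: the function names and the toy edge list are assumptions made here, not part of the dataset, and the degree ranking is a generic approach rather than the STRING or Cytoscape implementation.

```python
from collections import Counter


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Mean degree of an undirected graph; every edge touches two nodes."""
    return 2 * num_edges / num_nodes


def top_k_by_degree(edges, k):
    """Return the k (node, degree) pairs with the highest degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(k)


# Node and edge counts reported in the description for Cluster 1.
cluster1_avg = average_degree(num_nodes=101, num_edges=756)

# Toy edge list with hypothetical node names, only to exercise the ranking.
degree_toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree_toy_top = top_k_by_degree(degree_toy_edges, k=1)
```

    With 101 nodes and 756 edges, 2 × 756 / 101 ≈ 14.97, which is consistent with the reported average node degree of 15.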

  2. PPI prediction data (STRING 12.0 based)

    • zenodo.org
    bin, tsv
    Updated Oct 15, 2024
    Cite
    Konstantin Volzhenin (2024). PPI prediction data (STRING 12.0 based) [Dataset]. http://doi.org/10.5281/zenodo.13936160
    Available download formats: bin, tsv
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Konstantin Volzhenin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An extensive dataset of binary physical protein-protein interactions extracted from STRING 12.0 (>12,000 organisms) with artificially generated negatives. The dataset includes 72M positive pairs with STRING confidence scores > 0.9 and 720M negative pairs. The corresponding protein sequences are located in the .fasta files. The negatives were generated following the method described in https://doi.org/10.1016/j.isci.2024.110371

  3. STRING database ver.10.5 archive 9606 data sets

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Elena Sugis (2023). STRING database ver.10.5 archive 9606 data sets [Dataset]. http://doi.org/10.6084/m9.figshare.7857185.v1
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elena Sugis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data collection contains the human (9606) data sets that were previously deposited as separate datasets in STRING ver.10.5, before the download file structure changed with the release of ver.11.0.

  4. Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with...

    • zenodo.org
    zip
    Updated Jun 24, 2022
    Cite
    Joseph Szymborski; Amin Emad (2022). Data for RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks [Dataset]. http://doi.org/10.5281/zenodo.6709790
    Available download formats: zip
    Dataset updated
    Jun 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph Szymborski; Amin Emad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for RAPPPID, a method for the Regularised Automative Prediction of Protein-Protein Interactions using Deep Learning.

    These datasets are in a format that RAPPPID is ready to read.

    Comparatives Dataset
    These datasets were derived from the STRING v11 H. sapiens dataset, according to the C1, C2, and C3 procedures outlined by Park and Marcotte, 2012. Negative samples are sampled randomly from the space of proteins not known to interact. See Szymborski & Emad for details.

    Repeatability Datasets
    The following datasets are all derived from STRING in the same manner as the comparatives dataset, but three different random seeds are used for drawing proteins.
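    As a rough illustration of drawing negative samples at random from protein pairs not known to interact, with a different random seed per draw, consider the sketch below. It is an assumption-laden toy, not the RAPPPID code: the function name, protein identifiers, and seeds are hypothetical, and the C1, C2, and C3 splitting procedures of Park and Marcotte are not covered.

```python
import random
from itertools import combinations


def sample_negatives(proteins, positive_pairs, n, seed):
    """Draw n unordered protein pairs that are not listed as known interactions."""
    known = {frozenset(pair) for pair in positive_pairs}
    # Enumerating every pair is fine for a toy example but not for a full proteome.
    candidates = [
        pair
        for pair in combinations(sorted(proteins), 2)
        if frozenset(pair) not in known
    ]
    if n > len(candidates):
        raise ValueError("not enough non-interacting pairs to sample from")
    rng = random.Random(seed)  # a fixed seed makes each draw repeatable
    return rng.sample(candidates, n)


# Hypothetical protein identifiers and known interactions.
neg_proteins = ["P1", "P2", "P3", "P4", "P5"]
neg_positives = [("P1", "P2"), ("P3", "P4")]

# One draw per seed, mirroring the idea of repeating the sampling with three seeds.
neg_draws = {
    seed: sample_negatives(neg_proteins, neg_positives, n=3, seed=seed)
    for seed in (1, 2, 3)
}
```

    Re-running a draw with the same seed returns the same pairs, which is what makes such repeated datasets reproducible.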

    References
    Park, Y. and Marcotte, E.M. (2012) Flaws in evaluation schemes for pair-input computational predictions. Nat Methods, 9, 1134–1136.

    Szklarczyk, D., Gable, A.L., Lyon, D., Junge, A., Wyder, S., Huerta-Cepas, J., Simonovic, M., Doncheva, N.T., Morris, J.H., Bork, P., Jensen, L.J. and Mering, C. (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1), D607–D613.

    Szymborski, J. and Emad, A. (2021) RAPPPID: Towards Generalisable Protein Interaction Prediction with AWD-LSTM Twin Networks. bioRxiv https://doi.org/10.1101/2021.08.13.456309

  5. Protein interaction data for 222 BM zone components

    • figshare.com
    xlsx
    Updated Feb 6, 2022
    Cite
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless (2022). Protein interaction data for 222 BM zone components [Dataset]. http://doi.org/10.6084/m9.figshare.19127504.v1
    Available download formats: xlsx
    Dataset updated
    Feb 6, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mychel Morais; Ranjay Jayadev; Rachel Lennon; David Sherwood; Jamie Ellingford; Craig Lawless
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All human protein interactions were obtained from STRING (https://string-db.org/, version 11.0). Interactions were then filtered to those involving only BM zone proteins. Related to Fig. S6B.

  6. Data from: Determining the minimum number of protein-protein interactions...

    • data.niaid.nih.gov
    • datadryad.org
    Updated Apr 30, 2018
    Cite
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu (2018). Determining the minimum number of protein-protein interactions required to support known protein complexes [Dataset]. http://doi.org/10.5061/dryad.8s3682g
    Dataset updated
    Apr 30, 2018
    Dataset provided by
    Kyushu University
    The University of Tokyo
    National Institute of Technology
    Hong Kong Polytechnic University
    Kyoto University
    Authors
    Natsu Nakajima; Morihiro Hayashida; Jesper Jansson; Osamu Maruyama; Tatsuya Akutsu
    Description

    The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
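    To make the connectivity constraint concrete, the sketch below counts, for a single complex, how many PPIs would have to be added so that its members form one connected group. This is a simplified illustration assumed here, not ILPMinPPI or GreedyMinPPI: it treats each complex in isolation, whereas the problem described above minimizes over all complexes jointly. The function name and toy data are hypothetical.

```python
def extra_edges_for_complex(members, ppi_edges):
    """Minimum number of added edges that connects one complex's members.

    Only edges with both endpoints inside the complex are used. If the induced
    subgraph has c connected components, c - 1 extra edges are needed.
    """
    members = set(members)
    if not members:
        return 0
    parent = {m: m for m in members}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in ppi_edges:
        if a in members and b in members:
            parent[find(a)] = find(b)

    components = {find(m) for m in members}
    return len(components) - 1


# Toy PPI network and complexes with hypothetical protein names.
complex_toy_ppi = [("A", "B"), ("C", "D"), ("D", "E")]
need_abcd = extra_edges_for_complex({"A", "B", "C", "D"}, complex_toy_ppi)
need_ab = extra_edges_for_complex({"A", "B"}, complex_toy_ppi)
need_aef = extra_edges_for_complex({"A", "E", "F"}, complex_toy_ppi)
```

    Summing these per-complex counts gives only an upper bound on the joint minimum, because a single added interaction can help connect several overlapping complexes at once.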

  7. Validation of the total new predicted links and the new predicted links...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Wei Zhang; Jia Xu; Yuanyuan Li; Xiufen Zou (2023). Validation of the total new predicted links and the new predicted links associated with the 10 proteins by STRING database for the 14317_PPI data. [Dataset]. http://doi.org/10.1371/journal.pone.0177029.t009
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Wei Zhang; Jia Xu; Yuanyuan Li; Xiufen Zou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Validation of the total new predicted links and the new predicted links associated with the 10 proteins by STRING database for the 14317_PPI data.

  8. bacbench-ppi-stringdb-protein-sequences

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    Maciej Wiatrak (2025). bacbench-ppi-stringdb-protein-sequences [Dataset]. https://huggingface.co/datasets/macwiatrak/bacbench-ppi-stringdb-protein-sequences
    Dataset updated
    Jul 27, 2025
    Authors
    Maciej Wiatrak
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset for protein-protein interaction prediction across bacteria (Protein sequences)

    A dataset of 10,533 bacterial genomes across 6,956 species with protein-protein interaction (PPI) scores for each genome. The genome protein sequences and PPI scores have been extracted from STRING DB. Each row contains a set of protein sequences from a genome, ordered by their location on the chromosome and plasmids and a set of associated PPI scores. The PPI scores have been extracted using the… See the full description on the dataset page: https://huggingface.co/datasets/macwiatrak/bacbench-ppi-stringdb-protein-sequences.

  9. string_ppi_human_1M

    • huggingface.co
    Updated Aug 11, 2025
    Cite
    Vladimir Kovačević (2025). string_ppi_human_1M [Dataset]. https://huggingface.co/datasets/vladak/string_ppi_human_1M
    Dataset updated
    Aug 11, 2025
    Authors
    Vladimir Kovačević
    Description

    STRING PPI Human 1M

    This dataset contains 1 million human protein–protein interactions (PPIs) derived from STRING v11.5. Columns:

    seq_a, seq_b: Amino acid sequences of the interacting proteins (≤2048 AA).
    seq_name_a, seq_name_b: Protein names from STRING.
    score: Combined score from STRING (0–1000, normalized to 0–1). This score integrates various evidence channels (experimental data, text mining, co-expression, etc.) into a single confidence metric.
    label: Binary interaction label.… See the full description on the dataset page: https://huggingface.co/datasets/vladak/string_ppi_human_1M.
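    The score normalization mentioned above can be sketched as follows. The column names come from the description, while the function names, protein names, and the 0.7 cutoff are assumptions chosen for illustration; how the dataset derives its label column is not stated in the truncated description and is not modeled here.

```python
def normalize_score(raw_score):
    """Map a STRING combined score from the 0-1000 range onto 0-1."""
    if not 0 <= raw_score <= 1000:
        raise ValueError("STRING combined scores are expected in the range 0-1000")
    return raw_score / 1000


def filter_by_score(rows, min_score):
    """Keep rows whose normalized score is at least min_score."""
    return [row for row in rows if row["score"] >= min_score]


# Hypothetical rows using the column names listed in the description.
score_rows = [
    {"seq_name_a": "protA", "seq_name_b": "protB", "score": normalize_score(950)},
    {"seq_name_a": "protA", "seq_name_b": "protC", "score": normalize_score(400)},
]
high_confidence = filter_by_score(score_rows, min_score=0.7)
```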

  10. Evaluating homophily of human PPI with respect to chromosomes

    • zenodo.org
    bin, txt
    Updated Jul 30, 2022
    Cite
    Nicola Apollonio; Daniel Blankenberg; Fabio Cumbo; Paolo Giulio Franciosa; Daniele Santoni (2022). Evaluating homophily of human PPI with respect to chromosomes [Dataset]. http://doi.org/10.5281/zenodo.6941315
    Available download formats: txt, bin
    Dataset updated
    Jul 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicola Apollonio; Daniel Blankenberg; Fabio Cumbo; Paolo Giulio Franciosa; Daniele Santoni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Homophily/heterophily evaluation, expressed in terms of z-score values, for the human Protein-Protein Interaction (PPI) network obtained from the STRING v11.5 database (https://string-db.org) using the standard threshold on edge score (T=700). Each protein occurring in the PPI network was assigned to a class corresponding to the chromosome on which its gene is located.

    A total of 23 classes (chr1, chr2, ..., chr22, chrX) were considered (excluding the class corresponding to chromosome Y because of the small number of genes occurring in the network).

    The homophilic or heterophilic nature of the network, with respect to chromosome classes, was evaluated with the HONTO tool (https://github.com/cumbof/honto).

    In other words, the tendency of proteins to preferentially interact with proteins whose genes are physically located on the same chromosome (homophily) or on different chromosomes (heterophily) was investigated and evaluated in terms of z-scores.

    Values for intra-chromosomal interactions (along the diagonal) and inter-chromosomal interactions (off the diagonal) are also reported as a heatmap.

    Values on the diagonal are clearly higher than values off the diagonal, which indicates that the network is homophilic and confirms the link between shared chromosome and interaction in the PPI network.
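    The idea of thresholding edges at T=700 and comparing same-chromosome with cross-chromosome interactions can be sketched with a raw fraction, as below. This is only a simplified stand-in assumed for illustration: it does not compute the z-scores reported by HONTO, and the function name, gene names, and chromosome assignments are hypothetical.

```python
def intra_class_fraction(scored_edges, node_class, threshold=700):
    """Fraction of retained edges whose endpoints share the same class.

    An edge is retained when its score is at least `threshold` and both
    endpoints have a class assignment.
    """
    kept = [
        (a, b)
        for a, b, score in scored_edges
        if score >= threshold and a in node_class and b in node_class
    ]
    if not kept:
        return 0.0
    same = sum(1 for a, b in kept if node_class[a] == node_class[b])
    return same / len(kept)


# Toy scored edges and chromosome classes with hypothetical gene names.
homophily_edges = [
    ("g1", "g2", 900),
    ("g1", "g3", 750),
    ("g2", "g4", 650),  # below the threshold, so it is dropped
    ("g3", "g4", 800),
]
homophily_classes = {"g1": "chr1", "g2": "chr1", "g3": "chr2", "g4": "chr2"}
homophily_fraction = intra_class_fraction(homophily_edges, homophily_classes)
```

    A raw fraction like this ignores class sizes, which is one reason a z-score against a randomized baseline is the more informative measure.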

  11. Selection of 30 central genes from PPI network, including 17 upregulated and...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 11, 2021
    Cite
    Li, Ping; Wang, Xiaoming; Chen, Xuewei; Dong, Lijin; Fan, Rong (2021). Selection of 30 central genes from PPI network, including 17 upregulated and 13 downregulated genes, by using the STRING and Cytoscape software. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000899333
    Dataset updated
    Jun 11, 2021
    Authors
    Li, Ping; Wang, Xiaoming; Chen, Xuewei; Dong, Lijin; Fan, Rong
    Description

    Selection of 30 central genes from PPI network, including 17 upregulated and 13 downregulated genes, by using the STRING and Cytoscape software.

  12. Number of proteins and known PPIs per species in BIOGRID. (version 3.5.171)....

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Stavros Makrodimitris; Marcel Reinders; Roeland van Ham (2023). Number of proteins and known PPIs per species in BIOGRID. (version 3.5.171). [Dataset]. http://doi.org/10.1371/journal.pone.0242723.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stavros Makrodimitris; Marcel Reinders; Roeland van Ham
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of proteins and known PPIs per species in BIOGRID. (version 3.5.171).

  13. Analysis of associations between cancer driver mutations and PPI neighbours

    • zenodo.org
    zip
    Updated Jul 22, 2025
    Cite
    Márcia Vital; João André Isidoro Miranda; Margarida Carrolo; Francisco Pinto (2025). Analysis of associations between cancer driver mutations and PPI neighbours [Dataset]. http://doi.org/10.5281/zenodo.16322756
    Available download formats: zip
    Dataset updated
    Jul 22, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Márcia Vital; João André Isidoro Miranda; Margarida Carrolo; Francisco Pinto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data files necessary to reproduce the analysis of associations between cancer driver mutation status and its protein physical interactors. Original research article available at: https://www.biorxiv.org/content/10.1101/2025.01.18.633728v1

    Code for the analysis is available at: https://github.com/GamaPintoLab/driver_neighbours. Clone the code repository and extract the data folder to the directory's root.

    Raw data files were retrieved from the TCGA Pan Cancer cohort through the UCSC Xena online tool (https://pancanatlas.xenahubs.net):

    Protein-protein interaction data:

    Cancer driver genes from the Network of Cancer Genes and Healthy Drivers (NCG7.0): http://network-cancer-genes.org/

    Consensus RNA data from The Human Protein Atlas (version 24.0): https://www.proteinatlas.org/humanproteome/tissue/data#consensus_tissues_rna

    The opentargets folder contains files from the Open Targets Platform (Version 24.09): (https://platform.opentargets.org/)

    Additionally, we also include a processed folder, which includes all generated files required to reproduce the figures in the mentioned article, without the necessity of running all the provided code.

  14. CreativeWork

    • pfocr.wikipathways.org
    Updated Oct 10, 2024
    Cite
    WikiPathways (2024). CreativeWork [Dataset]. https://pfocr.wikipathways.org/figures/PMC10951504_gr5.html
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    WikiPathways
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The underlying mechanism of obesity and hyperuricemia. (A) Protein-protein interaction (PPI) network obtained from STRING database and constructed by Cytoscape. Each node size and color depth are proportional to their node degree. Edge width is proportional to the edge betweenness. (B) Grouping of KEGG enrichment analysis of the 235 intersected targets of obesity leading to hyperuricemia. Functionally related groups partially overlap. KEGG pathway is represented as a node. The nodes of a group are labeled in the same color. Two groups share the nodes with two colors

  15. Supplemental Table S5

    • figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Abraham Moller (2023). Supplemental Table S5 [Dataset]. http://doi.org/10.6084/m9.figshare.13355945.v1
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Abraham Moller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary statistics for protein-protein interaction networks identified with STRING amongst genes corresponding to significant SNPs or k-mers (inside or adjacent to genes). PPI enrichment p-value corresponds to the likelihood nodes and edges would be selected from the S. aureus database by chance.

  16. Table_1_Protein-Protein Interactions in Candida albicans.xlsx

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Aug 7, 2019
    Cite
    Schoeters, Floris; Van Dijck, Patrick (2019). Table_1_Protein-Protein Interactions in Candida albicans.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000118969
    Dataset updated
    Aug 7, 2019
    Authors
    Schoeters, Floris; Van Dijck, Patrick
    Description

    Despite being one of the most important human fungal pathogens, Candida albicans has not been studied extensively at the level of protein-protein interactions (PPIs), and data on PPIs are not readily available in online databases. In January 2018, the Biological General Repository for Interaction Datasets (BioGRID), the database that contains the most PPIs for C. albicans, documented only 188 physical or direct PPIs (release 3.4.156), while several more can be found in the literature. Other databases such as the STRING database, the Molecular INTeraction Database (MINT), and the Database of Interacting Proteins (DIP) contain even fewer interactions or do not even include C. albicans as a searchable term. Because of the non-canonical codon usage of C. albicans, where CUG is translated as serine rather than leucine, it is often problematic to use the yeast two-hybrid system in Saccharomyces cerevisiae to study C. albicans PPIs. However, studying PPIs is crucial to gain a thorough understanding of the function of proteins, biological processes and pathways. PPIs can also be potential drug targets. To aid in creating PPI networks and updating the BioGRID, we performed an exhaustive literature search in order to provide, in an accessible format, a more extensive list of known PPIs in C. albicans.

  17. Data Mining of Primary Sclerosing Cholangitis in the GEO Database

    • scidb.cn
    Updated Nov 4, 2024
    Cite
    Hong.LI; Jiao.Mu; Yuling.Liu (2024). Data Mining of Primary Sclerosing Cholangitis in the GEO Database [Dataset]. http://doi.org/10.57760/sciencedb.j00217.03668
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Hong.LI; Jiao.Mu; Yuling.Liu
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Data from the Gene Expression Omnibus (GEO) database (GSE119600) for PSC were downloaded and analyzed using R software to identify differentially expressed genes (DEGs). Online analysis tools were employed for Gene Ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The STRING database (https://www.string-db.org) was used for protein-protein interaction (PPI) analysis to identify key genes in PSC, and ssGSEA was used to analyze immune cell infiltration.

  18. Microarray and bioinformatic analysis of conventional ameloblastoma

    • data.scielo.org
    jpeg, txt, xlsx
    Updated Dec 20, 2022
    Cite
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez (2022). Microarray and bioinformatic analysis of conventional ameloblastoma [Dataset]. http://doi.org/10.48331/SCIELODATA.Z2S8X9
    Available download formats: xlsx(10317), jpeg(3415112), xlsx(9969), jpeg(12173968), txt(605), txt(289), txt(3840), xlsx(9964), xlsx(12458), txt(2657), txt(18077), xlsx(10402), jpeg(2313098), txt(406), txt(1023)
    Dataset updated
    Dec 20, 2022
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Luis Fernando Jacinto-Alemán; Javier Portilla-Robertson; Elba Rosa Leyva-Huerta; Josué Orlando Ramírez-Jarquín; Francisco Germán Villanueva-Sánchez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    National Autonomous University of Mexico
    Description

    Ameloblastoma is a highly aggressive odontogenic tumor, and its pathogenesis is associated with multiple participating genes. Objective: Our aim was to identify and validate new critical genes of conventional ameloblastoma using microarray and bioinformatics analysis. Methods: Gene expression microarray and bioinformatic analyses were performed using CHIP H10KA and DAVID software for enrichment. Protein-protein interactions (PPI) were visualized using STRING-Cytoscape with the MCODE plugin, followed by Kaplan-Meier and GEPIA analyses to propose candidate genes. RT-qPCR and IHC assays were performed to validate the bioinformatic approach. Results: 376 upregulated genes were identified. PPI analysis revealed 14 genes that were validated by Kaplan-Meier and GEPIA, resulting in PDGFA and IL2RA as candidate genes. The RT-qPCR analysis confirmed their intense expression. Immunohistochemistry analysis showed that PDGFA expression is located in the parenchyma. Conclusion: With bioinformatics methods, we can identify upregulated genes in conventional ameloblastoma, and RT-qPCR and immunoexpression analysis indicate that PDGFA could be a more specific and localized therapeutic target.

  19. Data from: Identification of transcription factor co-binding patterns with...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 28, 2023
    Cite
    Ieva Rauluseviciute; Timothée Launay; Guido Barzaghi; Sarvesh Nikumbh; Boris Lenhard; Arnaud Krebs; Jaime A. Castro-Mondragon; Anthony Mathelier (2023). Identification of transcription factor co-binding patterns with non-negative matrix factorization [Dataset]. http://doi.org/10.5281/zenodo.7681483
    Available download formats: zip
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ieva Rauluseviciute; Timothée Launay; Guido Barzaghi; Sarvesh Nikumbh; Boris Lenhard; Arnaud Krebs; Jaime A. Castro-Mondragon; Anthony Mathelier
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains pre-processed data required to reproduce the results of the paper "Identification of transcription factor co-binding patterns with non-negative matrix factorization". The repository with the code can be found here: https://bitbucket.org/CBGR/cobind_manuscript/src/master/. Data include transcription factor binding sites (TFBSs) for 7 species from UniBind 2021 database, joined motif collection from CIS-BP and JASPAR 2022 databases and corresponding physical protein-protein interaction (PPI) data from STRING database.

  20. Additional file 3 of A method for estimating coherence of molecular...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 3, 2023
    Cite
    Mikhail G. Dozmorov; Kellen G. Cresswell; Silviu-Alin Bacanu; Carl Craver; Mark Reimers; Kenneth S. Kendler (2023). Additional file 3 of A method for estimating coherence of molecular mechanisms in major human disease and traits [Dataset]. http://doi.org/10.6084/m9.figshare.13126766.v1
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Mikhail G. Dozmorov; Kellen G. Cresswell; Silviu-Alin Bacanu; Carl Craver; Mark Reimers; Kenneth S. Kendler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3: Phenotype-specific coherence estimates. The PPI databases (STRING, STRING filtered, Biogrid) are specified in the corresponding column names. For each phenotype category, the results are sorted alphabetically. In addition to the normalized coherence and the permutation p-values indicating a significant difference of phenotype-specific coherence from random, the modularity, untransformed slopes, and the corresponding lower and upper confidence interval bounds are shown. “NA” indicates that the corresponding value cannot be estimated because too few genes are annotated with PPIs.

