9 datasets found
  1. Data from: A global network of biomedical relationships derived from text

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 24, 2020
    Cite
    Bethany Percha; Russ B. Altman (2020). A global network of biomedical relationships derived from text [Dataset]. http://doi.org/10.5281/zenodo.3459420
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bethany Percha; Russ B. Altman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.

    PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path followed by its score for each theme, and indicators of whether or not the path is part of the flagship path set for each theme (meaning that it was manually reviewed and determined to reflect that theme). The themes themselves are listed below and are in our paper (reference below).

    PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:

    • PubMed ID
    • Sentence number (0 = title)
    • First entity name, formatted
    • First entity name, location (characters from start of abstract)
    • Second entity name, formatted
    • Second entity name, location
    • First entity name, raw string
    • Second entity name, raw string
    • First entity name, database ID(s)
    • Second entity name, database ID(s)
    • First entity type (Chemical, Gene, Disease)
    • Second entity type (Chemical, Gene, Disease)
    • Dependency path
    • Sentence, tokenized
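
As a rough illustration of the record layout listed above, the following Python sketch splits one Part II line into its fourteen fields. The tab delimiter, the `PartIIRecord` class, the field names, the location format, and the sample line are all assumptions made for illustration; the description does not state the delimiter, so check the actual files before relying on this.

```python
from dataclasses import dataclass


@dataclass
class PartIIRecord:
    """One Part II record; field order follows the list in the description."""
    pubmed_id: str
    sentence_number: int      # 0 means the title
    entity1_name: str         # formatted name
    entity1_location: str     # kept as text; exact format is not specified
    entity2_name: str
    entity2_location: str
    entity1_raw: str
    entity2_raw: str
    entity1_db_ids: str
    entity2_db_ids: str
    entity1_type: str         # Chemical, Gene, or Disease
    entity2_type: str
    dependency_path: str
    sentence: str             # tokenized sentence


def parse_part_ii_line(line: str) -> PartIIRecord:
    """Split a line on tabs (an assumed delimiter) into a PartIIRecord."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 14:
        raise ValueError(f"expected 14 fields, got {len(parts)}")
    parts[1] = int(parts[1])
    return PartIIRecord(*parts)


# Hypothetical sample line, invented for this sketch (not taken from the data).
sample = "\t".join([
    "12345678", "0", "aspirin", "0,7", "PTGS2", "17,22",
    "Aspirin", "PTGS2", "MESH:D001241", "5743", "Chemical", "Gene",
    "START_ENTITY|nsubj|inhibits inhibits|dobj|END_ENTITY",
    "Aspirin inhibits PTGS2 .",
])

record = parse_part_ii_line(sample)
print(record.entity1_type, record.entity2_type, record.dependency_path)
```

Keeping the location and database ID fields as plain strings avoids guessing at their internal format.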

    The "with-themes.txt" files only contain dependency paths with corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.

    This release contains the annotated network for the September 15, 2019 version of PubTator. The version discussed in our paper, below, is an older one, from April 30, 2016. If you're interested in that network, it can be found in Version 1 of this repository. We will be releasing updated networks periodically, as the PubTator community continues to release new versions of named entity annotations for Medline each month or so.

    ------------------------------------------------------------------------------------
    REFERENCES

    Percha B, Altman RB (2017) A global network of biomedical relationships derived from text. Bioinformatics, 34(15): 2614-2624.
    Percha B, Altman RB (2015) Learning the structure of biomedical relationships from unstructured text. PLoS Computational Biology, 11(7): e1004216.

    This project depends on named entity annotations from the PubTator project:
    https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/

    Reference:
    Wei CH et al., PubTator: a Web-based text mining tool for assisting Biocuration, Nucleic acids research, 2013, 41 (W1): W518-W522.

    Dependency parsing was provided by the Stanford CoreNLP toolkit (version 3.9.1):
    https://stanfordnlp.github.io/CoreNLP/index.html

    Reference:
    Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.

    ------------------------------------------------------------------------------------
    THEMES

    chemical-gene
    (A+) agonism, activation
    (A-) antagonism, blocking
    (B) binding, ligand (esp. receptors)
    (E+) increases expression/production
    (E-) decreases expression/production
    (E) affects expression/production (neutral)
    (N) inhibits

    gene-chemical
    (O) transport, channels
    (K) metabolism, pharmacokinetics
    (Z) enzyme activity

    chemical-disease
    (T) treatment/therapy (including investigatory)
    (C) inhibits cell growth (esp. cancers)
    (Sa) side effect/adverse event
    (Pr) prevents, suppresses
    (Pa) alleviates, reduces
    (J) role in disease pathogenesis

    disease-chemical
    (Mp) biomarkers (of disease progression)

    gene-disease
    (U) causal mutations
    (Ud) mutations affecting disease course
    (D) drug targets
    (J) role in pathogenesis
    (Te) possible therapeutic effect
    (Y) polymorphisms alter risk
    (G) promotes progression

    disease-gene
    (Md) biomarkers (diagnostic)
    (X) overexpression in disease
    (L) improper regulation linked to disease

    gene-gene
    (B) binding, ligand (esp. receptors)
    (W) enhances response
    (V+) activates, stimulates
    (E+) increases expression/production
    (E) affects expression/production (neutral)
    (I) signaling pathway
    (H) same protein or complex
    (Rg) regulation
    (Q) production by cell population

    ------------------------------------------------------------------------------------
    FORMATTING NOTE

    A few users have mentioned that the dependency paths in the "part-i" files are all lowercase text, whereas those in the "part-ii" files maintain the case of the original sentence. This complicates mapping between the two sets of files.

    We kept the part-ii files in the same case as the original sentence to facilitate downstream debugging: it's easier to tell which words in a particular sentence are contributing to the dependency path if their original case is maintained. When working with the part-ii "with-themes" files, if you simply convert the dependency path to lowercase, it is guaranteed to match one of the paths in the corresponding part-i file and you'll be able to get the theme scores.
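
The lowercase mapping described in this note can be sketched in a few lines of Python. This is only an illustration: the `lookup_theme_scores` helper, the in-memory `part_i_index` dictionary, and the path and score values are hypothetical, and loading the real part-i file into such a dictionary is left out because its exact column layout is not described here.

```python
from typing import Dict, Optional

ThemeScores = Dict[str, float]


def lookup_theme_scores(
    part_ii_path: str, part_i_index: Dict[str, ThemeScores]
) -> Optional[ThemeScores]:
    """Lowercase a part-ii dependency path and look it up among part-i paths."""
    return part_i_index.get(part_ii_path.lower())


# Hypothetical part-i index: lowercase dependency path -> score per theme.
part_i_index = {
    "start_entity|nsubj|inhibits inhibits|dobj|end_entity": {"N": 25.0, "B": 3.0},
}

# A part-ii path keeps the case of the original sentence.
part_ii_path = "START_ENTITY|nsubj|Inhibits Inhibits|dobj|END_ENTITY"
scores = lookup_theme_scores(part_ii_path, part_i_index)
print(scores)
```

For paths from the plain ".txt" files, which may have no theme assignment, the helper returns `None` instead of raising an error.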

    Apologies for the additional complexity, and please reach out to us if you have any questions (see correspondence information in the Bioinformatics manuscript, above).

  2. Network metrics for Protein-Protein Interaction (PPI). The table shows...

    • plos.figshare.com
    xls
    Updated Oct 1, 2025
    Cite
    Zhizhong Wang; Sen Xu; Ailong Lin; Chunxian Wei; Zhiyong Li; Yingchun Chen; Bizhou Bie; Ling Liu (2025). Network metrics for Protein-Protein Interaction (PPI). The table shows important network measures for certain proteins associated with vascular dementia. Degree shows how many links each protein has, Betweenness Centrality shows how it acts as a network hub, Clustering Coefficient shows how connected its neighbours are, and Edge Confidence Score shows how reliable the interaction is based on estimates from the STRING database. [Dataset]. http://doi.org/10.1371/journal.pone.0331787.t002
    Available download formats: xls
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhizhong Wang; Sen Xu; Ailong Lin; Chunxian Wei; Zhiyong Li; Yingchun Chen; Bizhou Bie; Ling Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Network metrics for Protein-Protein Interaction (PPI). The table shows important network measures for certain proteins associated with vascular dementia. Degree shows how many links each protein has, Betweenness Centrality shows how it acts as a network hub, Clustering Coefficient shows how connected its neighbours are, and Edge Confidence Score shows how reliable the interaction is based on estimates from the STRING database.
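
To make three of the measures named above concrete, here is a minimal pure-Python sketch that computes degree, betweenness centrality (using Brandes' algorithm for unweighted graphs), and the local clustering coefficient. The toy graph and the node names `P1` to `P4` are hypothetical and are not taken from the dataset; the Edge Confidence Score comes from STRING itself and is not computed here.

```python
from collections import deque

# Hypothetical undirected toy network: a triangle P1-P2-P3 plus a pendant P4.
adj = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}


def degree(adj):
    """Number of links per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def local_clustering(adj):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0
            continue
        nb = sorted(nbrs)
        links = sum(
            1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]]
        )
        result[v] = links / (k * (k - 1) / 2)
    return result


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


print(degree(adj))
print(local_clustering(adj))
print(betweenness(adj))
```

In this toy graph only `P3` lies on shortest paths between other nodes, so it is the only node with non-zero betweenness.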

  3. Data from MyGene2.

    • plos.figshare.com
    csv
    Updated Dec 26, 2024
    Cite
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer (2024). Data from MyGene2. [Dataset]. http://doi.org/10.1371/journal.pone.0309205.s005
    Available download formats: csv
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rare diseases affect 1-in-10 people in the United States and despite increased genetic testing, up to half never receive a diagnosis. Even when using advanced genome sequencing platforms to discover variants, if there is no connection between the variants found in the patient’s genome and their phenotypes in the literature, then the patient will remain undiagnosed. When a direct variant-phenotype connection is not known, putting a patient’s information in the larger context of phenotype relationships and protein-protein interactions may provide an opportunity to find an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and the Human Phenotype Ontology (HPO) contains the relations of thousands of phenotypes. By integrating these networks and clustering the entities within, we can potentially discover latent gene-to-phenotype connections. The historical records for STRING and HPO provide a unique opportunity to create a network time series for evaluating the cluster significance. Most excitingly, working with Children’s Hospital Colorado, we have provided promising hypotheses about latent gene-to-phenotype connections for 38 patients. We also provide potential answers for 14 patients listed on MyGene2. Clusters that our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions as clusters found to be insignificant. Our tool, BOCC, is available as a web app and command line tool.

  4. Network Properties of KEGG and STRING.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Ashis Saha; Aik Choon Tan; Jaewoo Kang (2023). Network Properties of KEGG and STRING. [Dataset]. http://doi.org/10.1371/journal.pone.0084227.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ashis Saha; Aik Choon Tan; Jaewoo Kang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ‘Total Nodes’ column contains the total number of nodes available in the network while the ‘Gene (Protein) Nodes’ column shows the number of nodes with at least one gene in KEGG (or one protein in STRING). The fourth and fifth columns contain the total number of edges, and the number of connected components having at least one gene (or protein), respectively. ‘Avg. Node Degree’ represents the number of edges a node has on average. ‘Max Node Degree’ denotes the maximum number of edges a node has in the network. ‘Clustering Coefficient’ is the ratio of the triangles to the connected triples in a graph.
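
The column definitions above can be illustrated with a small pure-Python sketch that derives the same kinds of summary values from an edge list. The toy edges and the `network_summary` helper are hypothetical. The clustering coefficient here counts each triangle three times (once per centre node) before dividing by the number of connected triples, which is a common convention but an assumption about what the table's authors computed.

```python
from collections import defaultdict


def network_summary(edges):
    """Summarize an undirected graph given as a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    degrees = {v: len(nbrs) for v, nbrs in adj.items()}

    # Count connected components with an iterative depth-first search.
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)

    # Connected triples: neighbour pairs around each centre node.
    triples = sum(k * (k - 1) // 2 for k in degrees.values())
    # Closed triples: neighbour pairs that are linked to each other.
    closed = 0
    for v, nbrs in adj.items():
        nb = sorted(nbrs)
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                if nb[j] in adj[nb[i]]:
                    closed += 1

    return {
        "total_nodes": len(adj),
        "total_edges": sum(degrees.values()) // 2,
        "connected_components": components,
        "avg_node_degree": sum(degrees.values()) / len(adj),
        "max_node_degree": max(degrees.values()),
        "clustering_coefficient": closed / triples if triples else 0.0,
    }


# Hypothetical toy network: one triangle with a pendant node, plus a separate pair.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
summary = network_summary(edges)
print(summary)
```

Nodes without any edge never appear in an edge list, so a real analysis would need to add them separately before counting nodes and components.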

  5. Data from: Prognostic Markers in Triple-Negative Breast Cancer

    • ebi.ac.uk
    • data.niaid.nih.gov
    Updated Jun 5, 2019
    Cite
    Robert Baxter (2019). Prognostic Markers in Triple-Negative Breast Cancer [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD013397
    Dataset updated
    Jun 5, 2019
    Authors
    Robert Baxter
    Variables measured
    Proteomics
    Description

    There are no widely accepted prognostic markers currently available to predict outcomes in patients with triple-negative breast cancer (TNBC), and no targeted therapies with confirmed benefit. We have used MALDI mass spectrometry imaging (MSI) of tryptic peptides to compare regions of cancer and benign tissue in 10 formalin-fixed, paraffin-embedded sections of TNBC tumors. Proteins were identified by reference to a peptide library constructed by LC-MALDI-MS/MS analyses of the same tissues. The prognostic significance of proteins that distinguished between cancer and benign regions was estimated by Kaplan-Meier analysis of their gene expression from public databases. Among peptides that distinguished between cancer and benign tissue in at least 3 tissues with a ROC area under the curve >0.7, 14 represented proteins identified from the reference library, including proteins not previously associated with breast cancer. Initial network analysis using the STRING database showed no obvious functional relationships except among collagen subunits COL1A1, COL1A2, and COL6A3, but manual curation, including the addition of EGFR to the analysis, revealed a unique network connecting 10 of the 14 proteins. Kaplan-Meier survival analysis to examine the relationship between tumor expression of genes encoding the 14 proteins, and recurrence-free survival (RFS) in patients with basal-like TNBC showed that, compared to low expression, high expression of nine of the genes was associated with significantly worse RFS, most with hazard ratios >2. In contrast, in estrogen receptor-positive tumors, high expression of these genes showed only low, or no, association with worse RFS. These proteins are proposed as putative markers of RFS in TNBC, and some may also be considered as possible targets for future therapies.

  6. Preferential Duplication of Intermodular Hub Genes: An Evolutionary...

    • plos.figshare.com
    tiff
    Updated May 31, 2023
    Cite
    Ricardo M. Ferreira; José Luiz Rybarczyk-Filho; Rodrigo J. S. Dalmolin; Mauro A. A. Castro; José C. F. Moreira; Leonardo G. Brunnet; Rita M. C. de Almeida (2023). Preferential Duplication of Intermodular Hub Genes: An Evolutionary Signature in Eukaryotes Genome Networks [Dataset]. http://doi.org/10.1371/journal.pone.0056579
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ricardo M. Ferreira; José Luiz Rybarczyk-Filho; Rodrigo J. S. Dalmolin; Mauro A. A. Castro; José C. F. Moreira; Leonardo G. Brunnet; Rita M. C. de Almeida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Whole genome protein-protein association networks are not random and their topological properties stem from genome evolution mechanisms. In fact, more connected, but less clustered proteins are related to genes that, in general, present more paralogs as compared to other genes, indicating frequent previous gene duplication episodes. On the other hand, genes related to conserved biological functions present few or no paralogs and yield proteins that are highly connected and clustered. These general network characteristics must have an evolutionary explanation. Considering data from the STRING database, we present here experimental evidence that, more than not being scale free, protein degree distributions of organisms present an increased probability for high degree nodes. Furthermore, based on this experimental evidence, we propose a simulation model for genome evolution, where genes in a network are either acquired de novo using a preferential attachment rule, or duplicated with a probability that linearly grows with gene degree and decreases with its clustering coefficient. For the first time a model yields results that simultaneously describe different topological distributions. Also, this model correctly predicts that, to produce protein-protein association networks with number of links and number of nodes in the observed range for Eukaryotes, 90% gene duplication and 10% de novo gene acquisition are necessary. This scenario implies a universal mechanism for genome evolution.

  7. List of reviewed studies or tools related to clustering algorithms for...

    • plos.figshare.com
    xls
    Updated Dec 26, 2024
    Cite
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer (2024). List of reviewed studies or tools related to clustering algorithms for biological networks. [Dataset]. http://doi.org/10.1371/journal.pone.0309205.t001
    Available download formats: xls
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is nowhere near an exhaustive list of papers or tools on the topic. It is not intended to be a systematic review but highlights the breadth and general shift in the methods up to the present.

  8. Number of g2p co-occurring pairs in BOCC clusters and the number of those...

    • plos.figshare.com
    xls
    Updated Dec 26, 2024
    Cite
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer (2024). Number of g2p co-occurring pairs in BOCC clusters and the number of those patients whose number of co-occurring pairs is significant compared to the HPO list shuffle null model. [Dataset]. http://doi.org/10.1371/journal.pone.0309205.t004
    Available download formats: xls
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael S. Bradshaw; Connor Gibbs; Skylar Martin; Taylor Firman; Alisa Gaskell; Bailey Fosdick; Ryan Layer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Breakdowns are given for the four sets of clusters identified by the corresponding predictive models.

  9. Table_2_Transcriptome Analysis Revealed a Highly Connected Gene Module...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 11, 2023
    Cite
    Shan Shan; Wei Chen; Ji-dong Jia (2023). Table_2_Transcriptome Analysis Revealed a Highly Connected Gene Module Associated With Cirrhosis to Hepatocellular Carcinoma Development.XLSX [Dataset]. http://doi.org/10.3389/fgene.2019.00305.s008
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Shan Shan; Wei Chen; Ji-dong Jia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Cirrhosis is one of the most important risk factors for development of hepatocellular carcinoma (HCC). Recent studies have shown that removal or good control of the underlying cause could reduce but not eliminate the risk of HCC. Therefore, it is important to elucidate the molecular mechanisms that drive the progression of cirrhosis to HCC.

    Materials and Methods: Microarray datasets incorporating cirrhosis and HCC subjects were identified from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were determined by GEO2R software. Functional enrichment analysis was performed by the clusterProfiler package in R. Liver carcinogenesis-related networks and modules were established using STRING database and MCODE plug-in, respectively, which were visualized with Cytoscape software. The ability of modular gene signatures to discriminate cirrhosis from HCC was assessed by hierarchical clustering, principal component analysis (PCA), and receiver operating characteristic (ROC) curve. Association of top modular genes and HCC grades or prognosis was analyzed with the UALCAN web-tool. Protein expression and distribution of top modular genes were analyzed using the Human Protein Atlas database.

    Results: Four microarray datasets were retrieved from GEO database. Compared with cirrhotic livers, 125 upregulated and 252 downregulated genes in HCC tissues were found. These DEGs constituted a liver carcinogenesis-related network with 272 nodes and 2954 edges, with 65 nodes being highly connected and forming a liver carcinogenesis-related module. The modular genes were significantly involved in several KEGG pathways, such as “cell cycle,” “DNA replication,” “p53 signaling pathway,” “mismatch repair,” “base excision repair,” etc. These identified modular gene signatures could robustly discriminate cirrhosis from HCC in the validation dataset. In contrast, the expression pattern of the modular genes was consistent between cirrhotic and normal livers. The top modular genes TOP2A, CDC20, PRC1, CCNB2, and NUSAP1 were associated with HCC onset, progression, and prognosis, and exhibited higher expression in HCC compared with normal livers in the HPA database.

    Conclusion: Our study revealed a highly connected module associated with liver carcinogenesis on a cirrhotic background, which may provide deeper understanding of the genetic alterations involved in the transition from cirrhosis to HCC, and offer valuable variables for screening and surveillance of HCC in high-risk patients with cirrhosis.
