10 datasets found
  1. Table_4_Dipeptide Frequency of Word Frequency and Graph Convolutional...

    • frontiersin.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Xianfang Wang; Yifeng Liu; Fan Lu; Hongfei Li; Peng Gao; Dongqing Wei (2023). Table_4_Dipeptide Frequency of Word Frequency and Graph Convolutional Networks for DTA Prediction.XLSX [Dataset]. http://doi.org/10.3389/fbioe.2020.00267.s004
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Xianfang Wang; Yifeng Liu; Fan Lu; Hongfei Li; Peng Gao; Dongqing Wei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep learning is an effective method to capture drug-target binding affinity, but low accuracy is still an obstacle to be overcome. Thus, we propose a novel predictor for drug-target binding affinity based on dipeptide frequency of word frequency encoding and a hybrid graph convolutional network. Word frequency characteristics of natural language are used to improve the frequency characteristics of peptides to express target proteins. For each drug molecule, the five different features of drug atoms and the atomic bond relationships are expressed as graphs. The obtained protein features and graph structure are used as the input of a convolutional neural network and a graph convolutional neural network, respectively. A prediction model is established to predict the drug affinity by calculating the hidden relationship. In the KIBA data set test experiment, the consistency coefficient of the model is 0.901, which is 0.01 higher than the existing model, and the MSE (mean square error) of the model is 0.126, which is 5% lower than the existing model. In the Davis data set test experiment, the consistency coefficient of the model is 0.895, which is 0.006 higher than the existing model, and the MSE of the model is 0.220, which is 4% lower than the existing model. These results show that our proposed method can not only predict the affinity better than existing models, but also outperform unitary deep learning approaches.

  2. DataBank

    • huggingface.co
    Updated Apr 8, 2024
    Cite
    Botato (2024). DataBank [Dataset]. https://huggingface.co/datasets/BotatoFontys/DataBank
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Botato
    Description

    Word Cloud

      Frequency of Words
    

    This graph shows that the text tends to be about Eindhoven, specifically its housing situation, social environments, industry and tech, among other topics.

      Word Embeddings Plot
    

    This graph shows us how related words are to each other. The closer one word is to another, the more they are related.

  3. An Information Theoretic Clustering Approach for Unveiling Authorship...

    • figshare.com
    txt
    Updated Jun 3, 2023
    Cite
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). An Information Theoretic Clustering Approach for Unveiling Authorship Affinities in Shakespearean Era Plays and Poems [Dataset]. http://doi.org/10.1371/journal.pone.0111445
    Available download formats: txt
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper we analyse the word frequency profiles of a set of works from the Shakespearean era to uncover patterns of relationship between them, highlighting the connections within authorial canons. We used a text corpus comprising 256 plays and poems from the 16th and 17th centuries, with 17 works of uncertain authorship. Our clustering approach is based on the Jensen-Shannon divergence and a graph partitioning algorithm, and our results show that authors' characteristic styles are very powerful factors in explaining the variation of word use, frequently transcending cross-cutting factors like the differences between tragedy and comedy, early and late works, and plays and poems. Our method also provides an empirical guide to the authorship of plays and poems where this is unknown or disputed.
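The description above names the Jensen-Shannon divergence as the distance between word frequency profiles. As a rough illustration only, and not the authors' pipeline, the following sketch computes it for two short texts. The helper names, the whitespace tokenisation, and the toy texts are all assumptions made for this example; the graph partitioning step is not shown.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary):
    """Relative frequency of each vocabulary word in the text (naive whitespace tokens)."""
    counts = Counter(text.lower().split())
    total = sum(counts[word] for word in vocabulary)
    return [counts[word] / total for word in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL divergence of p and q from their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy inputs, chosen only to exercise the functions.
text_a = "to be or not to be"
text_b = "to sleep perchance to dream"
vocabulary = sorted(set(text_a.split()) | set(text_b.split()))

profile_a = word_distribution(text_a, vocabulary)
profile_b = word_distribution(text_b, vocabulary)
distance = jensen_shannon_divergence(profile_a, profile_b)
```

With base-2 logarithms the value lies between 0 (identical profiles) and 1 (no shared words), which makes it convenient as an edge weight when building a similarity graph over documents.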

  4. Scoring rule for the tokens in graph node labels.

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Scoring rule for the tokens in graph node labels. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scoring rule for the tokens in graph node labels.

  5. Significance of authorial affinity observed on clusters obtained with...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Significance of authorial affinity observed on clusters obtained with different distance metrics. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t003
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Significance of authorial affinity observed on clusters obtained with different distance metrics.

  6. Information Theory based kNN classification of the works of uncertain...

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Information Theory based kNN classification of the works of uncertain authorship. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t004
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information Theory based kNN classification of the works of uncertain authorship.

  7. High contributor singleton vertices in the text graph and their frequencies....

    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Hend Alrasheed (2023). High contributor singleton vertices in the text graph and their frequencies. [Dataset]. http://doi.org/10.1371/journal.pone.0255127.t001
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hend Alrasheed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High contributor singleton vertices in the text graph and their frequencies.

  8. Gender Differences in Covid-19 Tweeting in English

    • figshare.com
    xlsx
    Updated Mar 24, 2020
    Cite
    Mike Thelwall (2020). Gender Differences in Covid-19 Tweeting in English [Dataset]. http://doi.org/10.6084/m9.figshare.12026625.v1
    Available download formats: xlsx
    Dataset updated
    Mar 24, 2020
    Dataset provided by
    figshare
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Word frequency statistics and graphs from the paper "Gender Differences in Covid-19 Tweeting in English", based on English-language tweets from March 10-23, 2020 matching the queries: coronavirus; “corona virus”; COVID-19; COVID19.

  9. The summary of the results for sex differences.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Máté Fellner; Bálint Varga; Vince Grolmusz (2023). The summary of the results for sex differences. [Dataset]. http://doi.org/10.1371/journal.pone.0227910.t002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Máté Fellner; Bálint Varga; Vince Grolmusz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The first column lists the minimum support, or, in other words, the frequency cut-off value; there are two values, 0.8 and 0.9, i.e., 80% and 90%. The second column denotes the right, left, or both hippocampi; the abbreviation HPC stands for the word “hippocampus”. In the third column the sex is given; the next four columns contain the number of frequent neighbor-sets of size 1, 2, 3 and 4 of the hippocampus considered. The next column gives the number of neighbor-sets that have significantly different frequencies (p = 0.001) in male and female connectomes. The last, ninth column gives the number of neighbor-sets that are significantly more frequent in male or in female connectomes: the sum of the two numbers in the ninth column is equal to the number in the eighth column. For example, in the first row, we can see that in males, the left hippocampus has 45 frequent 1-element neighbor sets, 844 frequent 2-element neighbor sets, 9102 3-element neighbor sets and 65150 frequent 4-element neighbor sets, where the frequency cut-off is 0.8. Moreover, one can see that there are 15732 sets differing significantly in frequency between males and females; and the last column says that of these 15732 sets, 15497 are present in the braingraphs of males and only 235 in the braingraphs of females.

  10. Ontology embeddings of HPO, ORDO, and HOOM

    • figshare.com
    bin
    Updated Jul 3, 2025
    Cite
    Chang Sun (2025). Ontology embeddings of HPO, ORDO, and HOOM [Dataset]. http://doi.org/10.6084/m9.figshare.27959826.v2
    Available download formats: bin
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    figshare
    Authors
    Chang Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This describes the ontology embeddings of HPO, ORDO, and HOOM for training Onto-CGAN (Paper: Generating Unseen Diseases Patient Data Using Ontology-enhanced Generative Adversarial Networks). We first combined the ORDO, HPO, and HOOM ontologies to create a unified resource capturing relationships between diseases, phenotypes, and other biomedical concepts. The combined ontology is processed by OWL2Vec* into graph-based representations, where nodes represent concepts (e.g., diseases, phenotypes) and edges represent their relationships. We used a random walk method with a depth of 3 to explore the graph structures and capture semantic relationships. Textual annotations (e.g., definitions) are tokenized and processed through a word2vec model. The word2vec model was trained for 10 iterations with a window size of 5 words and a minimum word frequency of 1. The resulting embeddings are 100-dimensional vectors that integrate hierarchical relationships, logical axioms, and textual information from the combined ontology.
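The depth-3 random walk mentioned in the description can be pictured with a small sketch. This is an illustration under stated assumptions, not a reproduction of OWL2Vec*: the adjacency dictionary, the node names, the fixed seed, and the `random_walk` helper are all hypothetical, and the walk here simply picks a uniformly random neighbour at each step.

```python
import random


def random_walk(graph, start, depth, rng):
    """Return a walk of up to `depth` steps from `start`, stopping early at dead ends."""
    walk = [start]
    node = start
    for _ in range(depth):
        neighbours = graph.get(node, [])
        if not neighbours:
            break
        node = rng.choice(neighbours)
        walk.append(node)
    return walk


# Hypothetical toy graph: nodes stand for concepts, edges for relationships.
toy_graph = {
    "disease_A": ["phenotype_X", "phenotype_Y"],
    "phenotype_X": ["disease_A", "disease_B"],
    "phenotype_Y": ["disease_A"],
    "disease_B": ["phenotype_X"],
}

rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [
    random_walk(toy_graph, node, 3, rng)
    for node in toy_graph
    for _ in range(2)  # two walks per starting node
]
```

Each walk is a sequence of node names that can be treated as a "sentence"; in a full pipeline such sequences would be passed to a word2vec implementation configured with the settings stated above (window of 5, minimum frequency of 1, 100-dimensional vectors, 10 iterations).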
