10 datasets found

Table_4_Dipeptide Frequency of Word Frequency and Graph Convolutional...
frontiersin.figshare.com
xlsx
Updated Jun 1, 2023
Cite
Xianfang Wang; Yifeng Liu; Fan Lu; Hongfei Li; Peng Gao; Dongqing Wei (2023). Table_4_Dipeptide Frequency of Word Frequency and Graph Convolutional Networks for DTA Prediction.XLSX [Dataset]. http://doi.org/10.3389/fbioe.2020.00267.s004
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fbioe.2020.00267.s004
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Xianfang Wang; Yifeng Liu; Fan Lu; Hongfei Li; Peng Gao; Dongqing Wei
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep learning is an effective method to capture drug-target binding affinity, but low accuracy is still an obstacle to be overcome. Thus, we propose a novel predictor for drug-target binding affinity based on dipeptide frequency of word frequency encoding and a hybrid graph convolutional network. Word frequency characteristics of natural language are used to improve the frequency characteristics of peptides to express target proteins. For each drug molecule, the five different features of drug atoms and the atomic bond relationships are expressed as graphs. The obtained protein features and graph structure are used as the input of a convolutional neural network and the input of a graph convolutional neural network, respectively. A prediction model is established to predict the drug affinity by calculating the hidden relationship. In the KIBA data set test experiment, the consistency coefficient of the model is 0.901, which is 0.01 higher than the existing model, and the MSE (mean square error) of the model is 0.126, which is 5% lower than the existing model. In the Davis data set test experiment, the consistency coefficient of the model is 0.895, which is 0.006 higher than the existing model, and the MSE of the model is 0.220, which is 4% lower than the existing model. These results show that our proposed method can not only predict the affinity better than those existing models, but also outperform unitary deep learning approaches.
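As a rough illustration of the dipeptide frequency idea mentioned in this description, the sketch below counts overlapping adjacent residue pairs in a protein sequence and normalizes them into a 400-entry profile. It shows only plain dipeptide frequency; the word-frequency refinement the authors describe is not specified here and is not reproduced. The function name `dipeptide_frequencies` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def dipeptide_frequencies(sequence):
    """Map each of the 400 possible dipeptides to its relative frequency."""
    # Overlapping adjacent pairs: "MKV" -> ["MK", "KV"].
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }

# Hypothetical toy sequence, not taken from KIBA or Davis.
freqs = dipeptide_frequencies("MKVLAAMKV")
```

The resulting fixed-length vector could serve as a protein feature input, although the encoding used in the paper may differ.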
DataBank
huggingface.co
Updated Apr 8, 2024
Cite
Botato (2024). DataBank [Dataset]. https://huggingface.co/datasets/BotatoFontys/DataBank
Available format: Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 8, 2024
Dataset authored and provided by
Botato
Description
Word Cloud

Frequency of Words

This graph shows that the text tends to be about Eindhoven, more specifically its housing situation, social environments, industry and tech, among other topics.

Word Embeddings Plot

This graph shows us how related words are to each other. The closer one word is to another, the more they are related.
An Information Theoretic Clustering Approach for Unveiling Authorship...
figshare.com
txt
Updated Jun 3, 2023
Cite
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). An Information Theoretic Clustering Approach for Unveiling Authorship Affinities in Shakespearean Era Plays and Poems [Dataset]. http://doi.org/10.1371/journal.pone.0111445
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0111445
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this paper we analyse the word frequency profiles of a set of works from the Shakespearean era to uncover patterns of relationship between them, highlighting the connections within authorial canons. We used a text corpus comprising 256 plays and poems from the 16th and 17th centuries, with 17 works of uncertain authorship. Our clustering approach is based on the Jensen-Shannon divergence and a graph partitioning algorithm, and our results show that authors' characteristic styles are very powerful factors in explaining the variation of word use, frequently transcending cross-cutting factors like the differences between tragedy and comedy, early and late works, and plays and poems. Our method also provides an empirical guide to the authorship of plays and poems where this is unknown or disputed.
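The Jensen-Shannon divergence named in this description can be sketched for two word frequency profiles as follows. This is a minimal illustration of the measure only, assuming whitespace tokenization and base-2 logarithms (so the value lies between 0 and 1); it is not the authors' pipeline, and the graph partitioning step is not shown. The helper names and sample texts are hypothetical.

```python
import math
from collections import Counter

def word_profile(text):
    """Relative word frequencies of a text (naive whitespace tokenization)."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    jsd = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)
        # Terms with zero probability contribute nothing to the KL sums.
        if pw > 0:
            jsd += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            jsd += 0.5 * qw * math.log2(qw / mw)
    return jsd

# Hypothetical sample texts, not drawn from the corpus.
profile_a = word_profile("to be or not to be")
profile_b = word_profile("to thine own self be true")
distance = jensen_shannon_divergence(profile_a, profile_b)
```

Pairwise values of this kind could populate a distance matrix over works, which a clustering or graph partitioning method would then take as input.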
Scoring rule for the tokens in graph node labels.
plos.figshare.com
xls
Updated May 30, 2023
Cite
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Scoring rule for the tokens in graph node labels. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0111445.t001
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scoring rule for the tokens in graph node labels.
Significance of authorial affinity observed on clusters obtained with...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Significance of authorial affinity observed on clusters obtained with different distance metrics. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0111445.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Significance of authorial affinity observed on clusters obtained with different distance metrics.
Information Theory based kNN classification of the works of uncertain...
figshare.com
xls
Updated May 31, 2023
Cite
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato (2023). Information Theory based kNN classification of the works of uncertain authorship. [Dataset]. http://doi.org/10.1371/journal.pone.0111445.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0111445.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Ahmed Shamsul Arefin; Renato Vimieiro; Carlos Riveros; Hugh Craig; Pablo Moscato
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Information Theory based kNN classification of the works of uncertain authorship.
High contributor singleton vertices in the text graph and their frequencies....
plos.figshare.com
xls
Updated Jun 10, 2023
Cite
Hend Alrasheed (2023). High contributor singleton vertices in the text graph and their frequencies. [Dataset]. http://doi.org/10.1371/journal.pone.0255127.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0255127.t001
Dataset updated
Jun 10, 2023
Dataset provided by
PLOS ONE
Authors
Hend Alrasheed
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
High contributor singleton vertices in the text graph and their frequencies.
Gender Differences in Covid-19 Tweeting in English
figshare.com
xlsx
Updated Mar 24, 2020
Cite
Mike Thelwall (2020). Gender Differences in Covid-19 Tweeting in English [Dataset]. http://doi.org/10.6084/m9.figshare.12026625.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.12026625.v1
Dataset updated
Mar 24, 2020
Dataset provided by
figshare
Authors
Mike Thelwall
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Word frequency statistics and graphs from the paper "Gender Differences in Covid-19 Tweeting in English", based on tweets in English from March 10-23, 2020 matching the queries: coronavirus; “corona virus”; COVID-19; COVID19
The summary of the results for sex differences.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Máté Fellner; Bálint Varga; Vince Grolmusz (2023). The summary of the results for sex differences. [Dataset]. http://doi.org/10.1371/journal.pone.0227910.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0227910.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Máté Fellner; Bálint Varga; Vince Grolmusz
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The first column lists the minimum support, in other words the frequency cut-off value, which is either 0.8 or 0.9 (80% or 90%). The second column denotes the right, the left, or both hippocampi; the abbreviation HPC stands for “hippocampus”. The third column gives the sex; the next four columns contain the numbers of frequent neighbor-sets of size 1, 2, 3 and 4 for the hippocampus considered. The eighth column gives the number of neighbor-sets that have significantly different frequencies (p = 0.001) in male and female connectomes. The ninth and last column gives the numbers of neighbor-sets that are significantly more frequent in male or in female connectomes: the sum of the two numbers in the ninth column equals the number in the eighth column. For example, the first row shows that in males the left hippocampus has 45 frequent 1-element neighbor-sets, 844 frequent 2-element neighbor-sets, 9102 frequent 3-element neighbor-sets and 65150 frequent 4-element neighbor-sets at a frequency cut-off of 0.8. Moreover, 15732 sets differ significantly in frequency between males and females, and the last column shows that of these 15732 sets, 15497 are more frequent in the braingraphs of males and only 235 in the braingraphs of females.
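The notion of minimum support used in this table can be sketched as follows: a neighbor-set counts as frequent when it appears in at least the cut-off fraction of the graphs. This is only an illustrative counting scheme under that assumption, not the authors' code; the function name and the toy node labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_neighbor_sets(neighbor_sets, size, min_support):
    """Return the size-`size` subsets of neighbors whose support reaches the cut-off.

    `neighbor_sets` holds one set of neighbors per graph (for example, the
    neighbors of one node of interest in each subject's graph).
    """
    counts = Counter()
    for neighbors in neighbor_sets:
        # Each subset is counted at most once per graph.
        for subset in combinations(sorted(neighbors), size):
            counts[subset] += 1
    n_graphs = len(neighbor_sets)
    return {
        subset: count / n_graphs
        for subset, count in counts.items()
        if count / n_graphs >= min_support
    }

# Hypothetical toy data: neighbors of one node in five graphs.
toy_graphs = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "D"},
    {"A", "B", "C"},
    {"A", "C"},
]
frequent_1 = frequent_neighbor_sets(toy_graphs, size=1, min_support=0.8)
frequent_2 = frequent_neighbor_sets(toy_graphs, size=2, min_support=0.8)
```

With a cut-off of 0.8, only sets present in at least four of the five toy graphs are kept. Comparing such supports between two groups would require a separate statistical test, which is not sketched here.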
Ontology embeddings of HPO, ORDO, and HOOM
figshare.com
bin
Updated Jul 3, 2025
Cite
Chang Sun (2025). Ontology embeddings of HPO, ORDO, and HOOM [Dataset]. http://doi.org/10.6084/m9.figshare.27959826.v2
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.27959826.v2
Dataset updated
Jul 3, 2025
Dataset provided by
figshare
Authors
Chang Sun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This describes the ontology embeddings of HPO, ORDO, and HOOM for training Onto-CGAN (Paper: Generating Unseen Diseases Patient Data Using Ontology-enhanced Generative Adversarial Networks). We first combined the ORDO, HPO, and HOOM ontologies to create a unified resource capturing relationships between diseases, phenotypes, and other biomedical concepts. The combined ontology is processed by OWL2Vec* and transformed into graph-based representations, where nodes represent concepts (e.g., diseases, phenotypes) and edges represent their relationships. We used a random walk method with a depth of 3 to explore the graph structures and capture semantic relationships. Textual annotations (e.g., definitions) are tokenized and processed through a word2vec model. The word2vec model was trained for 10 iterations with a window size of 5 words and a minimum word frequency of 1. The resulting embeddings are 100-dimensional vectors that integrate hierarchical relationships, logical axioms, and textual information from the combined ontology.
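The random walk step in this description can be illustrated with a small sketch. It assumes that "depth" means the number of edges followed from the start node, which the description does not state explicitly, and it uses a plain adjacency dictionary rather than OWL2Vec* itself. The function name, the walk count, and the node labels are hypothetical.

```python
import random

def random_walks(graph, depth, walks_per_node, seed=0):
    """Generate fixed-depth random walks from every node of an adjacency dict."""
    rng = random.Random(seed)  # seeded for reproducible walks
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            node = start
            for _ in range(depth):
                neighbors = graph.get(node, [])
                if not neighbors:
                    break  # dead end: stop this walk early
                node = rng.choice(neighbors)
                walk.append(node)
            walks.append(walk)
    return walks

# Hypothetical toy graph linking one disease concept to two phenotypes.
toy_graph = {
    "disease_X": ["phenotype_A", "phenotype_B"],
    "phenotype_A": ["disease_X"],
    "phenotype_B": ["disease_X"],
}
walks = random_walks(toy_graph, depth=3, walks_per_node=2)
```

Each walk is a sequence of node labels that can be treated like a sentence, so a word2vec-style model with the stated settings (window of 5, minimum frequency of 1, 100 dimensions) could be trained on them.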