https://www.gnu.org/licenses/gpl-3.0.html
Gaussian Finder's cavity dataset in XML. This dataset describes the protein cavities output by a protein cavity detection method called Gaussian Finder. This method is described in the article available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1913-4
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:
text: sentence from which the GDA was extracted.
relation: relation name associated with the given GDA.
h: JSON object representing the gene entity, composed of:
    id: NCBI Entrez ID associated with the gene entity.
    name: NCBI official gene symbol associated with the gene entity.
    pos: list consisting of starting position and length of the gene mention within text.
t: JSON object representing the disease entity, composed of:
    id: UMLS CUI associated with the disease entity.
    name: UMLS preferred term associated with the disease entity.
    pos: list consisting of starting position and length of the disease mention within text.
TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.
If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas
TBGA paper can be found at: https://rdcu.be/cKkY2
TBGA code is available at: https://github.com/GDAMining/gda-extraction
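The record structure described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' code: the sample record, its identifiers, and the helper names `mention` and `parse_record` are hypothetical, and it assumes that `pos` holds a character offset and a character length, which the description suggests but does not state explicitly.

```python
import json

# Hypothetical record that follows the field layout described above.
# The sentence, relation name, and identifiers are invented for illustration.
SAMPLE_LINE = json.dumps({
    "text": "Mutations in GENE1 are associated with disease X.",
    "relation": "example_relation",
    "h": {"id": "0000", "name": "GENE1", "pos": [13, 5]},
    "t": {"id": "C0000000", "name": "Disease X", "pos": [39, 9]},
})


def mention(text, pos):
    """Return the substring of text given pos = [start, length].

    Assumption: start and length are counted in characters.
    """
    start, length = pos
    return text[start:start + length]


def parse_record(raw):
    """Decode one JSON record and recover the gene and disease mentions."""
    record = json.loads(raw)
    return {
        "relation": record["relation"],
        "gene_id": record["h"]["id"],
        "gene_mention": mention(record["text"], record["h"]["pos"]),
        "disease_id": record["t"]["id"],
        "disease_mention": mention(record["text"], record["t"]["pos"]),
    }


parsed = parse_record(SAMPLE_LINE)
print(parsed["gene_mention"], "|", parsed["disease_mention"])
```

If the train, validation, and test files store one JSON object per line, `parse_record` could be applied to each line in turn; if they store a single JSON array, `json.load` on the whole file would be the starting point instead.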
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PATH SURVEYOR pipeline examples that were originally hosted at http://shawlab.science/shiny/PATH_SURVEYOR_ExampleUseCases/
It was originally presented in our publication (PMID: 37380943):
https://pubmed.ncbi.nlm.nih.gov/37380943/ and https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05393-y
Please contact timothy.shaw@moffitt.org with any additional questions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the repository for the manuscript "Gene regulatory network inference using mixed-norms regularized multivariate model with covariance selection" by Alain J. Mbebi & Zoran Nikoloski.
Organisation
2. The folder Figures contains all figures in the manuscript.
3. The folder Inferred-networks contains all network objects for each dataset and each inference method in the comparative analysis.
Dependencies and required packages
The following packages are required for the contending approaches in the comparative analysis: "devtools", "foreach", "plyr", "glmnet" and "randomForest".
GENIE3
The GENIE3 package can be installed from: http://bioconductor.org/packages/release/bioc/html/GENIE3.html
TIGRESS
The TIGRESS repository can be obtained from: https://github.com/jpvert/tigress
ENNET
The ENNET repository can be obtained from: https://github.com/slawekj/ennet
PLSNET
The Matlab source code of PLSNET can be obtained from: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1398-6#Sec17
PORTIA
The PORTIA repository can be obtained from: https://github.com/AntoinePassemiers/PORTIA
D3GRN
The Matlab source code of D3GRN can be obtained from: https://github.com/chenxofhit/D3GRN
Fused-LASSO
The fused-LASSO repository can be obtained from: https://github.com/omranian/inference-of-GRN-using-Fused-LASSO
ANOVerence
Because of some technical issues (e.g., the accessibility of the code at http://www2.bio.ifi.lmu.de/~kueffner/anova.tar.gz), we were not able to reproduce the ANOVerence results and used the inferred network from the DREAM5 challenge instead.
4. Although the code here was tested on Fedora 29 (Workstation Edition) using R (version 4.2.2), it can run under any Linux distribution or Windows version, as long as all the required packages are compatible with the desired R version.
http://www.gnu.org/licenses/agpl-3.0.html
Bio-medical texts contain a great deal of information that can be used for developments in the medical field. Traditionally, domain experts extracted such information manually. Automating this information extraction task can help speed up progress in the field. To name a few use cases, bio-medical events can show the effects of drugs on a person, and they can also be used to identify certain medical conditions in a person. Hence, automating the extraction of events from bio-medical texts is very beneficial.
The dataset is a simplified version of the event-annotated GENIA dataset, derived from the version available in TEES.
It consists of the original bio-medical text, the labelled trigger words, the location of each trigger word in the text, and the event type associated with the trigger word. There are 3 sets of data: train (8k+ sentences), devel (about 3k sentences) and test (about 3k sentences). Each set has 4 columns, namely "Sentence", "TriggerWord", "TriggerWordLoc" and "EventType", capturing the original bio-medical text, the trigger words in the sentence, the location of the trigger words in the sentence, and the event type associated with the trigger words, respectively.
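The four-column layout described above can be explored with Python's standard `csv` module. This is only a sketch under stated assumptions: the sample rows are invented, the one-row-per-trigger layout and the numeric encoding of "TriggerWordLoc" are guesses (the description does not specify them), and the helper names `load_rows` and `count_event_types` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample using the four column names from the description.
# Row contents and the TriggerWordLoc encoding are assumptions.
SAMPLE_CSV = '''Sentence,TriggerWord,TriggerWordLoc,EventType
"Protein A induces expression of protein B.",induces,10,Positive_regulation
"Protein A induces expression of protein B.",expression,18,Gene_expression
"Protein C expression was reduced.",expression,10,Gene_expression
'''


def load_rows(fileobj):
    """Read all rows as dictionaries keyed by the CSV header."""
    return list(csv.DictReader(fileobj))


def count_event_types(rows):
    """Count how often each event type occurs in the EventType column."""
    return Counter(row["EventType"] for row in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = count_event_types(rows)
for event_type, n in counts.most_common():
    print(event_type, n)
```

For the real files, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="", encoding="utf-8")`, where `path` points at one of the train, devel, or test CSV files.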
The original source dataset is from the BioNLP Shared Task 2011. A complete unprocessed version also seems to be present in the genia-event-2011 dataset.
For TEES licensing information, please refer to this link. For GENIA dataset licensing information, please refer to the file "GE11-LICENSE" located beside the data files (.csv) in this Kaggle dataset.
Photo Credits: Louis Reed on Unsplash
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The dataset consists of three text files, corresponding to train, validation, and test sets, plus an additional JSON file containing the mapping between relation names and IDs. Each record in train, validation, or test files corresponds to a single GDA extracted from a sentence. Records are represented as JSON objects with the following structure:
text: sentence from which the GDA was extracted.
relation: relation name associated with the given GDA.
h: JSON object representing the gene entity, composed of:
    id: NCBI Entrez ID associated with the gene entity.
    name: NCBI official gene symbol associated with the gene entity.
    pos: list consisting of starting position and length of the gene mention within text.
t: JSON object representing the disease entity, composed of:
    id: UMLS CUI associated with the disease entity.
    name: UMLS preferred term associated with the disease entity.
    pos: list consisting of starting position and length of the disease mention within text.
TBGA contains over 200,000 instances and 100,000 bags. The zip file consists of one folder, named TBGA, containing the files corresponding to the dataset.
If you use or extend our work, please cite the following: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04646-6#citeas
TBGA paper can be found at: https://rdcu.be/cKkY2
TBGA code is available at: https://github.com/GDAMining/gda-extraction
Keeping Citation here because I don't know where else to keep it.
You can cite all versions by using the DOI 10.5281/zenodo.5911096. This DOI represents all versions, and will always resolve to the latest one.
The dataset is taken from https://zenodo.org/records/5911097
https://www.gnu.org/licenses/gpl-3.0.html
KVFinder's cavity dataset in CSV. This dataset describes the protein cavities output by a protein cavity detection method called KVFinder. This method is described in the article available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-197
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
A collection of 1200 texts (292173 tokens) about clinical trial studies and clinical trial announcements in Spanish:
Texts were annotated with the following entities types:
86 389 entities and 16 590 attributes were annotated. 10% of the corpus was doubly annotated, and high inter-annotator agreement (IAA) values were achieved: F1-score = 0.84 for entities and F1-score = 0.88 for attributes (both in strict match).
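To make the strict-match agreement figures above concrete, the sketch below shows one common way to compute a strict-match F1 score between two annotators. It illustrates the general technique and is not necessarily the authors' exact procedure. The function name `strict_f1`, the `(start, end, label)` tuple representation, and the labels `TYPE_A` and `TYPE_B` are assumptions introduced here.

```python
def strict_f1(annotator_a, annotator_b):
    """Strict-match F1 between two sets of (start, end, label) annotations.

    An annotation counts as a match only if its offsets and its label are
    identical in both sets. Annotator A is treated as the reference; the
    resulting F1 value is symmetric in the two annotators.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: the second entity differs by one character in
# its end offset, so it does not count under strict matching.
ann_a = [(0, 5, "TYPE_A"), (10, 15, "TYPE_B"), (20, 25, "TYPE_A")]
ann_b = [(0, 5, "TYPE_A"), (10, 16, "TYPE_B"), (20, 25, "TYPE_A")]

score = strict_f1(ann_a, ann_b)
print(round(score, 3))
```

Under a relaxed matching scheme the near-miss on the second entity would typically earn partial or full credit, which is why strict-match values are the more conservative measure of agreement.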
The dataset includes the texts and annotations used for the human evaluation of the medical named entity tool:
- 100 clinical trial announcements from EudraCT not used for system development: we provide files of the version revised by medical professionals (Reference folder).
- 100 clinical cases with Creative Commons license: we provide the files revised by medical professionals (Reference folder). These data come from:
• Urgencias Bidasoa (https://urgenciasbidasoa.wordpress.com/casos-clinicos-3/)
• Hipocampo.org (https://www.hipocampo.org/)
• Cases published by Sociedad Andaluza de Medicina Familiar y Comunitaria (SAMFyC): we are very grateful for the permission to use these cases, and we acknowledge that the copyright of the contents belongs to the authors. Clinical cases were extracted from books published from 2016 to 2022 (https://www.samfyc.es/tipos-publicacion/publicaciones/). If you use these data, please acknowledge the authors' copyright and intellectual property rights over the contents.
The dataset is freely distributed for research and educational purposes under a Creative Commons Non-Commercial Attribution (CC-BY-NC-A) License.
If you use the CT-EBM-SP version 2 dataset, please cite it as follows:
Campillos-Llanos, L., A. Valverde-Mateos & A. Capllonch-Carrion (2024) Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics. BioMed Central.