Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Most review authors applied multiple strategies to identify additional trials. Therefore, the summation of percentages exceeds 100%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Lassa fever (LF) is an acute viral hemorrhagic illness endemic in West Africa that poses significant public health challenges, particularly for pregnant persons and children, who experience higher morbidity and mortality. Although several vaccine candidates are being developed, no LF vaccine has been licensed yet.
Methods: We conducted a living systematic review (LSR) of the literature to evaluate the safety, efficacy, effectiveness, and immunogenicity of LF vaccines. We performed biweekly searches in major biomedical databases, trial registries, preprint servers, and other sources. Eligible studies included preclinical studies, clinical trials, and observational studies published from January 2014 to April 2025. Reviewer pairs independently screened studies, extracted data (REDCap), and assessed risk of bias. Data synthesis involved random-effects pairwise and proportion meta-analyses (R software), with GRADE assessment of evidence certainty. PROSPERO registrations: CRD42024514513; CRD42024516754.
Results: Searches retrieved 1423 records, of which 51 studies were included: 2 clinical trials in adults involving 88 vaccinated persons, and 49 preclinical studies of 30 vaccine candidates. The trials evaluated Recombinant Measles-Vectored (MV-LASV) and Recombinant Vesicular Stomatitis Virus-based (rVSVΔG-LASV-GPC) LF vaccine candidates. No published clinical trials evaluating LF vaccines in special populations such as pregnant persons, infants, children, or adolescents were found. Although injection site reactogenicity was reported, no vaccine-related serious adverse events (SAEs) were reported in study participants. Immunogenicity was robust in adults, with vaccines achieving around 95% seroconversion at 30 days. Preclinical studies evaluated nine different platforms. Findings are disseminated via an interactive online dashboard (https://safeinpregnancy.org/living-systematic-review-lassa/).
Conclusion: Currently, the two LF vaccine candidates that have advanced to clinical trials exhibit high immunogenicity, but evidence on their safety profile in healthy adults is still limited. Clinical evidence in pregnant persons, infants, children, and adolescents is absent. Vaccine platforms of interest have been identified in preclinical studies, providing information on those that could advance to clinical studies.
Bibliography to assist in identifying methods and procedures helpful in supporting the development, testing, application, and validation of alternatives to the use of vertebrates in biomedical research and toxicology testing. This bibliography is produced from MEDLINE database searches, performed and analyzed by subject experts from the Toxicology and Environmental Health Information Program (TEHIP) of the Specialized Information Services Division (SIS) of the National Library of Medicine (NLM). The purpose of these bibliographies on animal alternatives is to provide a survey of the literature in a format which facilitates easy scanning. This bibliography includes citations from published articles, books, book chapters, and technical reports. Citations to items in non-English languages are indicated with brackets around the title. The language is also indicated. Citations with abstracts or annotations relating to the method are organized under subject categories. This publication features citations which deal with methods, tests, assays or procedures which may prove useful in establishing alternatives to the use of intact vertebrates. Citations are selected and compiled through searching various computerized on-line bibliographic databases of the National Library of Medicine, National Institutes of Health. The focus of the bibliography is to assist in identifying methods and procedures helpful in supporting the development, testing, application, and validation of alternatives to the use of vertebrates in biomedical research and toxicology testing. Toxicology Databases
Background: Unlike for bone tissue, little progress has been made regarding cartilage regeneration, and many challenges remain. Furthermore, the key roles of cartilage lesions caused by trauma, focal lesions, or articular overstress remain unclear. Traumatic injuries to the meniscus, as well as its degeneration, are important risk factors for long-term joint dysfunction, degenerative joint lesions, and knee osteoarthritis (OA), a chronic joint disease characterized by degeneration of articular cartilage and hyperosteogeny. Nearly 50% of the individuals with meniscus injuries develop OA over time. Because cartilage lesions have a limited inherent capacity for self-repair, biomaterial drug nanomedicine is considered a promising alternative. Therefore, it is important to elucidate potential gene-level regeneration mechanisms and to discover novel precision medications; this study identifies candidates and investigates their function and role in pathogenesis.
Methods: We downloaded the mRNA microarray dataset GSE117999, involving paired cartilage lesion tissue samples from 12 OA patients and 12 patients from a control group. First, we analyzed these data to identify the differentially expressed genes (DEGs). We then performed gene ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses for these DEGs. Protein-protein interaction (PPI) networks were then constructed, from which we obtained eight significant genes after a functional interaction analysis. Finally, we identified a potential nanomedicine derived from this assay set, using a wide range of inhibitor information archived in the Search Tool for the Retrieval of Interacting Genes (STRING) database.
Results: Sixty-six DEGs were identified using our significance criteria (adjusted P-value < 0.01, |log2 FC| ≥ 1.2). Furthermore, we identified eight hub genes and one potential nanomedicine, selenocysteine, based on these integrative data.
Conclusion: We identified eight hub genes that could serve as prospective biomarkers for the diagnosis and biomaterial drug treatment of cartilage lesions, including the novel genes CAMP, DEFA3, TOLLIP, HLA-DQA2, SLC38A6, SLC3A1, FAM20A, and ANO8. These genes were mainly associated with immune response, immune mediator induction, and cell chemotaxis. The findings provide significant support for obtaining a series of novel gene targets and point to potential mechanisms for cartilage regeneration and nanomedicine immunotherapy in regenerative medicine.
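The significance criteria quoted in the Results (adjusted P-value < 0.01 and an absolute log2 fold change of at least 1.2) amount to a simple two-condition filter. The following is a minimal Python sketch of that filter only; the record type, field names, gene labels, and numbers are hypothetical illustrations and are not taken from GSE117999 or from the authors' pipeline.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    symbol: str      # gene label (hypothetical placeholder names below)
    log2_fc: float   # log2 fold change, lesion vs. control
    adj_p: float     # multiple-testing adjusted P-value


def select_degs(stats, adj_p_cutoff=0.01, log2_fc_cutoff=1.2):
    """Keep genes with adj_p < adj_p_cutoff and |log2_fc| >= log2_fc_cutoff."""
    return [
        g for g in stats
        if g.adj_p < adj_p_cutoff and abs(g.log2_fc) >= log2_fc_cutoff
    ]


# Illustrative values only (made up for this sketch, not study results).
example = [
    GeneStat("GENE_A", 1.5, 0.001),    # passes both criteria
    GeneStat("GENE_B", -2.0, 0.005),   # passes: down-regulated, large effect
    GeneStat("GENE_C", 0.8, 0.0001),   # fails: effect size too small
    GeneStat("GENE_D", 1.9, 0.02),     # fails: adjusted P-value too large
]
degs = select_degs(example)
print([g.symbol for g in degs])
```

Using the absolute value keeps both up- and down-regulated genes, which is why GENE_B is retained despite its negative fold change.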
Data from the paper "The landscape of biomedical research" (https://www.biorxiv.org/content/10.1101/2023.04.10.536208v1). The paper used the PubMed 2020 baseline (download date: 26.01.2021, no longer available) supplemented with additional files from the 2021 baseline (download date: 27.04.2022, no longer available), both originally obtained from https://www.nlm.nih.gov/databases/download/pubmed_medline.html, courtesy of the U.S. National Library of Medicine. The data provided here include:
- from the PubMed database: article title, journal, PMID, and publication year.
- produced by us: t-SNE embedding X and Y coordinates, label, and color.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Journal lists for all 46 Sub-Saharan African countries were retrieved manually from Ulrich's periodicals database using the "country of publication" field in the advanced search interface. Delimiters were used to limit the retrieved results to periodicals in the journal categories and with active status. Ulrich's database usually holds multiple records for the different formats (e.g., online and print) or languages in which a single journal is published. Duplicates were removed from the retrieved results.
Master journal lists for the Web of Science indexes, comprising the Science Citation Index Expanded (SCIE), the Social Science Citation Index (SSCI), the Arts and Humanities Citation Index (A&HCI) and the Emerging Sources Citation Index (ESCI), and for the Scopus, EMBASE and MEDLINE databases were downloaded from their respective publishers' websites. A master journal list for AJOL was not available on the publisher's website. Therefore, the master journal list for AJOL was created manually by extracting journal information from the publisher's website. Only active journals were included in the study, where active journals were defined as journals that published at least one issue in 2021 or 2020. The master journal list for AIM was not available either. The whole database, comprising 18,949 articles, was downloaded with the source (journal names). Journals were sorted to identify unique journal names; only 15,279 articles had identifiable journal names. Five hundred twenty-four unique journals were identified, of which only 74 were active. Journals that were not indexed in the AIM database in 2020 or 2021 were deemed inactive and were not included in the study. This study was not considered for ethics review because the data used were collected from publicly available records.
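The two data-cleaning rules described above, collapsing multiple records of the same journal and keeping only journals with at least one issue in 2020 or 2021, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the record layout (`title`, `format`, `issue_years`), the title-normalization rule, and the example journals are all hypothetical, and the study itself describes manual retrieval.

```python
def normalize_title(title):
    """Case-fold and collapse whitespace so format variants compare equal."""
    return " ".join(title.casefold().split())


def deduplicate(records):
    """Keep the first record seen for each normalized journal title."""
    seen = {}
    for rec in records:
        seen.setdefault(normalize_title(rec["title"]), rec)
    return list(seen.values())


def is_active(rec, active_years=(2020, 2021)):
    """Active means at least one issue published in 2020 or 2021."""
    return any(year in active_years for year in rec["issue_years"])


# Hypothetical records: one journal listed twice (print and online).
records = [
    {"title": "Journal of Examples", "format": "print", "issue_years": [2019, 2021]},
    {"title": "journal of  examples", "format": "online", "issue_years": [2019, 2021]},
    {"title": "Dormant Review", "format": "print", "issue_years": [2015]},
]
unique = deduplicate(records)
active = [rec for rec in unique if is_active(rec)]
print(len(unique), [rec["title"] for rec in active])
```

Matching on a normalized title is only one possible key; an identifier such as an ISSN would be more robust where it is available.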
Databases that represent sets of pre-compiled information on biological relationships and associations, interactions and facts which have been extracted from the biomedical literature using Ariadne's MedScan technology. ResNet databases store information harvested from the entire PubMed in a formal structure that allows searching, retrieval and updating by Pathway Studio users. ResNet is seamlessly installed when Pathway Studio is installed.
There are several available ResNet databases:
* ResNet Mammalian Database includes data for Human, Rat, and Mouse
* ResNet Plant Database has data on Arabidopsis, Rice and several other plants
Features of ResNet:
* All extracted relations have linked access to the original article or abstract
* Synonyms and homologs are included to maintain gene identity and to obviate redundancy in search results
* Users can update ResNet as often as required using the MedScan technology built into all Ariadne products
* Updates are made available by Ariadne every quarter
To purchase Pathway Studio software with ResNet database, for information, or to schedule a web demonstration, call our sales department at (240) 453-6272, or (866) 340-5040 (toll free).
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on August 16, 2019.
A database that integrates not only RIKEN's original large-scale mammalian databases, such as FANTOM, the ENU mutagenesis program, the RIKEN Cerebellar Development Transcriptome Database and the Bioresource Database, but also imported data from public databases, such as Ensembl, MGI and biomedical ontologies. Our integrated database has been implemented on the infrastructure of publication medium for databases, termed SciNetS/SciNeS, or the Scientists' Networking System, where the data and metadata are structured as a semantic web and are downloadable in various standardized formats. The top-level ontology-based implementation of mammal-related data directly integrates the representative knowledge and individual data records in existing databases to ensure advanced cross-database searches and reduced unevenness of the data management operations. Through the development of this database, we propose a novel methodology for the development of standardized comprehensive management of heterogeneous data sets in multiple databases to improve the sustainability, accessibility, utility and publicity of the data of biomedical information.
Comprehensive international bibliographic biomedical database that enables users to track and retrieve precise information on drugs and diseases from pre-clinical studies to searches on critical toxicological information. It contains bibliographic records with citations, abstracts and indexing derived from biomedical articles in peer reviewed journals, and is especially strong in its coverage of drug and pharmaceutical research. Embase can help with everything from clinical trials research to pharmacovigilance and is updated online daily and weekly. Its broad biomedical scope covers the following areas:
* Drug therapy and research, including pharmaceutics, pharmacology and toxicology
* Clinical and experimental (human) medicine
* Basic biological science relevant to human medicine
* Biotechnology and biomedical engineering, including medical devices
* Health policy and management, including pharmacoeconomics
* Public, occupational and environmental health, including pollution control
* Veterinary science, dentistry, and nursing
The Embase Application Programming Interface supports export, RSS feeds, and integration services, making it possible to share data with a wide range of systems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A zip file containing training and evaluation data sets for the bio-answerfinder biomedical question answering system. The zip file also contains SQLite databases for named entity lookups, morphology, nominalizations, acronyms, PubMed-trained GloVe word/phrase embeddings and vocabulary with document frequencies, and SciCrunch ontology data for named entities such as proteins, anatomical structures,
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes the biomedical abbreviations stated between parentheses in the titles of the scholarly publications indexed by PubMed between 1947 and 2019. Each abbreviation is extracted using the parenthetic level count algorithm and is linked to the title, PMID and year of publication of the corresponding research paper. Each acronym is then annotated with its length and the number of upper and lower case letters it contains. Finally, entities with one or no upper case letter, fewer than three characters, eight characters or more, or a high rate of non-alphanumeric characters are semi-automatically eliminated to ensure the consistency of the research database.
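The extraction and filtering steps described above can be illustrated with a short Python sketch. It is one plausible reading of a "parenthetic level count" (tracking nesting depth and capturing the text of each top-level parenthesized span) and should not be taken as the dataset authors' exact algorithm. The function names, the example title, and the 0.3 cutoff for the rate of non-alphanumeric characters are assumptions; the description gives no numeric threshold and says the elimination was semi-automatic.

```python
def extract_parenthesized(title):
    """Return the text of every top-level parenthesized span in a title."""
    spans, depth, start = [], 0, None
    for i, ch in enumerate(title):
        if ch == "(":
            if depth == 0:
                start = i + 1          # span begins after the opening bracket
            depth += 1
        elif ch == ")" and depth > 0:  # ignore unmatched closing brackets
            depth -= 1
            if depth == 0:
                spans.append(title[start:i])
    return spans


def looks_like_abbreviation(text, max_non_alnum_rate=0.3):
    """Apply the stated filters; max_non_alnum_rate is an assumed cutoff."""
    upper = sum(c.isupper() for c in text)
    non_alnum = sum(not c.isalnum() for c in text)
    return (
        upper >= 2                     # drop one or no upper case letter
        and 3 <= len(text) < 8         # drop < 3 or >= 8 characters
        and non_alnum / len(text) <= max_non_alnum_rate
    )


# Hypothetical title used only to exercise the sketch.
title = "Magnetic resonance imaging (MRI) of the knee (a review)"
candidates = extract_parenthesized(title)
abbreviations = [c for c in candidates if looks_like_abbreviation(c)]
print(candidates, abbreviations)
```

Counting depth rather than using a flat regular expression means a nested span such as "(bar (BAZ) qux)" is captured once as a whole, leaving the decision about inner brackets to a later step.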
Venue for research resource discovery offering resource providers a platform to advertise their services and products, as well as investigators a means to locate services for their use. Search results may be refined by resource type, research area or institution.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
ABSTRACT:
The Human Disease Ontology (DO) (http://www.disease-ontology.org) database has undergone significant expansion in the past three years. The DO disease classification includes specific formal semantic rules to express meaningful disease models and has expanded from a single asserted classification to include multiple-inferred mechanistic disease classifications, thus providing novel perspectives on related diseases. Expansion of disease terms, alternative anatomy, cell type and genetic disease classifications and workflow automation highlight the updates for the DO since 2015. The enhanced breadth and depth of the DO's knowledgebase has expanded the DO's utility for exploring the multi-etiology of human disease, thus improving the capture and communication of health-related data across biomedical databases, bioinformatics tools, genomic and cancer resources, as demonstrated by a 6.6× growth in DO's user community since 2015. The DO's continual integration of human disease knowledge, evidenced by the more than 200 SVN/GitHub releases/revisions since previously reported in our DO 2015 NAR paper, includes the addition of 2650 new disease terms, a 30% increase of textual definitions, and an expanding suite of disease classification hierarchies constructed through defined logical axioms.
Instructions:
Data was cleaned. Duplicates and unnecessary columns were removed. Column titles were changed.
Inspiration:
This dataset was uploaded to U-BRITE for the "DRG_DEPOT" summer 2023 team project.
Acknowledgements:
Schriml, L. M., Mitraka, E., Munro, J., Tauber, B., Schor, M., Nickle, L., Felix, V., Jeng, L., Bearer, C., Lichenstein, R., Bisordi, K., Campion, N., Hyman, B., Kurland, D., Oates, C. P., Kibbey, S., Sreekumar, P., Le, C., Giglio, M., & Greene, C.
Human Disease Ontology 2018 update: classification, content and workflow expansion
Nucleic Acids Research 2019; 47(D1), D955–D962;PMID:30407550;DOI:https://doi.org/10.1093/nar/gky1032
U-BRITE last update data: 06/28/2023
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
[Lexical/terminological resource] First release of the Spanish Medical Abbreviation DataBase (AbreMES-DB).
The database is created automatically by detecting abbreviations and their potential definitions explicitly mentioned in the same sentence. These abbreviations are extracted from the metadata (titles and abstracts) of different biomedical publications written in Spanish. The sources of these publications are SciELO, IBECS and PubMed.
Community model organism database for laboratory mouse and authoritative source for phenotype and functional annotations of mouse genes. MGD includes complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics. Contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a Neo4j .dump file for the constructed PICO-based Biomedical Knowledge Graph (EBM-KG). The graph is built from the EBM-NLP dataset and represents key PICO (Population, Intervention, Comparator, Outcome) elements and their relationships.
The knowledge graph can be restored in Neo4j to support biomedical text mining, literature-based discovery, and advanced retrieval-augmented generation (RAG) pipelines.
The Neo4j database dump file contains the following:
- Document, keyword, and author nodes for each PubMed article in the EBM-NLP dataset.
- PICO nodes with their sub-labels as defined in the EBM-NLP dataset.
In total, 23 entity types and 22 relation types are present in the knowledge graph.
How to Restore the Database
1. Install Neo4j (compatible with the version used: Neo4j 5.24.0).
2. Stop the Neo4j server.
3. Run: neo4j-admin database load --from-path=[xxx/neo4j.dump]/backups --overwrite-destination=true
4. Start the Neo4j server.
Data collection of gene expression patterns mapped in whole-mount mouse embryos (ICR strain) at mid-gestational stages (Embryonic Day 9.5, 10.5, 11.5), in which the most striking dynamics in pattern formation and organogenesis are observed. Collection of gene expression patterns of transcription factors (TFs) and TF-related factors such as transcription cofactors. Genes were extracted from databases including RIKEN Transcription Factor Database and Panther Classification System.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the datasets and software used in the paper "KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description". They are used to run the KGML-xDTD code stored on GitHub and to support the results of the paper.
About the datasets
1. bkg_rtxkg2c_v2.7.3.tar.gz
This tar.gz file contains three sub-folders: tsv_files, scripts, and relevant_dbs. The "tsv_files" sub-folder has the input files that the neo4j software uses. The "scripts" sub-folder contains a shell script with a relevant python script to construct the biomedical knowledge graph. The "relevant_dbs" sub-folder stores two auxiliary databases that KGML-xDTD needs to use.
2. indication_paths.yaml
This file contains the DrugMechDB MOA paths that we used to evaluate the predicted MOA paths by KGML-xDTD. It is downloaded from the official GitHub repository of DrugMechDB.
3. training_data.tar.gz
This tar.gz file contains the processed training data from the four data sources (MyChem, SemMedDB, NDF-RT, RepoDB) mentioned in the paper. These processed drug-disease pairs have been matched to the identifiers of biological entities used in our biomedical knowledge graph and split into true positive (tp) sets and true negative (tn) sets. We also provide the names of these drug identifiers and disease identifiers under a sub-folder "translated _to_name".
About the software
neo4j-community-3.5.26.tar.gz
This tar.gz is the Neo4j community version 3.5.26 downloaded from the Neo4j Download Center. Newer versions are available, but they introduce major changes to the Neo4j settings that are not compatible with our scripts on GitHub, so we provide the version that we used in our research. If you would like to use a newer version, our scripts will need to be modified to import the biomedical knowledge graph into your local Neo4j database with the new settings.
A virtual database of annotations made by 50 database providers (April 2014) - and growing (see below), that map data to publication information. All NIF Data Federation sources can be part of this virtual database as long as they indicate the publications that correspond to data records. The format that NIF accepts is the PubMed Identifier, category or type of data that is being linked to, and a data record identifier. A subset of this data is passed to NCBI, as LinkOuts (links at the bottom of PubMed abstracts), however due to NCBI policies the full data records are not currently associated with PubMed records. Database providers can use this mechanism to link to other NCBI databases including gene and protein, however these are not included in the current data set at this time. (To view databases available for linking see http://www.ncbi.nlm.nih.gov/books/NBK3807/#files.Databases_Available_for_Linking )
The categories that NIF uses have been standardized to the following types:
* Resource: Registry
* Resource: Software
* Reagent: Plasmid
* Reagent: Antibodies
* Data: Clinical Trials
* Data: Gene Expression
* Data: Drugs
* Data: Taxonomy
* Data: Images
* Data: Animal Model
* Data: Microarray
* Data: Brain connectivity
* Data: Volumetric observation
* Data: Value observation
* Data: Activation Foci
* Data: Neuronal properties
* Data: Neuronal reconstruction
* Data: Chemosensory receptor
* Data: Electrophysiology
* Data: Computational model
* Data: Brain anatomy
* Data: Gene annotation
* Data: Disease annotation
* Data: Cell Model
* Data: Chemical
* Data: Pathways
For more information refer to Create a LinkOut file, http://neuinfo.org/nif_components/disco/interoperation.shtm
Participating resources ( http://disco.neuinfo.org/webportal/discoLinkoutServiceSummary.do?id=4 ):
* Addgene http://www.addgene.org/pgvec1
* Animal Imaging Database http://aidb.crbs.ucsd.edu
* Antibody Registry http://www.neuinfo.org/products/antibodyregistry/
* Avian Brain Circuitry Database http://www.behav.org/abcd/abcd.php
* BAMS Connectivity http://brancusi.usc.edu/
* Beta Cell Biology Consortium http://www.betacell.org/
* bioDBcore http://biodbcore.org/
* BioGRID http://thebiogrid.org/
* BioNumbers http://bionumbers.hms.harvard.edu/
* Brain Architecture Management System http://brancusi.usc.edu/bkms/
* Brede Database http://hendrix.imm.dtu.dk/services/jerne/brede/
* Cell Centered Database http://ccdb.ucsd.edu
* CellML Model Repository http://www.cellml.org/models
* CHEBI http://www.ebi.ac.uk/chebi/
* Clinical Trials Network (CTN) Data Share http://www.ctndatashare.org/
* Comparative Toxicogenomics Database http://ctdbase.org/
* Coriell Cell Repositories http://ccr.coriell.org/
* CRCNS - Collaborative Research in Computational Neuroscience - Data sharing http://crcns.org
* Drug Related Gene Database https://confluence.crbs.ucsd.edu/display/NIF/DRG
* DrugBank http://www.drugbank.ca/
* FLYBASE http://flybase.org/
* Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/
* Gene Ontology Tools http://www.geneontology.org/GO.tools.shtml
* Gene Weaver http://www.GeneWeaver.org
* GeneDB http://www.genedb.org/Homepage
* Glomerular Activity Response Archive http://gara.bio.uci.edu
* GO http://www.geneontology.org/
* Internet Brain Volume Database http://www.cma.mgh.harvard.edu/ibvd/
* ModelDB http://senselab.med.yale.edu/modeldb/
* Mouse Genome Informatics Transgenes ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenotypicAllele.rpt
* NCBI Taxonomy Browser http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html
* NeuroMorpho.Org http://neuromorpho.org/neuroMorpho
* NeuronDB http://senselab.med.yale.edu/neurondb
* SciCrunch Registry http://neuinfo.org/nif/nifgwt.html?tab=registry
* NIF Registry Automated Crawl Data http://lucene1.neuinfo.org/nif_resource/current/
* NITRC http://www.nitrc.org/
* Nuclear Receptor Signaling Atlas http://www.nursa.org
* Olfactory Receptor DataBase http://senselab.med.yale.edu/ordb/
* OMIM http://omim.org
* OpenfMRI http://openfmri.org
* PeptideAtlas http://www.peptideatlas.org
* RGD http://rgd.mcw.edu
* SFARI Gene: AutDB https://gene.sfari.org/autdb/Welcome.do
* SumsDB http://sumsdb.wustl.edu/sums/
* Temporal-Lobe: Hippocampal - Parahippocampal Neuroanatomy of the Rat http://www.temporal-lobe.com/
* The Cell: An Image Library http://www.cellimagelibrary.org/
* Visiome Platform http://platform.visiome.neuroinf.jp/
* WormBase http://www.wormbase.org
* YPED http://medicine.yale.edu/keck/nida/yped.aspx
* ZFIN http://zfin.org
https://spdx.org/licenses/CC0-1.0.html
This dataset presents a comprehensive collection of heterogeneous network structures commonly encountered in biomedical research. The dataset encompasses diverse types of networks, including protein-protein interaction networks, herb-target interaction networks, herb-symptom interaction networks, herb-herb association networks, and symptom-symptom association networks. Each network is represented as a graph, with nodes representing entities such as proteins, herbs, symptoms, and efficacy, and edges representing the relationships between these entities. The dataset provides detailed descriptions of the nodes and edges in each network, along with metadata such as node attributes where applicable. This dataset serves as a valuable resource for researchers studying complex biological systems and exploring relationships between various biomedical entities.
Methods
The dataset is composed of seven files that encompass a range of networks, including protein-protein interactions, herb-target interactions, herb-symptom interactions, herb-herb associations, and symptom-symptom associations. It features three types of nodes: 1,254 herbs, 1,027 symptoms, and 2,208 protein targets. Additionally, the dataset includes five types of associations: 1,912 herb-herb associations, 7,220 symptom-symptom associations, 25,133 protein-protein interactions, 168,797 herb-target associations, and 16,739 herb-symptom associations. Detailed descriptions of the contents of these seven files follow.
The first file is a compressed CSV that encompasses data on proteins and herbs. It begins with a header row outlining the dataset's features, followed by rows detailing specific protein-herb interactions. The columns are divided into two segments: the first segment includes attributes related to proteins such as Target ID, Protein Name, Gene Symbol, and Uniprot ID, while the second segment contains herb-related information, including Herb ID, Chinese Pin Yin, English Name, and Latin Name. Table 1 presents some of its associated targets. This compilation serves as an extensive repository of knowledge about herbs and their molecular targets, enriched with supplementary information. In this dataset, the designation "NA" signifies a lack of information regarding specific protein-target interactions related to herbs. Specifically, it indicates that there is no identified protein name associated with the herb-target interaction documented in the dataset, and that the corresponding gene and UniProt ID linked to this protein remain unavailable or undocumented.
The herb-compound CSV file encompasses 14,985 pairs of compound-herb activities and includes quality indicators for 1,254 herbs and 1,237 ingredients, detailing the interactions between these herbs and compounds organized into the following fields (Table 2 shows various compounds associated with the herb Coptis deltoidea):
The CAS number uniquely identifies each chemical compound and is essential for scientific communication. The preferred name is based on the IUPAC nomenclature, providing a standardized way to refer to the compound. These compounds contribute to the medicinal benefits of their respective herbs, making them valuable in traditional and modern herbal medicine.
PubChem ID: A unique identifier assigned to chemical substances in the PubChem database, offering an abundance of information regarding the chemical's properties, biological activities, and safety data.
ChEMBL ID: A unique identifier for bioactive compounds in the ChEMBL database, focusing on the activities of these compounds relevant to drug discovery and medicinal chemistry.
Formula: The chemical composition and structure of a compound, indicating the types and quantities of atoms present in the molecule.
Smiles (Simplified Molecular Input Line Entry System): SMILES is a string notation used to represent the structure of molecules. It is a compact and human-readable way to describe the molecular structure, including the atoms, bonds, and functional groups. In the CSV file, the Smiles column contains the SMILES representation of the compounds. This information can be used to identify and compare the chemical structures of the compounds, as well as to predict their properties and activities.
Herb ID: A unique identifier designated for each herb.
In summary, the occurrence of "None" in the CAS, PubChem ID, or ChEMBL ID columns of the herb compound dataset indicates missing or unavailable identifiers. The other CSV file contains detailed interactions between herbs and symptoms, organized into the following fields:
Symptom: Describes the symptom associated with the herb. Herb_id: Provides a unique identifier for each herb. Pinyin name: Represents the Romanized version of the herb's Chinese name. Latin name: Lists the herb's Latin botanical name. English name: Identifies the herb by its common English name. Class in Chinese: Indicates the herb's classification in Chinese. Class in English: Describes the herb's classification in English. P-value: Represents the statistical significance value. FDR (BH): Denotes the False Discovery Rate adjusted using the Benjamini-Hochberg method. FDR (Bonferroni): Shows the False Discovery Rate adjusted by the Bonferroni method.
Relationship: Explains the association between the symptom and the herb. A value of "None" for P_value, FDR(BH), and FDR(Bonferroni) in the herb-symptom interaction dataset indicates that no statistical evaluation is available for that interaction. This dataset is sourced from the Chinese Pharmacopoeia (CHPA). It includes 1,254 unique herb nodes and 1,027 symptom nodes, for a total of 2,281 nodes, connected by 16,739 interactions as edges.
Two supplementary files relate to herbs. One appears to detail the interactions among herbs, while the other appears to outline the therapeutic properties associated with each herb. These files can offer insight into the relationships between herbs and their effectiveness in treating different ailments or health conditions.
Herb-herb associations indicate relationships among herbs based on shared properties and are provided in CSV format; the file contains 1,912 herb-herb pairs together with quality indicators for 1,254 herbs. Protein-protein interactions are also provided in CSV format; this data, sourced from the STRING database, consists of 2,208 protein targets connected by 25,133 interactions, together with quality indicators for the protein targets. Symptom-symptom associations show relationships between symptoms and are available in CSV format as well, together with quality indicators for the symptoms.
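The handling of "None" cells and the two adjusted columns can be illustrated with a minimal sketch. It assumes a column named `P_value`, uses a made-up four-row sample, and applies the textbook Bonferroni and Benjamini-Hochberg procedures; the actual file layout and the authors' own adjustment code are not given in this description and may differ.

```python
import csv
import io

# Hypothetical sample in the layout described above. The column names,
# herb IDs, and values are placeholders, not rows from the dataset.
SAMPLE = """Symptom,Herb_id,P_value
symptom_1,H0001,0.001
symptom_2,H0002,0.02
symptom_3,H0003,None
symptom_4,H0004,0.04
"""


def parse_p(text):
    """Return a float, or None when the cell is empty or holds 'None'."""
    text = text.strip()
    return None if text in ("", "None") else float(text)


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Textbook BH step-up adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum keeps
    # the adjusted values monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Interactions without a statistical evaluation are set aside, not dropped silently.
tested = [r for r in rows if parse_p(r["P_value"]) is not None]
untested = [r for r in rows if parse_p(r["P_value"]) is None]
pvals = [parse_p(r["P_value"]) for r in tested]

for row, bh, bonf in zip(tested, benjamini_hochberg(pvals), bonferroni(pvals)):
    print(row["Symptom"], row["Herb_id"], round(bh, 4), round(bonf, 4))
print("not evaluated:", [r["Herb_id"] for r in untested])
```

Separating untested rows before adjustment matters because both procedures depend on the number of tests; counting "None" rows would inflate the correction.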
The symptom-symptom data is extracted from the Semantic MEDLINE Database (SemMedDB) and encompasses 1,027 symptoms linked by 7,220 associations.
EXPERIMENTAL DESIGN, MATERIALS AND METHODS
This study compiles data on herbs, symptoms, targets, and their interactions from various public databases and publications, as detailed in Table 3. Herb-target and herb-compound associations are obtained from the HIT2 database, while relationships between herbs and their indications are derived from the 2015 edition of the Chinese Pharmacopoeia (CHPA). Herb-herb associations are based on research described in the reference. Associations related to herb efficacy were also gathered from the CHPA. Using these efficacy-based herb relationships, herb vectors are constructed and cosine similarities between pairs of herbs are calculated, yielding the herb-herb connections. These similarity scores are then applied as edge weights in the network analysis.
Table 3. Information collected from various public databases and publications regarding herbs, symptoms, targets, and their interactions.
| Name | Composition | Source |
|---|---|---|
| Herb-target associations | 1254 herbs, 2208 targets and 168797 herb-target links | HIT2 |
| Herb-compound associations | 1254 herbs, 1237 compounds and 14985 herb-compound links | HIT2 |
| Herb-efficacy associations | 829 herbs, 373 efficacies and 3830 herb-efficacy links | CHPA |
| Herb-symptom associations | 465 herbs, 1027 symptoms and 16739 herb-symptom links | CHPA |
| Herb-herb associations | 809 herbs and 1912 links | Herbs linked with similar efficacy |
| Protein-protein interactions | 10622 proteins and 25133 interactions | String10 |
| Symptom-symptom associations | 1027 symptoms and 7220 links | SemMedDB |
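The efficacy-based herb-herb similarity mentioned in the methods can be sketched as follows. This is a minimal illustration, not the authors' code: the herb and efficacy names are placeholders, and dropping pairs with zero similarity is an assumption made here to keep the edge list sparse.

```python
import math

# Hypothetical herb-efficacy annotations; names are placeholders and do
# not come from the dataset.
herb_efficacies = {
    "herb_a": {"efficacy_1", "efficacy_2"},
    "herb_b": {"efficacy_1", "efficacy_2", "efficacy_3"},
    "herb_c": {"efficacy_4"},
}


def to_vector(efficacies, vocabulary):
    """Binary vector: 1 where the herb has the efficacy, else 0."""
    return [1 if e in efficacies else 0 for e in vocabulary]


def cosine_similarity(va, vb):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a herb with no efficacies is similar to nothing
    return dot / (norm_a * norm_b)


# Fixed ordering of all efficacies so every vector has the same layout.
vocabulary = sorted(set().union(*herb_efficacies.values()))
vectors = {h: to_vector(e, vocabulary) for h, e in herb_efficacies.items()}

# Weighted herb-herb edges; keeping only nonzero similarities is an
# assumption of this sketch.
edges = {}
names = sorted(vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        score = cosine_similarity(vectors[a], vectors[b])
        if score > 0:
            edges[(a, b)] = score

for (a, b), score in edges.items():
    print(a, b, round(score, 4))
```

For binary vectors the score reduces to the number of shared efficacies divided by the square root of the product of the two herbs' efficacy counts, so the set representation could be used directly on larger data.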
Assume there are m types of herbs and n types of efficacies. Each herb is represented by an efficacy vector Va = (w1, w2, …, wj, …, wn), where wj = 1 indicates that herb Va has a relationship with efficacy j; otherwise there is no relationship. The efficacy-based cosine similarity of herb Va and herb Vb can then be calculated in the standard cosine form, cos(Va, Vb) = (Va · Vb) / (‖Va‖ ‖Vb‖).
Relationships between symptoms were analyzed using text mining. These connections were first drawn from the Semantic MEDLINE Database (SemMedDB), which contains ternary semantic relationships extracted from the MEDLINE database by the biomedical semantic relation extraction tool SemRep. The significance of each relationship was assessed using Fisher’s exact test, with relationships at a significance level of P<0.05 deemed reliable. Additionally, protein-protein interactions were obtained from a well-known gene-gene interaction network database. For relationships with weights exceeding 700, a filtering process was applied, followed by linear