Abstract: Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially in content detail, interface, formats and data structure. An integrated view of these databases is essential to support the functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments. Search engines can assist in retrieving data from these structured sources, but fall short of providing a comprehensive knowledge excerpt from the interlinked databases. A prerequisite for an integrated data view is insight into the cross-references among database entities. However, only a fraction of all possible cross-references are explicitly tagged in the respective biomedical information systems. In this work, we investigate to what extent an integrated data network can be constructed automatically. We propose a method that predicts and extracts cross-references from multiple life science databases and their possible referenced data targets. We study the retrieval quality of our method and the relationship between manually crafted relevance ranking and relevance ranking based on cross-references, and report first, promising results.
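As a rough illustration of the extraction half of this task (not the authors' method, which the abstract does not detail), explicit cross-references in free text can often be spotted by matching the identifier syntax of the target databases. The sketch below assumes just three example namespaces: UniProtKB accessions (using the accession pattern published by UniProt), Gene Ontology term IDs and Ensembl human gene IDs. The names `ID_PATTERNS` and `find_cross_references` are hypothetical, and predicting untagged cross-references would need modeling beyond this.

```python
import re

# Identifier patterns for a few example target databases (an assumed, non-exhaustive set).
ID_PATTERNS = {
    # UniProtKB accession format as documented by UniProt.
    "UniProtKB": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Gene Ontology term IDs: "GO:" followed by seven digits.
    "GO": re.compile(r"\bGO:\d{7}\b"),
    # Ensembl human gene IDs: "ENSG" followed by eleven digits.
    "Ensembl": re.compile(r"\bENSG\d{11}\b"),
}


def find_cross_references(text: str) -> dict[str, list[str]]:
    """Return candidate cross-references per target database, in order of appearance."""
    found: dict[str, list[str]] = {}
    for database, pattern in ID_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            # dict.fromkeys drops duplicates while keeping first-seen order.
            found[database] = list(dict.fromkeys(matches))
    return found


if __name__ == "__main__":
    entry = "Binds P69905; annotated with GO:0005344. See ENSG00000206172."
    print(find_cross_references(entry))
```

Pattern matching only finds references that are already written out; it says nothing about which referenced targets are relevant, which is where a ranking step would come in.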
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview of studies investigating the association between ID and an NDD, and/or prevalence studies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
which is the largest to date and is also made available in the public domain to advance much-needed research in this area.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A zip file containing data sets for training and evaluating the bio-answerfinder biomedical question answering system. The zip file also contains SQLite databases for named entity lookups, morphology, nominalizations, acronyms, PubMed-trained GloVe word/phrase embeddings, vocabulary with document frequencies, and SciCrunch ontology data for named entities such as proteins, anatomical structures,
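Once the zip file is extracted, a quick way to see what each bundled SQLite database holds is to list its tables from SQLite's own `sqlite_master` schema table. This is a sketch under assumptions: the database files are taken to use a `.db` extension and to sit somewhere under a local extraction directory; the function names are hypothetical, and no particular table layout is assumed.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the names of all tables in one SQLite database file."""
    # closing() guarantees the connection is closed (sqlite3's own context manager only commits).
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def inventory(extracted_dir: str, pattern: str = "*.db") -> dict[str, list[str]]:
    """Map each SQLite file found under extracted_dir (recursively) to its table names."""
    return {
        str(path.relative_to(extracted_dir)): list_tables(path)
        for path in sorted(Path(extracted_dir).rglob(pattern))
    }
```

If the archive uses another extension (for example `.sqlite`), pass it via `pattern`.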
Bibliography to assist in identifying methods and procedures helpful in supporting the development, testing, application, and validation of alternatives to the use of vertebrates in biomedical research and toxicology testing. This bibliography is produced from MEDLINE database searches performed and analyzed by subject experts from the Toxicology and Environmental Health Information Program (TEHIP) of the Specialized Information Services Division (SIS) of the National Library of Medicine (NLM). The purpose of these bibliographies on animal alternatives is to provide a survey of the literature in a format that facilitates easy scanning. The bibliography includes citations from published articles, books, book chapters, and technical reports. Citations to non-English items are marked with brackets around the title, and the language is indicated. Citations with abstracts or annotations relating to the method are organized under subject categories. This publication features citations dealing with methods, tests, assays, or procedures that may prove useful in establishing alternatives to the use of intact vertebrates. Citations are selected and compiled by searching various computerized online bibliographic databases of the National Library of Medicine, National Institutes of Health.
Toxicology Databases
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Journal lists for all 46 Sub-Saharan African countries were retrieved manually from Ulrich's periodicals database using the "country of publication" field in the advanced search interface. Limits were applied to restrict the retrieved results to periodicals in the journal categories with active status. Ulrich's database usually contains multiple records for a single journal, one for each format (e.g. online and print) or language in which it is published. Duplicates were removed from the retrieved results.
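The duplicate-removal step can be sketched as collapsing records whose titles normalize to the same key. This is a minimal sketch under stated assumptions: each exported record is taken to be a dict with a hypothetical `title` field, and format variants are assumed to differ only in letter case, spacing, Unicode form or a trailing "(Online)"/"(Print)" marker. Real exports may need a different key, such as an ISSN-L.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Normalize a journal title so that format/language variants collide."""
    key = unicodedata.normalize("NFKC", title).casefold()
    # Hypothetical convention: drop a trailing "(online)" or "(print)" marker.
    key = re.sub(r"\s*\((?:online|print)\)\s*$", "", key)
    return re.sub(r"\s+", " ", key).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized title."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = title_key(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Keeping the first record preserves the original order, so the surviving entry is whichever format the export listed first.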
Master journal lists for the Web of Science indexes, comprising the Science Citation Index Expanded (SCIE), the Social Sciences Citation Index (SSCI), the Arts and Humanities Citation Index (A&HCI) and the Emerging Sources Citation Index (ESCI), as well as for the Scopus, EMBASE and MEDLINE databases, were downloaded from their respective publishers' websites. A master journal list for AJOL was not available on the publisher's website; therefore, it was created manually by extracting journal information from the publishers' websites. Only active journals were included in the study, where active journals were defined as journals that had published at least one issue in 2020 or 2021. A master journal list for AIM was not available either. The whole database, comprising 18,949 articles, was downloaded together with the source (journal name) of each article. Journals were sorted to identify unique journal names; only 15,279 articles had identifiable journal names. Five hundred twenty-four unique journals were identified, of which only 74 were active. Journals that were not indexed in the AIM database in 2020 or 2021 were deemed inactive and were not included in the study. This study was not considered for ethics review because the data used were collected from publicly available records.
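The AIM counts described above (articles with identifiable journal names, unique journals, active journals) follow from a single pass over the downloaded records. The sketch below assumes each record has been reduced to a hypothetical `(journal_name, year)` pair, where a missing or blank name means the journal could not be identified; "active" follows the definition in the text (indexed in 2020 or 2021). The function name `summarize` is illustrative only.

```python
from typing import Iterable, Optional, Tuple

Record = Tuple[Optional[str], int]  # (journal name or None, indexing year)


def summarize(records: Iterable[Record], active_years=(2020, 2021)) -> dict:
    """Count identifiable articles, unique journals and active journals."""
    named = [(name.strip(), year) for name, year in records if name and name.strip()]
    journals = {name for name, _ in named}
    active = {name for name, year in named if year in active_years}
    return {
        "articles_with_journal": len(named),
        "unique_journals": len(journals),
        "active_journals": len(active),
    }
```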
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data from the paper "The landscape of biomedical research" (https://www.biorxiv.org/content/10.1101/2023.04.10.536208v1).
The paper used the PubMed 2020 baseline (download date: 26.01.2021, no longer available), supplemented with additional files from the 2021 baseline (download date: 27.04.2022, no longer available), both originally obtained from https://www.nlm.nih.gov/databases/download/pubmed_medline.html, courtesy of the U.S. National Library of Medicine.
The data provided here includes the following files:
pubmed_landscape_data.zip, which includes:
- from the PubMed database: article title, journal, PMID, and publication year.
- produced by us: t-SNE embedding X and Y coordinates, label, and color.
pubmed_landscape_abstracts.zip, which includes:
- from the PubMed database: PMID and paper abstracts.
PubMedBERT_embeddings_float16.npy, which includes:
- produced by us: PubMedBERT embeddings of the paper abstracts (numpy.ndarray of shape 20,687,150 × 768).
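At float16, the embedding file holds 20,687,150 × 768 × 2 bytes ≈ 31.8 GB, so loading it fully into RAM is often impractical. A minimal sketch, assuming only that the file is a standard `.npy` array, memory-maps it with `numpy.load(..., mmap_mode="r")` and finds the nearest neighbours of one abstract by cosine similarity, upcasting one chunk at a time. It further assumes (unverified here) that row i corresponds to the i-th abstract in pubmed_landscape_abstracts.zip; the function names are hypothetical.

```python
import numpy as np


def load_embeddings(path: str) -> np.ndarray:
    """Memory-map the .npy file so the array is paged in lazily instead of read at once."""
    return np.load(path, mmap_mode="r")


def top_k_cosine(emb: np.ndarray, query_row: int, k: int = 5, chunk: int = 1_000_000) -> np.ndarray:
    """Return row indices of the k rows most cosine-similar to emb[query_row]."""
    k = min(k, emb.shape[0] - 1)  # the query row itself is excluded
    query = np.array(emb[query_row], dtype=np.float32)  # copy and upcast from float16
    query /= np.linalg.norm(query)
    scores = np.empty(emb.shape[0], dtype=np.float32)
    for start in range(0, emb.shape[0], chunk):
        # Upcast one chunk at a time to keep peak memory bounded.
        block = np.asarray(emb[start:start + chunk], dtype=np.float32)
        norms = np.linalg.norm(block, axis=1)
        norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
        scores[start:start + block.shape[0]] = (block @ query) / norms
    scores[query_row] = -np.inf  # never return the query itself
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]
```

`argpartition` avoids fully sorting 20 million scores; only the k selected candidates are sorted.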
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Baseline characteristics and methodological quality of all included studies.
Comprehensive international bibliographic biomedical database that enables users to track and retrieve precise information on drugs and diseases, from preclinical studies to critical toxicological information. It contains bibliographic records with citations, abstracts and indexing derived from biomedical articles in peer-reviewed journals, and is especially strong in its coverage of drug and pharmaceutical research. Embase supports work ranging from clinical trials research to pharmacovigilance and is updated online daily and weekly. Its broad biomedical scope covers the following areas:
* Drug therapy and research, including pharmaceutics, pharmacology and toxicology
* Clinical and experimental (human) medicine
* Basic biological science relevant to human medicine
* Biotechnology and biomedical engineering, including medical devices
* Health policy and management, including pharmacoeconomics
* Public, occupational and environmental health, including pollution control
* Veterinary science, dentistry, and nursing
The Embase Application Programming Interface supports export, RSS feeds, and integration services, making it possible to share data with a wide range of systems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
C. Bai
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This database collects 238 titanium alloys composed almost entirely of biocompatible alloying elements. The primary motivation for creating it is to establish a foundation for designing new alloys using machine learning methods. The database can help researchers, engineers, and biomedical professionals develop titanium alloys for various medical applications, thereby improving health outcomes and driving advances in biomaterials and biomedical engineering.
For more information, read the paper at: https://doi.org/10.30544/MMD5
NOTE: To avoid misunderstandings, please cite both the database and the published article when citing this database.
We invite other authors to contribute to updating this database (contributors who send at least 20 new alloys will appear as co-authors).
Databases of pre-compiled information on biological relationships, associations, interactions and facts extracted from the biomedical literature using Ariadne's MedScan technology. ResNet databases store information harvested from the entire PubMed in a formal structure that Pathway Studio users can search, retrieve and update. ResNet is installed automatically when Pathway Studio is installed. Several ResNet databases are available:
* ResNet Mammalian Database includes data for human, rat, and mouse
* ResNet Plant Database has data on Arabidopsis, rice and several other plants
Features of ResNet:
* All extracted relations link to the original article or abstract
* Synonyms and homologs are included to maintain gene identity and to avoid redundancy in search results
* Users can update ResNet as often as required using the MedScan technology built into all Ariadne products
* Ariadne makes updates available every quarter
To purchase Pathway Studio software with the ResNet database, for information, or to schedule a web demonstration, call our sales department at (240) 453-6272 or (866) 340-5040 (toll free).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The amount of digital data derived from healthcare processes has increased tremendously in recent years. This applies especially to unstructured data, which are often hard to analyze because tools to process them and extract information are lacking. Natural language processing is often used in medicine, but most tools used by researchers are developed primarily for English. Developing and testing natural language processing methods requires a suitable corpus that is specific to the medical domain and covers the intended target language. To improve the potential of natural language processing research, we developed tools to derive language-specific medical corpora from publicly available text sources. To extract medicine-specific unstructured text data, openly available publications from biomedical journals were used in a four-step process: (1) medical journal databases were scraped to download the articles, (2) the articles were parsed and consolidated into a single repository, (3) the content of the repository was described, and (4) the text data and the code were released. In total, 93,969 articles with a combined word count of 83,868,501 in three languages (German, English, and Spanish) were retrieved from two medical journal databases. Our results show that web scraping of openly available medical journal databases can extract unstructured text data for the construction of unified corpora of medical text.
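Step (3), describing the consolidated repository, amounts to aggregating counts such as articles and words per language. A minimal sketch, assuming the repository can be iterated as records with hypothetical `language` and `text` keys and that words are counted by whitespace splitting (the abstract does not say how its word count was computed, so totals from this sketch may differ):

```python
from collections import Counter
from typing import Iterable, Mapping


def describe_corpus(articles: Iterable[Mapping[str, str]]) -> dict[str, dict[str, int]]:
    """Count articles and whitespace-delimited words per language."""
    n_articles: Counter = Counter()
    n_words: Counter = Counter()
    for article in articles:
        language = article["language"]
        n_articles[language] += 1
        n_words[language] += len(article["text"].split())
    return {
        lang: {"articles": n_articles[lang], "words": n_words[lang]}
        for lang in sorted(n_articles)
    }
```

Summing the per-language values gives corpus-wide totals of the kind reported above.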
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Liver cancer (347 patients). Binary classification: normal vs. tumor tissue.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
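Data sources of MKG with structured medical knowledge databases and unstructured scientific publications
Source type | Name | Related research
Structured medical knowledge database | KEGG | [20]
Structured medical knowledge database | SIDER | [21]
Structured medical knowledge database | ICD-10 | [22]
Structured medical knowledge database | InterBioScreen | [23]
Structured medical knowledge database | DrugBank | [24]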
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This is a set of databases documenting the published use of substances that can be applied to rodents to contrast specific structures for optical intravital microscopy.
The first dataset contains the applied final dosages, calculated for a 25 g mouse, as well as the originally published amounts, concentrations and application routes of agents applied directly into the target organism.
The second dataset contains dosages and cell numbers for the external contrasting of cells and their subsequent application into the target organism.
Both datasets can be filtered by organ system and by contrasted structure/cell type; the dataset for direct agent application can additionally be filtered by substance class and fluorescence detection window.
Source publications are listed by DOI.
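The first dataset reports final dosages calculated for a 25 g mouse, but the text does not say how the originally published amounts were converted. A minimal sketch, assuming linear body-weight scaling of a mg/kg dose and a stock solution of known concentration, is shown below; for example, 10 mg/kg gives 10 × 25 / 1000 = 0.25 mg per mouse, which is 100 µl of a 2.5 mg/ml solution. The `filter_records` helper and its field names (such as `organ_system`) are hypothetical and only mirror the filter categories the datasets offer.

```python
def absolute_dose_mg(dose_mg_per_kg: float, body_mass_g: float = 25.0) -> float:
    """Scale a body-weight-normalized dose (mg/kg) to one animal (default: 25 g mouse)."""
    return dose_mg_per_kg * body_mass_g / 1000.0


def injection_volume_ul(dose_mg: float, concentration_mg_per_ml: float) -> float:
    """Volume (in microliters) of a stock solution needed to deliver dose_mg."""
    return dose_mg / concentration_mg_per_ml * 1000.0


def filter_records(records: list[dict], **criteria: str) -> list[dict]:
    """Keep records whose fields equal all given criteria (field names are hypothetical)."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
```

Linear scaling is only one convention; allometric conversions between species would use a different formula.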
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset includes the biomedical abbreviations given in parentheses in the titles of scholarly publications indexed by PubMed between 1947 and 2019. Each abbreviation is extracted using the parenthetic level count algorithm and is linked to the title, PMID and publication year of the corresponding research paper. Each acronym is then annotated with its length and the number of upper- and lower-case letters it contains. Finally, entities with one or no upper-case letters, fewer than three characters, eight or more characters, or a high rate of non-alphanumeric characters are semi-automatically eliminated to ensure the consistency of the research database.
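The extraction and filtering rules can be sketched as follows. This is an illustrative reimplementation, not the dataset authors' code: parentheses are matched by tracking nesting depth (in the spirit of a parenthetic level count), and because the text does not quantify "a high rate of non-alphanumeric characters", the `max_nonalnum_ratio` threshold of 0.3 is a hypothetical choice. The manual part of the semi-automatic filtering is not modeled.

```python
def extract_parenthesized(title: str) -> list[str]:
    """Return the contents of top-level (...) groups, tracking nesting depth."""
    results = []
    depth = 0
    start = 0
    for i, ch in enumerate(title):
        if ch == "(":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ")" and depth > 0:  # unmatched ')' is ignored
            depth -= 1
            if depth == 0:
                results.append(title[start:i].strip())
    return results


def keep_candidate(abbrev: str, max_nonalnum_ratio: float = 0.3) -> bool:
    """Apply the length, upper-case and non-alphanumeric filters described above."""
    upper = sum(ch.isupper() for ch in abbrev)
    if upper <= 1:  # one or no upper-case letters
        return False
    if not 3 <= len(abbrev) < 8:  # fewer than 3 or 8+ characters
        return False
    nonalnum = sum(not ch.isalnum() for ch in abbrev)
    return nonalnum / len(abbrev) <= max_nonalnum_ratio
```

For a title such as "Magnetic resonance imaging (MRI) of the brain (in adults (aged 18+))", the extractor returns "MRI" and "in adults (aged 18+)", and only "MRI" passes the filters.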
Medical Subject Headings (MeSH) is a hierarchically organized terminology for indexing and cataloging biomedical information. It is used for indexing PubMed and other NLM databases. Please see the Terms and Conditions for more information regarding the use and re-use of MeSH. NLM produces MeSH in XML, ASCII, MARC 21 and RDF formats. Updates to the data files are made according to the following schedule:
* MeSH XML: Descriptor files updated annually; Qualifier files updated annually; Supplemental Concept Records (SCR) updated daily (Monday - Friday)
* MeSH ASCII: Descriptor files updated annually; Qualifier files updated annually; Supplemental Concept Records (SCR) updated daily (Monday - Friday)
* MeSH MARC 21: all files posted monthly
* MeSH RDF: all files posted daily (Monday - Friday)
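Because MeSH is hierarchical, each descriptor carries one or more tree numbers, and a parent's tree number is the child's with its last dot-separated segment removed. Below is a minimal sketch for streaming descriptors out of the XML descriptor file with Python's standard library. The element names (`DescriptorRecord`, `DescriptorUI`, `DescriptorName/String`, `TreeNumberList/TreeNumber`) reflect the MeSH descriptor XML as we understand it and should be checked against the current DTD; the sample record is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple


def iter_descriptors(source) -> Iterator[Tuple[str, str, List[str]]]:
    """Stream (UI, name, tree numbers) from a MeSH descriptor XML file or file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "DescriptorRecord":
            ui = elem.findtext("DescriptorUI")
            name = elem.findtext("DescriptorName/String")
            trees = [t.text for t in elem.findall("TreeNumberList/TreeNumber")]
            yield ui, name, trees
            elem.clear()  # free memory; the full descriptor file is large


def parent_tree_number(tree_number: str) -> Optional[str]:
    """Return the parent's tree number, or None for a top-level entry."""
    head, sep, _tail = tree_number.rpartition(".")
    return head if sep else None


SAMPLE = b"""<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D999999</DescriptorUI>
    <DescriptorName><String>Example Descriptor</String></DescriptorName>
    <TreeNumberList><TreeNumber>A01.111.222</TreeNumber></TreeNumberList>
  </DescriptorRecord>
</DescriptorRecordSet>"""

if __name__ == "__main__":
    for ui, name, trees in iter_descriptors(io.BytesIO(SAMPLE)):
        print(ui, name, [(t, parent_tree_number(t)) for t in trees])
```

`iterparse` keeps memory bounded on the full descriptor file, which would be costly to load as a single tree.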
A database of federally funded biomedical research projects conducted at universities, hospitals, and other research institutions that provides a central point of access to reports, data, and analyses of NIH research. RePORTER has replaced the CRISP database. The database, maintained by the Office of Extramural Research at the National Institutes of Health, includes projects funded by the National Institutes of Health (NIH), the Substance Abuse and Mental Health Services Administration (SAMHSA), the Health Resources and Services Administration (HRSA), the Food and Drug Administration (FDA), the Centers for Disease Control and Prevention (CDC), the Agency for Healthcare Research and Quality (AHRQ), and the Office of the Assistant Secretary for Health (OASH).
MIT License https://opensource.org/licenses/MIT
[Lexical/terminological resource] First release of the Spanish Medical Abbreviation DataBase (AbreMES-DB).
The database is created automatically by detecting abbreviations and their potential definitions when both are explicitly mentioned in the same sentence. The abbreviations are extracted from the metadata (titles and abstracts) of biomedical publications written in Spanish. The sources of these publications are SciELO, IBECS and PubMed.
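Pairing an abbreviation with a definition in the same sentence is commonly done with the Schwartz and Hearst heuristic: for a short form written in parentheses, its characters are matched right-to-left against the preceding words, and the first character must start a word. The sketch below is a simplified variant of that heuristic with accent folding added for Spanish text. It is not necessarily how AbreMES-DB was built, and the limits used (short forms of 2 to 10 characters and at most two words) are assumptions.

```python
import re
import unicodedata
from typing import List, Optional, Tuple


def _fold(ch: str) -> str:
    """Lower-case a character and strip its accent (e.g. 'Á' -> 'a')."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def best_long_form(short: str, candidate: str) -> Optional[str]:
    """Schwartz-Hearst-style backward match of the short form's characters in the candidate."""
    s, l = len(short) - 1, len(candidate) - 1
    while s >= 0:
        c = _fold(short[s])
        if not c.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while l >= 0 and (
            _fold(candidate[l]) != c
            or (s == 0 and l > 0 and candidate[l - 1].isalnum())
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Find (short form, long form) pairs written as 'long form (SF)' in one sentence."""
    pairs = []
    for match in re.finditer(r"\(([^()]{2,10})\)", sentence):
        short = match.group(1).strip()
        # Short forms must contain a letter and be at most two words long.
        if not any(ch.isalpha() for ch in short) or len(short.split()) > 2:
            continue
        words = sentence[: match.start()].split()
        max_words = min(len(short) + 5, 2 * len(short))
        candidate = " ".join(words[-max_words:])
        long_form = best_long_form(short, candidate)
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs
```

Without `_fold`, a definition such as "ácido desoxirribonucleico (ADN)" would fail to match, because "á" and "a" are different characters.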