Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Chapter from: The Fourth Paradigm: Data-Intensive Scientific Discovery. Presents the first broad look at the rapidly emerging field of data-intensive science. 2009
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In 2020, the EPFL Library conducted a study of tools and metadata standards practices in the EPFL School of Life Sciences.
By standards, we mean:
- terminological resources (vocabularies, terminologies, classifications, thesauri),
- formats and data models / schemas,
- structured knowledge bases (databases, reference databases, ontologies).
And by tools, we mean:
- bioinformatics software (e.g. for sequence or molecular structure analysis of proteins and genes),
- databases from the Life Sciences field (e.g. genome databases).
Our goal was twofold: to gain new knowledge and insights, and to develop a reproducible survey methodology firmly based on collaboration between liaison librarians and data librarians.
This dataset reflects the results collected during the second phase of the study: "Survey in EPFL Life Sciences Community".
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0) https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
The COVID-19 pandemic has shown that bioinformatics (a multidisciplinary field that combines biological knowledge with computer programming and is concerned with the acquisition, storage, analysis, and dissemination of biological data) has a fundamental role in the scientific research strategies of all disciplines involved in fighting the virus and its variants. It aids in sequencing and annotating genomes and their observed mutations; analyzing gene and protein expression; simulating and modeling DNA, RNA, proteins and biomolecular interactions; and mining biological literature, among many other critical areas of research. Studies suggest that bioinformatics skills in the Latin American and Caribbean region are relatively incipient, and thus its scientific systems cannot take full advantage of the increasing availability of bioinformatic tools and data. This dataset is a catalog of bioinformatics software for researchers and professionals working in life sciences. It includes more than 300 different tools for varied uses, such as data analysis, visualization, repositories and databases, data storage services, scientific communication, marketplace and collaboration, and lab resource management. Most tools are available as web-based or desktop applications, while others are programming libraries. It also includes 10 suggested entries for other third-party repositories that could be of use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web", published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:
Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y
We extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:
- The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
- Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on the LSLOD cloud. Where a schema element is reused from a Linked Open Vocabulary or an ontology, this is explicitly indicated.
- The LSLOD Schema Graph, which contains all of the extracted schema elements above, interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided, along with broad-level characteristics of the modeled content.
The LSLOD Schema Graph is saved as a JSON Pickle File. To read the JSON object in this Pickle file, use the following Python commands:
    import pickle

    with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
        x = pickle.load(infile, encoding='iso-8859-1')
Check the Referenced Link for more details on this research, raw data files, and code references.
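The description does not document the internal layout of the loaded object, so a first look at it could follow the minimal sketch below. It reuses the file name and encoding given above; the helper names `load_schema_graph` and `describe` are hypothetical, and the summary deliberately assumes nothing about the graph's keys or nesting. Since unpickling can execute code, this is only appropriate for a file obtained from the dataset's own repository.

```python
import pickle
from pathlib import Path


def load_schema_graph(path):
    """Load the pickled object with the encoding given in the dataset notes."""
    with open(path, "rb") as infile:
        return pickle.load(infile, encoding="iso-8859-1")


def describe(obj, max_keys=5):
    """Return a short summary that makes no assumption about the object's layout."""
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["size"] = len(obj)
    if isinstance(obj, dict):
        # Only peek at a few top-level keys; their names are not documented here.
        summary["sample_keys"] = list(obj)[:max_keys]
    return summary


if __name__ == "__main__":
    graph_path = Path("LSLOD-Schema-Graph.json.pickle")
    if graph_path.exists():
        print(describe(load_schema_graph(graph_path)))
    else:
        print(f"{graph_path} not found; download it from the dataset first")
```

Printing the summary before iterating over the object helps confirm whether it is a dictionary, a list, or some other container before writing any code that depends on its shape.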
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The software and data provided herein are free for academic instruction and research use only. Commercial licenses are available to legal entities, including companies and organizations (both for-profit and non-profit), requiring the software for general commercial use. To obtain a commercial license, please contact us via e-mail.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Stata 13 databases for the three populations studied in the above formative research. The final report is also attached, with the questionnaires as an annex. Coding corresponds to question number.
Data used for the project "Development of 'phytobenthos' and 'benthic macroinvertebrate' bioindication tools for the inland surface waters of Mayotte". This project was funded by the AFB (Agence Française pour la Biodiversité). AFB contacts: Olivier MONNIER and Yorick REYJOL (project officers).
The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database can accept a wide range of types of studies described via a simple format. It also enables manuscript authors to submit supplementary information and link to it from the publication.
This dataset contains the results of a systematic review of the genetics literature regarding Alzheimer's disease. It was used in the original paper to assess the demographic diversity of the studies and their underlying databases.
OReFiL is a database and a search system for online life science resources (databases, tools and web-services) mentioned in peer-reviewed papers. OReFiL covers MEDLINE entries and BioMed Central full-text papers to extract URLs of online resources. Users can search resources by free words, MeSH (Medical Subject Headings) terms and author names. Search results show titles of the hit resources with URLs, MeSH terms and links to corresponding PubMed entries, web pages and papers.
Archive of the covidestim.org databases. Visit https://dataone.org/datasets/sha256%3A3d463136190bb9250e5e78d4b9182692dfe1f27592c1a49b766b03c78f77d306 for complete metadata about this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Measures to ensure research integrity have been widely discussed due to the social, economic and scientific impact of research integrity. In the past few years, financial support for health research in emerging countries has steadily increased, resulting in a growing number of scientific publications. These achievements, however, have been accompanied by a rise in retracted publications followed by concerns about the quality and reliability of such publications.
Objective: This systematic review aimed to investigate the profile of medical and life sciences research retractions from authors affiliated with Brazilian academic institutions. The chronological trend between publication and retraction date, reasons for the retraction, citation of the article after the retraction, study design, and the number of retracted publications by author and affiliation were assessed. Additionally, the quality, availability and accessibility of data regarding retracted papers from the publishers are described.
Methods: Two independent reviewers searched for articles that had been retracted since 2004 via PubMed, Web of Science, Biblioteca Virtual em Saúde (BVS) and Google Scholar databases. Indexed keywords from Medical Subject Headings (MeSH) and Descritores em Ciências da Saúde (DeCS) in Portuguese, English or Spanish were used. Data were also collected from the Retraction Watch website (www.retractionwatch.com). This study was registered with the PROSPERO systematic review database (CRD42017071647).
Results: A final sample of 65 articles was retrieved from 55 different journals with reported impact factors ranging from 0 to 32.86, with a median value of 4.40 and a mean of 4.69. The types of documents found were erratum (1), retracted articles (3), retracted articles with a retraction notice (5), retraction notices with erratum (3), and retraction notices (45). The assessment of the Retraction Watch website added 8 articles that were not identified by the search strategy using the bibliographic databases. The retracted publications covered a wide range of study designs. Experimental studies (40) and literature reviews (15) accounted for 84.6% of the retracted articles. Within the field of health and life sciences, medical science was the field with the largest number of retractions (34), followed by biological sciences (17). Some articles were retracted for at least two distinct reasons (13). Among the retrieved articles, plagiarism was the main reason for retraction (60%). Missing data were found in 57% of the retraction notices, which was a limitation to this review. In addition, 63% of the articles were cited after their retraction.
Conclusion: Publications are not retracted solely for research misconduct but also for honest error. Nevertheless, considering authors affiliated with Brazilian institutions, this review concluded that most of the retracted health and life sciences publications were retracted due to research misconduct. Because the number of publications is the most valued indicator of scientific productivity for funding and career progression purposes, a systematic effort from the national research councils, funding agencies, universities and scientific journals is needed to avoid an escalating trend of research misconduct. More investigations are needed to comprehend the underlying factors of research misconduct and its increasing manifestation.
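The counts reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below recomputes the 84.6% share of experimental studies and literature reviews from the stated counts and sums the listed document types. Treating the 8 Retraction Watch additions as separate from the typed documents is an assumption made here for illustration; the abstract does not state how the two groups relate.

```python
TOTAL_ARTICLES = 65  # final sample size reported in the abstract


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Study designs: experimental studies (40) and literature reviews (15).
design_share = percent(40 + 15, TOTAL_ARTICLES)

# Document types as listed in the Results.
document_types = {
    "erratum": 1,
    "retracted articles": 3,
    "retracted articles with a retraction notice": 5,
    "retraction notices with erratum": 3,
    "retraction notices": 45,
}
typed_total = sum(document_types.values())

# Assumption: the 8 Retraction Watch additions are not among the typed documents.
retraction_watch_added = 8

print(design_share)                                   # 84.6
print(typed_total, typed_total + retraction_watch_added)  # 57 65
```

Under that assumption the typed documents (57) plus the Retraction Watch additions (8) match the final sample of 65, and the recomputed share agrees with the reported 84.6%.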
Long-Term ST Database carefully split into training and testing datasets.
The International Nucleotide Sequence Database Collaboration (INSDC) is a joint effort to collect and disseminate databases containing DNA and RNA sequences.
The National Bioscience Database Center (NBDC) intends to integrate all databases for life sciences in Japan, by linking each database with expediency to maximize convenience and make the entire system more user-friendly. We aim to focus our attention on the needs of the users of these databases, who have all too often been neglected in the past, rather than the needs of the people tasked with the creation of databases. It is important to note that we will continue to honor the independent integrity of each database that will contribute to our endeavor, as we are fully aware that each database was originally crafted for specific purposes and divergent goals.
Services:
- Database Catalog - A catalog of life science related databases constructed in Japan that are also available in English. Each database record contains information such as URL, status of the database site (active vs. inactive), database provider, type of data and subjects of the study.
- Life Science Database Cross Search - A service for simultaneous searching across scattered life-science databases, ranging from molecular data to patents and literature.
- Life Science Database Archive - Maintains and stores the datasets generated by life scientists in Japan in a long-term and stable state as national public goods. The Archive makes it easier for many people to search datasets by metadata in a unified format, and to access and download the datasets with clear terms of use.
- Taxonomy Icon - A collection of icons (illustrations) of biological species that is free to use and distribute. There are more than 200 icons of various species including Bacteria, Fungi, Protista, Plantae and Animalia.
- GenLibi (Gene Linker to bibliography) - An integrated database of human, mouse and rat genes that includes automatically integrated gene, protein, polymorphism, pathway, phenotype, ortholog/protein sequence information, and manually curated gene function and gene-related or co-occurred Disease/Phenotype and bibliography information.
- Allie - A search service for abbreviations and long forms utilized in life sciences. It provides a solution to the issue that many abbreviations are used in the literature, and polysemous or synonymous abbreviations appear frequently, making it difficult to read and understand scientific papers that are not relevant to the reader's expertise.
- inMeXes - A search service for English expressions (multiple words) that appear no less than 10 times in PubMed/MEDLINE titles or abstracts. In addition, you can easily access the sentences where the expression was used or other related information by clicking one of the search results.
- HOWDY (Human Organized Whole genome Database) - A database system for retrieving human genome information from 14 public databases by using official symbols and aliases. The information is updated daily by extracting data automatically from the genetic databases, and is shown with all data that share identifiers linked to one another.
- MDeR (the MetaData Element Repository in life sciences) - A web-based tool designed to let you search, compare and view Data Elements. MDeR is based on ISO/IEC 11179 Part 3 (Registry metamodel and basic attributes).
- Human Genome Variation Database - A database for accumulating all kinds of human genome variations detected by various experimental techniques.
- MEDALS - A portal site that provides information about databases, analysis tools, and the relevant projects that were conducted with financial support from the Ministry of Economy, Trade and Industry of Japan.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Text files containing data output from CellProfiler analysis of image files.
Originally developed in 1963, the Discharge Abstract Database (DAD) captures administrative, clinical and demographic information on hospital discharges (including deaths, sign-outs and transfers). Some provinces and territories also use the DAD to capture day surgery. Data extracted from the DAD is used to populate other CIHI databases, including the Hospital Morbidity Database (HMDB) and the Hospital Mental Health Database (HMHDB).
Phytoplankton, macrophytes and background abiotic data of lakes in Estonia. More information on this dataset can be found in the Freshwater Metadatabase - BF_W_32-L-CB (http://www.freshwatermetadata.eu/metadb/bf_mdb_view.php?entryID=BF_W_32-L-CB).
Dataset III and dictionary III. Visit https://dataone.org/datasets/sha256%3Acebd8a39fc73ca76a1153ac3654dac88bd84ca853c12272075d8f142b5a30c52 for complete metadata about this dataset.