KEGG LIGAND contains knowledge of chemical substances and reactions that are relevant to life. It is a composite database consisting of the COMPOUND, GLYCAN, REACTION, RPAIR, and ENZYME databases, whose entries are identified by C, G, R, RP, and EC numbers, respectively. ENZYME is derived from the IUBMB/IUPAC Enzyme Nomenclature, but the others are internally developed and maintained. The primary database of KEGG LIGAND is a relational database with the KegDraw interface, which is used to generate the secondary (flat file) database for DBGET.

Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
The datasets (hgnc_complete_set and withdrawn) used to create this ID mapping database were downloaded from HGNC (HUGO Gene Nomenclature Committee at the European Bioinformatics Institute, website URL: https://www.genenames.org/) on 09/05/2022.
This database was used for the BridgeDb demo at the BioSB 2022 conference.
The scripts used to create this database based on HGNC: https://github.com/tabbassidaloii/create-bridgedb-secondary2primary
This work was funded by the FAIRplus project (grant agreement no 802750) and NWO Open Science Fund (grant no 203.001.121).
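A secondary-to-primary ID mapping of the kind described in this record can be pictured as a simple lookup. The sketch below is only illustrative: the function names and the identifier pairs are hypothetical placeholders, not real HGNC records, and it does not reproduce the BridgeDb file format or the scripts linked above.

```python
def build_lookup(pairs):
    """Build a secondary -> primary lookup from (secondary, primary) pairs."""
    return {secondary: primary for secondary, primary in pairs}


def to_primary(identifier, lookup):
    """Return the primary identifier, or the input unchanged if it is not a known secondary."""
    return lookup.get(identifier, identifier)


# Hypothetical identifier pairs, for illustration only (not real HGNC data).
pairs = [
    ("OLD_SYMBOL_1", "CURRENT_SYMBOL_A"),
    ("OLD_SYMBOL_2", "CURRENT_SYMBOL_A"),
]
lookup = build_lookup(pairs)
resolved = [to_primary(x, lookup) for x in ["OLD_SYMBOL_1", "CURRENT_SYMBOL_B"]]
```

Identifiers that are already primary pass through unchanged, so the same call can be applied to a mixed list of old and current identifiers.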

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Includes data files and supplemental information. Supplemental information includes a reproducible RMarkdown file, an Excel sheet with metadata, and complete webpage files. Please note that CCD nonfiscal documentation files have been downloaded manually.
From the Common Core of Data website: The Common Core of Data (CCD) is the Department of Education's primary database on public elementary and secondary education in the United States. CCD is a comprehensive, annual, national database of all public elementary and secondary schools and school districts.
Information on the Common Core of Data (CCD): The primary purpose of the CCD is to provide basic information on public elementary and secondary schools, local education agencies (LEAs), and state education agencies (SEAs) for each state, the District of Columbia, and the outlying territories with a U.S. relationship. CCD is composed of two components: Nonfiscal CCD and Fiscal CCD.

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The purpose of collecting outpatient health statistics is to monitor, evaluate and plan curative and preventive health care at the primary and secondary levels of the health care system.
Data on outpatient statistics are an important source of information for population health monitoring indicators and for the accessibility of outpatient health care activities in Slovenia. Health care providers collect data for each individual contact of a patient with the health service. The data are reported by public and private healthcare providers.
Outpatient health statistics record contacts and services at general practitioners and in specialist outpatient activities at the secondary level.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes bibliographic information for 501 papers that were published from 2010-April 2017 (time of search) and use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine uses with the highest number of citations, how online occurrence data are linked to other data types, and if/how data quality is addressed. Specifically, we address the following questions:
1.) What primary biodiversity databases have been cited in published research, and which databases have been cited most often?
2.) Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online?
3.) What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time?
4.) What uses have the highest impact, as measured through the mean number of citations per year?
5.) Are certain uses applied more often for plants/invertebrates/vertebrates?
6.) Are links to specific data types associated more often with particular uses?
7.) How often are major data quality issues addressed?
8.) What data quality issues tend to be addressed for the top uses?
Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important to identify data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS. All authors discussed and agreed upon representative search terms, which were relatively broad to capture a variety of databases hosting primary occurrence records. The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results)--note that quotations are used as part of the search terms where specific phrases are needed in whole. We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database. About one third of the 2500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger.
The final dataset of tagged papers allows us to quantify general areas of research made possible by the expansion of online species occurrence databases, and trends over time. Analyses of these data will be published in a separate quantitative review.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data (hmdb_metabolites, released on 17/11/2021) used to create this ID mapping database was downloaded from HMDB (Human Metabolome Database, website URL: https://hmdb.ca/).
This database was used for the BridgeDb demo at the BioSB 2022 conference.
The scripts used to create this database based on HGNC: https://github.com/tabbassidaloii/create-bridgedb-secondary2primary
This work was funded by the FAIRplus project (grant agreement no 802750) and NWO Open Science Fund (grant no 203.001.121).

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

While art is omnipresent in human history, the neural mechanisms of how we perceive, value and differentiate art have only begun to be explored. Functional magnetic resonance imaging (fMRI) studies suggested that art acts as a secondary reward, involving brain activity in the ventral striatum and prefrontal cortices similar to primary rewards such as food. However, potential similarities or unique characteristics of art-related neuroscience (or neuroesthetics) remain elusive, also because of a lack of adequate experimental tools: the available collections of art stimuli often lack standard image definitions and normative ratings. Therefore, we here provide a large set of well-characterized, novel art images for use as visual stimuli in psychological and neuroimaging research. The stimuli were created using a deep learning algorithm that applied different styles of popular paintings (based on artists such as Klimt or Hundertwasser) to ordinary animal, plant and object images which were drawn from established visual stimuli databases. The novel stimuli represent mundane items with artistic properties with proposed reduced dimensionality and complexity compared to paintings. In total, 2,332 novel stimuli are available open access as the “art.pics” database at https://osf.io/BTWNQ/ with standard image characteristics that are comparable to other common visual stimuli material in terms of size, variable color distribution, complexity, intensity and valence, measured by image software analysis and by ratings derived from a human experimental validation study [n = 1,296 (684f), age 30.2 ± 8.8 y.o.]. The experimental validation study further showed that the art.pics elicit a broad and significantly different variation in subjective value ratings (i.e., liking and wanting) as well as in recognizability, arousal and valence across different art styles and categories.
Researchers are encouraged to study the perception, processing and valuation of art images based on the art.pics database, which also enables real reward remuneration of the rated stimuli (as art prints) and a direct comparison to other rewards from e.g., food or money.
Key Messages: We provide an open access, validated and large set of novel stimuli (n = 2,332) of standardized art images including normative rating data to be used for experimental research. Reward remuneration in experimental settings can be easily implemented for the art.pics by e.g., handing out the stimuli to the participants (as print on premium paper or in a digital format), as done in the presented validation task. Experimental validation showed that the art.pics’ images elicit a broad and significantly different variation in subjective value ratings (i.e., liking, wanting) across different art styles and categories, while size, color and complexity characteristics remained comparable to other visual stimuli databases.

The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Primary roads are generally divided, limited-access highways within the interstate highway system or under State management, and are distinguished by the presence of interchanges. These highways are accessible by ramps and may include some toll highways. The MAF/TIGER Feature Classification Code (MTFCC) is S1100 for primary roads. Secondary roads are main arteries, usually in the U.S. Highway, State Highway, and/or County Highway system. These roads have one or more lanes of traffic in each direction, may or may not be divided, and usually have at-grade intersections with many other roads and driveways. They usually have both a local name and a route number. The MAF/TIGER Feature Classification Code (MTFCC) is S1200 for secondary roads.
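As a minimal sketch of how the two MTFCC codes separate the road classes, the following Python groups attribute records by code. Only the codes S1100 and S1200 come from the description above. The `split_roads` helper, the sample records, and the assumption that each record exposes the code under an `MTFCC` key are illustrative, and reading the actual .dbf file is left out.

```python
# MTFCC codes quoted in the description above.
PRIMARY_ROAD = "S1100"
SECONDARY_ROAD = "S1200"


def split_roads(records):
    """Group attribute records into primary and secondary roads by MTFCC."""
    groups = {"primary": [], "secondary": []}
    for rec in records:
        code = rec.get("MTFCC")  # assumed field name for the classification code
        if code == PRIMARY_ROAD:
            groups["primary"].append(rec)
        elif code == SECONDARY_ROAD:
            groups["secondary"].append(rec)
        # Records with any other code are ignored in this sketch.
    return groups


# Hypothetical records, for illustration only.
sample = [
    {"NAME": "Example Interstate", "MTFCC": "S1100"},
    {"NAME": "Example State Highway", "MTFCC": "S1200"},
    {"NAME": "Example Local Street", "MTFCC": "S1400"},
]
roads = split_roads(sample)
```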

https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.
To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.
This database can be divided into different subsets:
· orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
· phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
· morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
· syntax: word class, subcategorisations per word class;
· frequency of the entries: disambiguated for homographic lemmata.
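The linking of plain ASCII files through unique identity numbers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the documented CELEX layout: the backslash field separator, the identity number in the first field, and the sample rows are all assumed here for the example.

```python
def read_table(text, sep="\\"):
    """Parse separator-delimited lines into {identity number: remaining fields}."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        table[fields[0]] = fields[1:]
    return table


def link(left, right):
    """Join two tables on the identity numbers they share."""
    return {key: left[key] + right[key] for key in left if key in right}


# Hypothetical file contents, for illustration only.
orthography = "1\\aap\n2\\boom\n"
frequency = "1\\120\n2\\950\n3\\7\n"
linked = link(read_table(orthography), read_table(frequency))
```

Entries present in only one file (identity number 3 here) are dropped, which mirrors an inner join; keeping them would need an explicit choice of default values.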

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Intego-II database builds on three decades of primary care data collection in Flanders, Belgium. Since 1994 pseudonymized electronic medical record (EMR) data from participating general practices are collected within Intego. Its integration with Healthdata.be provides scalable linkage to mortality, environmental, and disease-specific datasets at the national level. Intego-II incorporates substantial advancements in the database’s structure, operations, and accessibility. A robust two-step Extract-Transform-Load (ETL) process ensures data security, privacy, tidiness, and quality. To enhance international research interoperability, the database is aligned with the OMOP Common Data Model. Intego-II is organized into three key modules: Patient Information, Medical History, and Clinical Encounters, enabling longitudinal analyses across diverse healthcare domains covering, among others, demographic variables, diagnoses, prescriptions, and laboratory test results. Structured quarterly releases with detailed metadata ensure findability and reusability. Researchers can access the full Intego-II database via a secure research environment provided by Healthdata.be, following submission and approval of a study protocol. The data access process can be found on www.intego.be.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CyanoMetDB is a comprehensive database of secondary metabolites from cyanobacteria manually curated from primary references described in Jones et al (2021), DOI: 10.1016/j.watres.2021.117017 (preprint DOI: 10.1101/2020.04.16.038703). This upload contains the 2023 release. Please cite Jones et al (2021) DOI: 10.1016/j.watres.2021.117017 and this record Janssen et al (2023) DOI: 10.5281/zenodo.7922070 when using this CyanoMetDB Version 2!

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the collection associated with list S75 CyanoMetDB Comprehensive database of secondary metabolites from cyanobacteria on the NORMAN Suspect List Exchange.
https://www.norman-network.com/nds/SLE/
CyanoMetDB is a comprehensive database of secondary metabolites from cyanobacteria manually curated from primary references described in Jones et al (2021), DOI: 10.1016/j.watres.2021.117017 (preprint DOI: 10.1101/2020.04.16.038703)
Contents:
Original database: CyanoMetDB_WR_Feb2021.csv and CyanoMetDB_WR_Feb2021.xlsx
Reference information (only) expanding Ref entries in database: CyanoMetDB_References_v02_WR_Feb2021.csv and CyanoMetDB_References_v02_WR_Feb2021.xlsx
MetFrag local CSV file (original database abridged and reformatted for use in MetFrag): CyanoMetDB_MetFrag_Feb2021.csv
Additional files for matching InChIKeys (rapid suspect flagging): CyanoMetDB_v02_InChIKeys.txt
DTXSIDs for CompTox LIST use only: CyanoMetDB_v02_DTXSIDs.txt
Changes in versions are documented in the changelog file: CyanoMetDB_WR_changelog.txt
NOTE: several "names" fields contain special characters; please use CSV name column entries with caution (refer to the "*_CSV" columns for plain text forms). Original entries with special characters are in the respective XLSX files.
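Rapid suspect flagging with an InChIKey list amounts to a set-membership check. The sketch below assumes the list holds one InChIKey per line, which is not stated in the record. The helper names and the placeholder keys are hypothetical and do not correspond to real compounds.

```python
def load_keys(lines):
    """Normalise an iterable of text lines into a set of InChIKeys."""
    return {line.strip().upper() for line in lines if line.strip()}


def flag_suspects(candidates, suspect_keys):
    """Return the candidates whose InChIKey is in the suspect list, in input order."""
    return [key for key in candidates if key.strip().upper() in suspect_keys]


# Placeholder keys, for illustration only. With the real file, the lines would
# come from open("CyanoMetDB_v02_InChIKeys.txt"), assuming one key per line.
suspect_lines = [
    "AAAAAAAAAAAAAA-AAAAAAAAAA-N\n",
    "BBBBBBBBBBBBBB-BBBBBBBBBB-N\n",
    "\n",
]
suspects = load_keys(suspect_lines)
hits = flag_suspects(
    ["bbbbbbbbbbbbbb-bbbbbbbbbb-n", "CCCCCCCCCCCCCC-CCCCCCCCCC-N"],
    suspects,
)
```

Matching on the full key is the strictest option; a looser screen could compare only the first 14-character block, at the cost of more false positives.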

This genomic tRNA database contains tRNA gene predictions made by the program tRNAscan-SE (Lowe & Eddy, Nucl Acids Res 25: 955-964, 1997) on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature. Transfer RNAs (tRNAs) represent the single largest, best-understood class of non-protein coding RNA genes found in all living organisms. By far, the major source of new tRNAs is computational identification of genes within newly sequenced genomes. To organize the rapidly growing collection and enable systematic analyses, we created the Genomic tRNA Database (GtRNAdb). The web resource provides overview statistics of tRNA genes within each analyzed genome, including information by isotype and genetic locus, easily downloadable primary sequences, graphical secondary structures and multiple sequence alignments. Direct links for each gene to UCSC eukaryotic and microbial genome browsers provide graphical display of tRNA genes in the context of all other local genetic information. The database can be searched by primary sequence similarity, tRNA characteristics or phylogenetic group. Inevitably with automated sequence analysis, we find exceptions to general identification rules, isoacceptor type predictions (esp. due to variable post-transcriptional anticodon modification), and questionable tRNA identifications (due to pseudogenes, SINES, or other tRNA-derived elements). We attempt to document all cases we come across, and welcome feedback on new or unrecognized discrepancies.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A pool of 995 amino acid sequences of influenza A virus nucleoprotein (NP) was retrieved from the Entrez protein database of the National Center for Biotechnology Information (NCBI). This includes 46 NP sequences isolated from wild birds (quail and gull), 546 from domestic birds (chicken, duck and goose), 187 from humans, 156 from swine, and 20 from equine. Each NP sequence was derived from one isolate of influenza A virus. Sequence alignment was performed using the MAFFT (version 5.8) multiple sequence alignment program accessible at http://us.expasy.org. Listed are the ten naturally occurring Db-restricted NP366 variants with amino acid mutations at the potential TCR contact positions (positions 4, 6, 7 and 8 of the peptides). Mutations at the primary and secondary Db anchor positions of the NP366 variants (positions 3, 5 and 9) are anticipated to result in considerable loss of peptide binding to the MHC molecule and were thus not included in further experimental analyses in the present study. One representative strain of influenza A virus for each NP366 variant identified is listed to illustrate the serological heterogeneity of the influenza A viruses that bear these CTL epitope mutations in nature. Underlined amino acids represent mutations relative to the PR8-NP366 sequence.

SPACK is a spatio-temporal database dedicated to whaling, sealing and fishing history. It aims to gather miscellaneous and scattered sources about whaling, sealing and fishing voyages that visited the Saint-Paul, Amsterdam, Crozet and Kerguelen Islands between the 1780s and the 1930s.
SPACK was defined and populated during a PhD thesis in history. Its main purpose is to assess the presence of whaling, sealing and fishing ships around the French Southern Islands from the late 18th century onward. A further goal is to shed light on the issues arising from the first public policies for managing natural resources once French sovereignty was affirmed in the late 19th century.
The data collected in SPACK are stored in PostgreSQL, an object-relational database, together with its spatial extension PostGIS. This repository can be used to create a new instance of the SPACK database. It contains 7 SQL files that represent the main tables of the SPACK model:
attested_presence_areas: this table shows the dates on which the vessel is present in the area.
code_areas: this table indicates the codes used to identify each covered area.
code_sealing_gangs: this table shows the code used to indicate when a gang of hunters has been dropped off or relieved on shore by the ship.
natural_resources: this table provides the codes used to classify vessel activity by 'area'.
shipment_origin: this table lists the codes for the main shipowner's geographical origin.
stop_over_voyages: this table describes the date of arrival and departure by 'area'. It also indicates the degree of interpolation of the data, month or day.
voyages_areas: this table contains a list of vessels involved in whaling, fishing and sealing activities that crossed the Saint-Paul, Amsterdam, Crozet and/or Kerguelen Islands. It provides information such as vessel name, rig type, tonnage, port, shipment origin, natural resource exploited, agent, dates of presence, and primary and secondary sources.
The main entity of this database is a ship attached to a voyage and a geographical area. This entity is described by a set of properties: ship’s and master’s names, geographical origin, shipowner, port, arrival and departure dates. These data are stored in the voyages_areas table. The database also provides other helpful information, such as the dates of presence at the islands, the type of natural resource exploited and the sources used to identify a voyage.
The SPACK database builds on the Whaling History Database (WHDB, https://whalinghistory.org/). It does not contain any data imported from the WHDB, but the two sources can still be linked: the voyages_areas table stores the identifier used by the WHDB to describe each voyage.
The WHDB provides the vessel's location in lat/lon for several voyages. Those locations have been processed to populate the voyages_areas table and check when a voyage crossed a study area: Saint-Paul, Amsterdam, Crozet or Kerguelen Islands. However, no spatial information is saved in the SQL files. You can contact the authors if you want more information about the spatial analysis techniques used.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: Previous research has shown that bladder cancer has one of the highest incidences of developing a second primary malignancy (SPM). We therefore designed this study to further examine this risk in light of race and histology.
Patients and methods: Using the Surveillance, Epidemiology, and End Results (SEER) 18 registry, we retrospectively screened patients who had been diagnosed with bladder cancer between 2000 and 2018. We then tracked these survivors until a second primary cancer diagnosis, the end of the study, or their deaths. In addition to a competing risk analysis, we derived standardized incidence ratios (SIRs) and incidence rate ratios (IRRs) for SPMs by race and histology.
Results: A total of 162,335 patients with bladder cancer were included, and during follow-up, a second primary cancer diagnosis was made in 31,746 of these patients. When the data were stratified by race, SIRs and IRRs for SPMs showed a significant difference: Asian/Pacific Islanders (APIs) had a more pronounced increase in SPMs (SIR: 2.15; p 0.05) than White and Black individuals, who had SIRs of 1.69 and 1.94, respectively (p 0.05). In terms of histology, the epithelial type was associated with an increase in SPMs across all three races, but more so in APIs (IRR: 3.51; 95% CI: 2.11–5.85; p 0.001).
Conclusion: We found that race had an impact on both the type and risk of SPMs. Additionally, the likelihood of an SPM increases with the length of time between the two malignancies and the stage of the index malignancy.
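For readers unfamiliar with the two measures, a standardized incidence ratio is conventionally the number of observed cases divided by the number expected from reference rates, and an incidence rate ratio is the ratio of two incidence rates (events per person-time). The sketch below shows that arithmetic only. The abstract does not describe how the study computed its values, and all counts in the example are hypothetical.

```python
def standardized_incidence_ratio(observed, expected):
    """SIR: observed cases divided by the cases expected from reference rates."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def incidence_rate_ratio(events_a, person_years_a, events_b, person_years_b):
    """IRR: incidence rate in group A divided by the incidence rate in group B."""
    rate_a = events_a / person_years_a
    rate_b = events_b / person_years_b
    return rate_a / rate_b


# Hypothetical counts, for illustration only (not taken from the study).
sir = standardized_incidence_ratio(observed=30, expected=20.0)
irr = incidence_rate_ratio(
    events_a=30, person_years_a=1000.0,
    events_b=10, person_years_b=1000.0,
)
```

A ratio above 1 indicates more cases than expected (SIR) or a higher rate in the first group (IRR); confidence intervals, which the abstract reports, are not computed here.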