The ability to correctly and consistently identify sea turtles over time was evaluated using digital imagery of the turtles: dorsal and side views of their heads and dorsal views of their carapaces.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern mass spectrometry methods have recently emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. For example, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has revolutionized the way pathogenic microorganisms are identified in today’s routine clinical microbiology. Furthermore, recent years have also witnessed substantial progress in the development of liquid chromatography-mass spectrometry (LC-MS) based proteomics for microbiological applications.
In this context, we introduce a new concept for microbial identification by mass spectrometry. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. MS1 data are then extracted and systematically tested against in silico libraries of peptide mass data. The first version of such a database has been computed from UniProt Knowledgebase [Swiss-Prot and TrEMBL] and contains more than 12,000 strain-specific synthetic mass profiles. The database is stored in the pkf data format which is interpretable by the MicrobeMS software package (requires MicrobeMS version 0.82, or later).
For details see the following preprint: Lasch, P., Schneider, A., Blumenscheit, C. and Doellinger, J. “Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in silico Peptide Mass Data”. bioRxiv preprint, http://dx.doi.org/10.1101/870089.
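The idea of testing measured MS1 masses against in silico peptide mass data can be illustrated with a minimal sketch. It assumes a simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages), monoisotopic masses, and a ppm tolerance; the function names and the tolerance value are illustrative and are not taken from MicrobeMS or the preprint.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residue masses


def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if aa in "KR" and (is_last or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide without a C-terminal K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_count(observed, library, ppm=10.0):
    """Count observed masses that fall within `ppm` of any library mass."""
    return sum(
        any(abs(obs - ref) / ref * 1e6 <= ppm for ref in library)
        for obs in observed
    )


# Hypothetical usage: build a tiny mass profile and score one measurement.
profile = [peptide_mass(p) for p in tryptic_digest("AKRPGK")]
print(match_count([217.1427], profile))
```

A real library would hold one such mass profile per strain, and the strain whose profile explains the most measured masses would rank highest; the actual scoring used by the authors is described in the preprint.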
Searchable database of predicted protein sequences and structures. It can be searched by PDB ID, UniProt ID, and descriptive classifiers.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The 2011 United States Automatic Identification System Database contains vessel traffic data for planning purposes within the U.S. coastal waters. The database is composed of 204 self-contained File Geodatabases (FGDB). Each FGDB represents one month of data for a single UTM zone. The UTM zones represented cover the entire United States and include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, and 19. Each FGDB consists of one feature class, four tables, and two relationship classes. The Broadcast point feature class contains the position reports, which have been pre-filtered to a one-minute time step. The Voyage table contains elements of the static data reports that are updated for each ship voyage. The Vessel table contains elements of the static data reports that are specific to a particular vessel. The BaseStations table lists the base stations collecting data for a particular month/UTM zone. The AttributeUnits table contains a list of units for each of the attribute fields in the Broadcast, Voyage, and Vessel tables. The BroadcastHasVessel relationship class relates the broadcast points to the vessel table records. The BroadcastHasVoyage relationship class relates the broadcast points to the voyage table records.
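The figure of 204 geodatabases follows from the layout described above: one FGDB per UTM zone per month. A short check, using only the zone list given in the description and assuming all twelve months of 2011 are present for every zone:

```python
from itertools import product

# UTM zones listed in the dataset description.
UTM_ZONES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19]
MONTHS = range(1, 13)  # assumption: January through December 2011

# One (zone, month) pair per self-contained file geodatabase.
datasets = list(product(UTM_ZONES, MONTHS))
print(len(UTM_ZONES), "zones x", len(MONTHS), "months =", len(datasets))
```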
This project was undertaken to establish a computerized skeletal database composed of recent forensic cases to represent the present ethnic diversity and demographic structure of the United States population. The intent was to accumulate a forensic skeletal sample large and diverse enough to reflect different socioeconomic groups of the general population from different geographical regions of the country in order to enable researchers to revise the standards being used for forensic skeletal identification. The database is composed of eight data files, comprising four categories. The primary "biographical" or "identification" files (Part 1, Demographic Data, and Part 2, Geographic and Death Data) comprise the first category of information and pertain to the positive identification of each of the 1,514 data records in the database. Information in Part 1 includes sex, ethnic group affiliation, birth date, age at death, height (living and cadaver), and weight (living and cadaver). Variables in Part 2 pertain to the nature of the remains, means and sources of identification, city and state/country born, occupation, date missing/last seen, date of discovery, date of death, time since death, cause of death, manner of death, deposit/exposure of body, area found, city, county, and state/country found, handedness, and blood type. The Medical History File (Part 3) represents the second category of information and contains data on the documented medical history of the individual. Variables in Part 3 include general comments on medical history as well as comments on congenital malformations, dental notes, bone lesions, perimortem trauma, and other comments. The third category consists of an inventory file (Part 4, Skeletal Inventory Data) in which data pertaining to the specific contents of the database are maintained.
This includes the inventory of skeletal material by element and side (left and right), indicating the condition of the bone as either partial or complete. The variables in Part 4 provide a skeletal inventory of the cranium, mandible, dentition, and postcranium elements and identify the element as complete, fragmentary, or absent. If absent, four categories record why it is missing. The last part of the database is composed of three skeletal data files, covering quantitative observations of age-related changes in the skeleton (Part 5), cranial measurements (Part 6), and postcranial measurements (Part 7). Variables in Part 5 provide assessments of epiphyseal closure and cranial suture closure (left and right), rib end changes (left and right), Todd Pubic Symphysis, Suchey-Brooks Pubic Symphysis, McKern & Stewart--Phases I, II, and III, Gilbert & McKern--Phases I, II, and III, auricular surface, and dorsal pubic pitting (all for left and right). Variables in Part 6 include cranial measurements (length, breadth, height) and mandibular measurements (height, thickness, diameter, breadth, length, and angle) of various skeletal elements. Part 7 provides postcranial measurements (length, diameter, breadth, circumference, and left and right, where appropriate) of the clavicle, scapula, humerus, radius, ulna, sacrum, innominate, femur, tibia, fibula, and calcaneus. A small file of noted problems for a few cases is also included (Part 8).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COins is a database of COI-5P sequences of insects that includes over 532,000 representative sequences of more than 106,000 species, specifically formatted for the QIIME2 software platform. It was developed through a combination of automated and manually curated steps, starting from insect COI sequences available in the Barcode of Life Data System and selecting sequences that comply with several standards, including a species-level identification.
seq-degapped.qza --> reference sequences
taxonomy.qza --> sequences taxonomy
SklearnClassifier_COins_QIIME2_v2024.5.qza (NEW!) --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2024.5)
SklearnClassifier_COins_QIIME2_v2023.5.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2023.5)
SklearnClassifier_COins_QIIME2_v2022.2.qza --> naïve Bayes taxonomic classifier trained on COins (QIIME2 version 2022.2)
Sequences_metadata1.tsv --> Identification procedure of voucher specimens from which reference sequences were developed. The identification procedure is reported for each sequence included in COins (BOLD id reported in the BOLDid reference column) and for all identical sequences within haplotypes that were removed at Step 5 of COins curation (those for which the BOLD id is not available in the BOLDid reference column). The haplotype to which each sequence belongs is reported in the Haplotype column (haplotypes of each species are labeled with increasing numbers). Identification procedure information is derived from the sequence-associated metadata provided by the BOLD system.
Sequences_metadata2.tsv --> Identical sequences belonging to different species present within COins. Each row represents a cluster of identical sequences associated with different species; sequences included in the cluster are labeled with species name and BOLD id.
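The clusters reported in Sequences_metadata2.tsv (identical sequences assigned to different species) can be reproduced in principle by grouping records on the sequence string. The sketch below is a minimal illustration under that assumption; the record layout `(bold_id, species, sequence)` and the function name are hypothetical and do not reflect the actual COins curation scripts.

```python
from collections import defaultdict


def shared_sequence_clusters(records):
    """Return clusters of identical sequences that span more than one species.

    `records` is an iterable of (bold_id, species, sequence) tuples.
    Each cluster is a list of (species, bold_id) pairs.
    """
    by_sequence = defaultdict(list)
    for bold_id, species, sequence in records:
        # Compare case-insensitively so 'acgt' and 'ACGT' count as identical.
        by_sequence[sequence.upper()].append((species, bold_id))
    return [
        members
        for members in by_sequence.values()
        if len({species for species, _ in members}) > 1
    ]


# Hypothetical records: ids and species names are placeholders.
example = [
    ("ID001", "Species a", "ACGTAC"),
    ("ID002", "Species b", "acgtac"),
    ("ID003", "Species a", "TTTTGG"),
]
print(shared_sequence_clusters(example))
```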
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for creation of supplementary software components, e.g. to enable support for additional input formats and report categories.
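The kind of comparison described above can be pictured as set operations on the protein identifiers reported by each engine. This is only a conceptual sketch, not Compid's implementation; the function name and the accession strings are placeholders.

```python
def compare_identifications(mascot_ids, paragon_ids):
    """Partition protein identifiers by which search engine reported them."""
    mascot, paragon = set(mascot_ids), set(paragon_ids)
    return {
        "both": sorted(mascot & paragon),
        "mascot_only": sorted(mascot - paragon),
        "paragon_only": sorted(paragon - mascot),
    }


# Placeholder accessions, for illustration only.
report = compare_identifications(["PROT_A", "PROT_B"], ["PROT_B", "PROT_C"])
for category, proteins in report.items():
    print(category, proteins)
```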
The U.S. Geological Survey (USGS), in cooperation with the Pennsylvania Department of Environmental Protection (PADEP), conducted an evaluation of data used by the PADEP to identify groundwater sources under the direct influence of surface water (GUDI) in Pennsylvania (Gross and others, 2022). The data used in this evaluation and the processes used to compile them from multiple sources are described and provided herein. Data were compiled primarily but not exclusively from PADEP resources, including (1) source-information for public water-supply systems and Microscopic Particulate Analysis (MPA) results for public water-supply system groundwater sources from the agency’s Pennsylvania Drinking Water Information System (PADWIS) database (Pennsylvania Department of Environmental Protection, 2016), and (2) results associated with MPA testing from the PADEP Bureau of Laboratories (BOL) files and water-quality analyses obtained from the PADEP BOL, Sample Information System (Pennsylvania Department of Environmental Protection, written commun., various dates). Information compiled from sources other than the PADEP includes anthropogenic (land cover and PADEP region) and naturogenic (geologic and physiographic, hydrologic, soil characterization, and topographic) spatial data. Quality control (QC) procedures were applied to the PADWIS database to verify spatial coordinates, verify collection type information, exclude sources not designated as wells, and verify or remove values that were either obvious errors or populated as zero rather than as “no data.” The QC process reduced the original PADWIS dataset to 12,147 public water-supply system wells (hereafter referred to as the PADWIS database).
An initial subset of the PADWIS database, termed the PADWIS database subset, was created to include 4,018 public water-supply system community wells that have undergone the Surface Water Identification Protocol (SWIP), a protocol used by the PADEP to classify sources as GUDI or non-GUDI (Gross and others, 2022). A second subset of the PADWIS database, termed the MPA database subset, represents MPA results for 631 community and noncommunity wells and includes water-quality data (alkalinity, chloride, Escherichia coli, fecal coliform, nitrate, pH, sodium, specific conductance, sulfate, total coliform, total dissolved solids, total residue, and turbidity) associated with groundwater-quality samples typically collected concurrently with the MPA sample. The PADWIS database and two subsets (PADWIS database subset and MPA database subset) are compiled in a single data table (DR_2022_Table.xlsx), with the two subsets differentiated using attributes that are defined in the associated metadata table (DR_2022_Metadata_Table_Variables.xlsx). This metadata file (DR_2022_Metadata.xml) describes data resources, data compilation, and QC procedures in greater detail.
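Two of the QC steps described above (excluding sources not designated as wells, and treating zeros that stand in for "no data" as missing) can be sketched as follows. The field names `source_type` and `well_depth_ft` are hypothetical and are not taken from DR_2022_Table.xlsx, and the rule that a zero always means "no data" is an assumption that would only hold for fields where zero is physically implausible.

```python
def apply_qc(records, zero_means_missing):
    """Keep only wells and replace placeholder zeros with None.

    `records` is a list of dicts; `zero_means_missing` names the numeric
    fields in which a value of 0 is assumed to mean "no data".
    """
    cleaned = []
    for record in records:
        if record.get("source_type") != "well":
            continue  # exclude springs, infiltration galleries, etc.
        fixed = dict(record)  # copy so the input rows are left untouched
        for field in zero_means_missing:
            if fixed.get(field) == 0:
                fixed[field] = None
        cleaned.append(fixed)
    return cleaned


# Hypothetical rows for illustration.
rows = [
    {"source_type": "well", "well_depth_ft": 0},
    {"source_type": "well", "well_depth_ft": 250},
    {"source_type": "spring", "well_depth_ft": 0},
]
print(apply_qc(rows, ["well_depth_ft"]))
```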
Groundwater is a major drinking water resource but its quality with regard to organic micropollutants (MPs) is insufficiently assessed. Therefore, we aimed to investigate Swiss groundwater more comprehensively using liquid chromatography high-resolution tandem mass spectrometry (LC-HRMS/MS). First, samples from 60 sites were classified as having high or low urban or agricultural influence based on 498 target compounds associated with either urban or agricultural sources. Second, all LC-HRMS signals were related to their potential origin (urban, urban and agricultural, agricultural, or not classifiable) based on their occurrence and intensity in the classified samples. A considerable fraction of estimated concentrations associated with urban and/or agricultural sources could not be explained by the 139 detected targets. The most intense nontarget signals were automatically annotated with structure proposals using MetFrag and SIRIUS4/CSI:FingerID with a list of >988,000 compounds. Additionally, suspect screening was performed for 1162 compounds with predicted high groundwater mobility from primarily urban sources. Finally, 12 nontargets and 11 suspects were identified unequivocally (Level 1), while 17 further compounds were tentatively identified (Level 2a/3). Amongst these were 13 pollutants thus far not reported in groundwater, such as the industrial chemicals 2,5-dichlorobenzenesulfonic acid (19 detections, up to 100 ng L⁻¹), phenylphosphonic acid (10 detections, up to 50 ng L⁻¹), triisopropanolamine borate (2 detections, up to 40 ng L⁻¹), O-des[2-aminoethyl]-O-carboxymethyl dehydroamlodipine, a transformation product (TP) of the blood pressure regulator amlodipine (17 detections), and the TP SYN542490 of the herbicide metolachlor (Level 3, 33 detections, estimated concentrations up to 100–500 ng L⁻¹).
One monitoring site was far more contaminated than other sites based on estimated total concentrations of potential MPs, which was supported by the elucidation of site-specific nontarget signals such as the carcinogen chlorendic acid and various naphthalenedisulfonic acids. Many compounds remained unknown, but overall, source-related prioritisation proved an effective approach to support identification of compounds in groundwater.
Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.
The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.
National Coverage.
Individual
The target population is the civilian, non-institutionalized population 15 years and above.
Sample survey data [ssd]
The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.
Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.
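Selection with probabilities proportional to population size can be illustrated with systematic PPS sampling, one common way to implement it. The sketch below is not Gallup's procedure; the unit names, sizes, and seed are hypothetical. Note that a unit larger than the sampling interval can be drawn more than once.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(units, sizes, n, seed=0):
    """Draw n primary sampling units with probability proportional to size."""
    total = sum(sizes)
    step = total / n                                  # sampling interval
    start = random.Random(seed).uniform(0, step)      # random start point
    cumulative = list(accumulate(sizes))              # upper bound of each unit
    picks = []
    for k in range(n):
        point = start + k * step
        # Index of the unit whose cumulative range contains `point`.
        index = min(bisect_right(cumulative, point), len(units) - 1)
        picks.append(units[index])
    return picks


# Hypothetical clusters with their household counts.
clusters = ["north", "central", "south"]
households = [10, 80, 10]
print(pps_systematic(clusters, households, n=2, seed=1))
```

With these sizes the interval is 50, so the large "central" cluster is always selected at least once, while the two small clusters are each selected only when the random start lands in their narrow range.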
Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of day.
The sample size in Afghanistan was 1,000 individuals. Gender-matched sampling was used during the final stage of selection.
Face-to-face [f2f]
The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.
Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.
Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.
Defines how to encode NHS-approved patient identifiers into a two-dimensional barcode.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Correct identification of protein post-translational modifications (PTMs) is crucial to understanding many aspects of protein function in biological processes. G-PTM-D is a recently developed technique for global identification and localization of PTMs. Spectral file calibration prior to applying G-PTM-D, and algorithmic enhancements in the peptide database search significantly increase the accuracy, speed, and scope of PTM identification. We enhance G-PTM-D by using multinotch searches and demonstrate its effectiveness in identification of numerous types of PTMs including high-mass modifications such as glycosylations. The changes described in this work lead to a 20% increase in the number of identified modifications and an order of magnitude decrease in search time. The complete workflow is implemented in MetaMorpheus, a software tool that integrates the database search procedure, identification of coisolated peptides, spectral calibration, and the enhanced G-PTM-D workflow. Multinotch searches are also shown to be useful in contexts other than G-PTM-D by producing superior results when used instead of standard narrow-window and open database searches.
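As a rough illustration of the multinotch idea, a search can accept a candidate peptide when the precursor mass differs from the peptide mass by one of a small set of allowed offsets ("notches"), rather than by zero only (narrow window) or by any value (open search). The sketch below assumes that reading; the function name, the notch list, and the tolerance are illustrative and are not MetaMorpheus code or settings.

```python
def matching_notch(precursor_mass, peptide_mass, notches, ppm=5.0):
    """Return the first notch that explains the precursor mass, or None.

    A notch matches when precursor_mass is within `ppm` of
    peptide_mass + notch.
    """
    for notch in notches:
        target = peptide_mass + notch
        if abs(precursor_mass - target) / target * 1e6 <= ppm:
            return notch
    return None


# Example notches: unmodified, oxidation, phosphorylation (monoisotopic, Da).
NOTCHES = [0.0, 15.994915, 79.966331]

print(matching_notch(1015.9955, 1000.0, NOTCHES))  # explained by oxidation
print(matching_notch(1005.0, 1000.0, NOTCHES))     # no notch fits
```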
https://www.icpsr.umich.edu/web/ICPSR/studies/8226/terms
Patterns of adult criminal behavior are examined in this data collection. Data covering the adult years of peak criminal activity (from approximately 18 to 26 years of age) were obtained from samples of delinquent youths who had been incarcerated in three California Youth Authority institutions during the 1960s: Preston, Fricot, and the Northern California Youth Center. Data were obtained from three sources: official arrest records of the California Bureau of Criminal Investigation and Identification (CII), supplementary data from the Federal Bureau of Investigation, and the California Bureau of Vital Statistics. Follow-up data were collected between 1978 and 1981. There are two files per sample site. The first is a background data file containing information obtained while the subjects were housed in Youth Authority institutions, and the second is a follow-up history offense file containing data from arrest records. Each individual is identified by a unique ID number, which is the same in the background and offense history files.
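Because the same ID number appears in both files for a site, the background record and the arrest history of each individual can be linked on that key. A minimal sketch, assuming each file has been read into a list of dicts and that the ID field is called `id` (a hypothetical name, not the variable name in the ICPSR codebook):

```python
from collections import defaultdict


def link_histories(background_rows, offense_rows):
    """Attach every offense record to the matching background record by ID."""
    offenses = defaultdict(list)
    for row in offense_rows:
        offenses[row["id"]].append(row)
    return {
        person["id"]: {
            "background": person,
            "offenses": offenses.get(person["id"], []),
        }
        for person in background_rows
    }


# Hypothetical rows for illustration.
background = [{"id": 1, "site": "Preston"}, {"id": 2, "site": "Fricot"}]
arrests = [{"id": 1, "year": 1972}, {"id": 1, "year": 1975}]
linked = link_histories(background, arrests)
print(len(linked[1]["offenses"]), len(linked[2]["offenses"]))
```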
Database used to store client data for both identity management and customer relationship management.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
This data package contains the details of substances in drugs, biologics, foods and devices registered with a Unique Ingredient Identifier (UNII) through the joint FDA/USP Substance Registration System (SRS). It also contains a list of the names used for each UNII and the changes made to UNII descriptions up to the latest update.
ID mapping databases derived from Ensembl Metazoa 49, for use with BridgeDb.
The scripts used to create these databases based on Ensembl BioMart can be found at https://github.com/bridgedb/create-bridgedb-genedb.
This work was funded by the FAIRplus project (grant agreement no 802750) and NWO Open Science Fund (grant no 203.001.121).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset to use in a tutorial about sample identification, using the tool Kraken.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In addition to respondents’ highest educational qualification, some surveys also collect data on their main field of education. Current measurement practice involves either a closed question with highly aggregated response categories, which are difficult for respondents to use, or an open question, which requires expensive post-coding. Therefore, a measurement tool for fields of education was developed in the SERISS project in work package 8, Task 8.3. In deliverable D8.9 we provide a database of fields of education and training in 34 languages, including the definition of a search tree interface to facilitate navigation of categories for respondents. All 120 standard categories and classification codes are taken from UNESCO's International Standard Classification of Education for Fields of Education and Training (ISCED-F). For most languages, detailed 3-digit information is available. The database, including a live search feature, is available at the surveycodings website at https://surveycodings.org/articles/codings/fields-of-education. The search tree can be used for respondents’ self-identification of fields of education and training in computer-assisted surveys. The live search feature can also be used for post-coding open answers in already collected data.
https://www.datainsightsmarket.com/privacy-policy
Discover the booming Data De-identification and Pseudonymity Software market! Learn about its $2 billion valuation, 15% CAGR, key drivers, restraints, and top players like IBM and Thales. Explore regional insights and future projections in this comprehensive market analysis.