A database for the phylogenetic classification of proteins encoded in complete genomes. Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes that represent major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Please be aware that the COG database has not been updated in many years and will not be.
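The "at least 3 lineages" membership rule can be illustrated with a minimal sketch. It is not the COG construction procedure itself: it assumes each member protein is already labeled with the lineage of its source genome, and the protein IDs and lineage labels are hypothetical.

```python
def spans_enough_lineages(members, min_lineages=3):
    """Return True if a cluster's proteins come from at least `min_lineages` lineages.

    `members` is an iterable of (protein_id, lineage) pairs. Paralogs from the
    same lineage count only once, because the rule is about lineages, not proteins.
    """
    lineages = {lineage for _protein, lineage in members}
    return len(lineages) >= min_lineages


# Hypothetical candidate cluster
candidate = [
    ("protA_1", "Proteobacteria"),
    ("protA_2", "Proteobacteria"),  # paralog from the same lineage
    ("protB", "Firmicutes"),
    ("protC", "Euryarchaeota"),
]
print(spans_enough_lineages(candidate))      # True: 3 distinct lineages
print(spans_enough_lineages(candidate[:3]))  # False: only 2 distinct lineages
```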
Background: Serratia plymuthica UBCF_13 is a phylloplane-associated plant bacterium that shows antifungal activity. Its whole-genome sequence provides more insight into the organism's evolution and the unique traits encoded in its genome, and opens the possibility of exploring the potential of this microorganism in future studies. Here, we report the genome sequence of S. plymuthica UBCF_13 and its comparison with seventeen other strains.
Methods: Continuous short reads were obtained from Illumina sequencing runs, and reads of 150 bp were merged into a single dataset. A pan-genome-based method was used to identify the core genome of the S. plymuthica species and the genes unique to UBCF_13.
Results: Assembly of the Illumina reads of S. plymuthica strain UBCF_13 produced a 5.46 Mb circular genome sequence. A total of 3315 genes were found to belong to the core genome shared by the 18 strains evaluated. The UBCF_13 genome harbors 488 unique genes, 300 of which can be found only in this strain. The raw and assemble...
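The core-genome and unique-gene split described in the Methods can be sketched with set operations, assuming genes have already been clustered into gene families across strains (the abstract does not name the pan-genome tool). The strain names other than UBCF_13 and all family IDs are hypothetical, and the sketch uses the strictest definition of "unique" (absent from every other strain).

```python
def core_and_unique(presence):
    """Split gene families into a core genome and per-strain unique genes.

    `presence` maps strain name -> set of gene-family IDs found in that strain.
    Core genome: families present in every strain.
    Unique genes of a strain: families found in that strain and in no other.
    """
    strains = list(presence)
    core = set.intersection(*(presence[s] for s in strains))
    unique = {}
    for s in strains:
        others = set().union(*(presence[o] for o in strains if o != s))
        unique[s] = presence[s] - others
    return core, unique


# Hypothetical presence/absence data for three strains
presence = {
    "UBCF_13": {"f1", "f2", "f3", "f4"},
    "strainX": {"f1", "f2", "f5"},
    "strainY": {"f1", "f2", "f3"},
}
core, unique = core_and_unique(presence)
print(sorted(core))               # ['f1', 'f2']
print(sorted(unique["UBCF_13"]))  # ['f4']
```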
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The three numbers in the COG database row represent, respectively: the total number of proteins in the 66-genome COG database / the total number of genomes in which those proteins were found / the highest number of proteins in any single bacterial genome.
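A minimal sketch of how the three numbers could be computed from a COG's membership list, assuming each protein appears once as a (protein_id, genome) pair and that the set of bacterial genomes is supplied separately for the third number. The protein IDs and the example membership are hypothetical.

```python
from collections import Counter


def cog_row_numbers(members, bacterial_genomes):
    """Return (total proteins, genomes represented, max proteins in one bacterial genome)."""
    per_genome = Counter(genome for _protein, genome in members)
    total_proteins = sum(per_genome.values())
    n_genomes = len(per_genome)
    # Only bacterial genomes count toward the per-genome maximum
    bacterial_counts = [n for g, n in per_genome.items() if g in bacterial_genomes]
    max_per_bacterial_genome = max(bacterial_counts, default=0)
    return total_proteins, n_genomes, max_per_bacterial_genome


# Hypothetical membership of one COG
members = [
    ("p1", "E. coli"), ("p2", "E. coli"),
    ("p3", "B. subtilis"),
    ("p4", "A. pernix"), ("p5", "A. pernix"), ("p6", "A. pernix"),  # archaeal
]
print("%d/%d/%d" % cog_row_numbers(members, {"E. coli", "B. subtilis"}))  # 6/3/2
```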
Database of Clusters of Orthologous Genes grouped by pathways and functional systems. It includes the complete genomes of 1,187 bacteria and 122 archaea that map into 1,234 genera.
The Prokaryotic Operon DataBase (ProOpDB) is one of the most precise and complete repositories of operon predictions currently available. Using our novel and highly accurate operon prediction algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers several ways to retrieve a set of operon predictions, including: i) organism name, ii) metabolic pathways, as defined by the KEGG database, iii) gene orthology, as defined by the COG database, iv) conserved protein motifs, as defined by the Pfam database, v) reference gene, and vi) reference operon, among others. To limit the operon output to non-redundant organisms, ProOpDB offers an efficient protocol to select the most representative organisms based on a precompiled phylogenetic distance matrix. In addition, the ProOpDB operon predictions are used directly as input data for our Gene Context Tool (GeConT), which visualizes their genomic context and retrieves the sequences of their corresponding 5′ regulatory regions, as well as the nucleotide or amino acid sequences of their genes.
The prediction algorithm
The algorithm is a multilayer perceptron (MLP) neural network classifier that uses as input the intergenic distances of contiguous genes and the STRING database functional relationship scores between the different groups of orthologous proteins, as defined in the COG database. However, the operon predictions of our method are not restricted to genes with a COG assignment, since we successfully defined new groups of orthologous genes and obtained, by extrapolation, a set of equivalent STRING-like scores based on gene pairs conserved across different genomes.
Since the STRING functional relationship scores are determined in an unbiased manner and efficiently integrate a large amount of information from different sources and kinds of evidence, the predictions made by our MLP are considerably less influenced by the bias imposed when training on one specific organism.
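The two classifier inputs described above can be sketched as a feature-extraction step for contiguous gene pairs. This is a minimal sketch under stated assumptions, not ProOpDB's code: coordinates are assumed 1-based and inclusive, only same-strand neighbors are scored, a missing functional score defaults to 0.0, and the gene names, COG labels and score values are hypothetical. The MLP that would consume these rows is not shown.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate (assumed convention)
    end: int     # 1-based inclusive end coordinate, end >= start
    strand: str  # "+" or "-"
    cog: str     # orthologous group label (hypothetical values below)


def pair_features(genes, functional_scores):
    """Return (gene_a, gene_b, intergenic_distance, functional_score) rows
    for contiguous same-strand gene pairs, ordered by start coordinate."""
    genes = sorted(genes, key=lambda g: g.start)
    rows = []
    for a, b in zip(genes, genes[1:]):
        if a.strand != b.strand:
            continue  # genes on opposite strands are not candidates for the same operon
        distance = b.start - a.end - 1  # negative values indicate overlapping genes
        key = tuple(sorted((a.cog, b.cog)))
        score = functional_scores.get(key, 0.0)  # assumed default when no score exists
        rows.append((a.name, b.name, distance, score))
    return rows


genes = [
    Gene("g1", 100, 400, "+", "COG_X"),
    Gene("g2", 420, 900, "+", "COG_Y"),
    Gene("g3", 950, 1200, "-", "COG_Z"),
]
scores = {("COG_X", "COG_Y"): 0.9}  # hypothetical STRING-like score
print(pair_features(genes, scores))  # [('g1', 'g2', 19, 0.9)]
```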
THIS RESOURCE IS NO LONGER IN SERVICE, documented on August 20, 2019. The COG database has become a powerful tool in the field of comparative genomics. Its construction is based on sequence homologies between proteins from different completely sequenced genomes: highly homologous proteins are assigned to clusters of orthologous groups. The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for the functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies. The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system, based on orthologous relationships between genes, appears to be a natural framework for comparative genomics and should facilitate both the functional annotation of genomes and large-scale evolutionary studies. This is a major update of the previously developed system for delineating Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes, together with the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi.
The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes by COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; the addition of new eukaryotic genomes is expected to substantially increase the coverage of eukaryotic genomes by KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core that is represented in all analyzed species and comprises approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
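The coverage figures above follow directly from the stated counts, and the "conserved core" is the share of clusters whose phyletic pattern includes every analyzed species. A small sketch of both calculations; the species abbreviations and the five-cluster phyletic patterns are hypothetical toy data, not KOG content.

```python
def coverage(clustered, total):
    """Percentage of proteins that were placed in clusters."""
    return 100.0 * clustered / total


def core_fraction(patterns, species):
    """Fraction of clusters whose phyletic pattern includes every species."""
    species = set(species)
    core = sum(1 for present in patterns.values() if species <= present)
    return core / len(patterns)


print(round(coverage(138_458, 185_505)))  # COGs: 75
print(round(coverage(59_838, 110_655)))   # KOGs: 54

# Hypothetical phyletic patterns for five clusters over seven species
species = ["Ath", "Cel", "Dme", "Hsa", "Sce", "Spo", "Ecu"]
patterns = {
    "KOG_1": set(species),
    "KOG_2": {"Hsa", "Dme"},
    "KOG_3": {"Sce", "Spo"},
    "KOG_4": {"Ath"},
    "KOG_5": {"Hsa", "Cel", "Dme"},
}
print(core_fraction(patterns, species))  # 0.2
```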
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Master tally sheet of the full curation process used to generate a new list of COGs representing gene/protein families involved in tRNA modifications, based on published gene/protein-modification pairs curated from the literature. The original COG Pathway list (via the COG Database, June 2022) contained 59 COGs; the final list (see other Object, namely 4-S3) totalled 89 COGs: 52 retained from the original list and 37 added. Of the original 59, 7 were removed.
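The tally is internally consistent (59 − 7 = 52 retained, 52 + 37 = 89 in the final list), and it is the kind of bookkeeping that set differences make explicit. A minimal sketch using hypothetical placeholder IDs ("C0", "N0", ...) in place of real COG identifiers:

```python
def tally(original, final):
    """Count COGs retained, removed and added between two lists."""
    original, final = set(original), set(final)
    return {
        "retained": len(original & final),
        "removed": len(original - final),
        "added": len(final - original),
    }


# Hypothetical placeholder IDs reproducing the reported counts
original = [f"C{i}" for i in range(59)]
final = original[7:] + [f"N{i}" for i in range(37)]  # drop 7, add 37
print(tally(original, final))  # {'retained': 52, 'removed': 7, 'added': 37}
print(len(set(final)))         # 89
```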
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Brain-Computer Interfaces, and especially passive Brain-Computer Interfaces (pBCIs), with their ability to estimate and detect mental states, are receiving increasing attention from both the scientific and the research and development communities. Many pBCIs aim to increase safety in complex work environments such as the aeronautical domain. Therefore, mental workload, vigilance and decision-making are some of the most commonly examined aspects of cognition within this field of research. A large proportion of pBCIs involve a machine learning and signal processing component, as the collected data need to be transformed into a reliable estimate of the user's current mental state (e.g. mental workload). Improving this component is a major challenge for researchers and requires large quantities of data. While data sharing is common in the active BCI community, open pBCI datasets are scarcer and generally incomplete with regard to the information they report. This is particularly true for datasets encompassing several tasks or sessions, which are important for tackling the challenges of transfer learning. Testing new pipelines, feature extraction algorithms and classifiers is central to future advances in this research domain, as well as to algorithm benchmarking and research reproducibility.
The COG-BCI database presented here comprises the recordings of 29 participants over 3 individual sessions with 4 different tasks designed to elicit different cognitive states. This results in a total of over 100 hours of open electrophysiological data: electroencephalogram (EEG) and electrocardiogram (ECG). The project was validated by the local ethics committee of the University of Toulouse (CER number 2021-342). The dataset was validated at the subjective, behavioral and physiological levels (i.e. cardiac and cerebral activity) to ensure its usefulness to the pBCI community.
This body of work represents a large effort to promote the use of pBCIs, as well as the use of open science.
The data are in the Brain Imaging Data Structure (BIDS) format. For more information, please read the COG-BCI_info.pdf file.
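Because the data follow BIDS, recordings can be located from the entity labels alone, using the BIDS pattern sub-<label>/ses-<label>/eeg/sub-<label>_ses-<label>_task-<label>_eeg.<ext>. A minimal sketch assuming that every participant completed all 4 tasks in all 3 sessions; the session labels, task labels and the EEGLAB ".set" extension are hypothetical, so check COG-BCI_info.pdf for the actual ones.

```python
from pathlib import PurePosixPath


def bids_eeg_path(sub, ses, task, ext=".set"):
    """Build a BIDS-style relative path for one EEG recording."""
    name = f"sub-{sub}_ses-{ses}_task-{task}_eeg{ext}"
    return PurePosixPath(f"sub-{sub}", f"ses-{ses}", "eeg", name)


tasks = ["taskA", "taskB", "taskC", "taskD"]  # hypothetical task labels
paths = [
    bids_eeg_path(f"{s:02d}", f"S{k}", t)
    for s in range(1, 30)   # 29 participants
    for k in range(1, 4)    # 3 sessions (hypothetical labels S1..S3)
    for t in tasks
]
print(len(paths))  # 348
print(paths[0])    # sub-01/ses-S1/eeg/sub-01_ses-S1_task-taskA_eeg.set
```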
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
To identify clusters of orthologous genes (COGs) that correlate with nutrient limitation in the modern ocean, we examined the Ocean Microbial Reference Gene Catalog v2 (OM-RGC.v2) from the Tara Oceans Project. The OM-RGC.v2 includes the relative gene abundances of all COGs (n = 4,787) in 139 Tara Oceans metagenomic samples, along with metadata including phosphate, oxygen, and nitrate/nitrite concentrations (nitrate and nitrite values were reported together in OM-RGC.v2). Iron concentrations for Tara Oceans samples were not available and were therefore estimated with the PISCES2 model, based on its iron concentration predictions for the Tara Oceans sampling locations as described in Table S1 of Caputi et al., 2019. Iron concentrations were predicted only for the surface and the deep chlorophyll maximum (DCM); iron concentrations for samples from the mesopelagic zone were not available from the PISCES2 model. All other metadata for Tara Oceans samples were obtained directly from Salazar et al., 2019.
Correlations between COGs and the metadata were estimated using regression models. Compound Poisson linear models were fitted in bulk using the MaAsLin2 software package (v. 1.18.0), with a separate model fitted for each COG to analyze the effect of the metadata variables on individual COG abundances. While the main focus was the correlation with nutrient abundance, environmental metadata were included in the model to control for as many potential confounding effects as the data allowed. The final model included the following predictors (based on the variables available from the Tara Oceans dataset): the size fraction at which the sample was taken, mean temperature, depth, salinity, mean oxygen concentration, PO4 concentration, NO2 + NO3 concentration, iron concentration, and absolute latitude. Of these, depth, PO4 concentration and NO2 + NO3 concentration were log-transformed to improve model fit. To the same end, the iron concentration was square-root transformed, and the absolute value of the latitude was used. Otherwise, no transformations or normalization were performed. No abundance cutoff was applied, but COGs present in fewer than one-third of the Tara Oceans samples were discarded to ensure that the COGs identified by the statistical model were meaningful.
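The predictor transformations and the one-third prevalence filter can be sketched as a preprocessing step; the model fitting itself was done in MaAsLin2 and is not reproduced here. This is a sketch under stated assumptions: the metadata keys are hypothetical, the log base is not given in the text so the natural log is assumed, log-transformed values must be strictly positive, and a missing iron value (as for mesopelagic samples) is passed through as None.

```python
import math

LOG_VARS = ("depth", "PO4", "NO2_NO3")  # hypothetical keys for the log-transformed predictors


def transform_predictors(sample):
    """Apply the described transformations to one sample's metadata dict."""
    out = dict(sample)
    for var in LOG_VARS:
        out[var] = math.log(sample[var])  # natural log assumed; requires values > 0
    iron = sample.get("iron")
    out["iron"] = None if iron is None else math.sqrt(iron)  # missing for mesopelagic samples
    out["latitude"] = abs(sample["latitude"])
    return out  # other predictors are left untransformed


def prevalent_cogs(abundance, n_samples):
    """Keep COGs detected (abundance > 0) in at least one-third of the samples."""
    return {
        cog: values
        for cog, values in abundance.items()
        if 3 * sum(v > 0 for v in values) >= n_samples  # integer test avoids float rounding
    }


sample = {"depth": 5.0, "PO4": 0.4, "NO2_NO3": 2.0, "iron": 0.25,
          "latitude": -30.0, "temperature": 20.0}
print(transform_predictors(sample)["iron"])  # 0.5
```

With 139 samples, a COG must be detected in at least 47 of them to pass the filter.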
The 2022 Connecticut (CT) town Parcels and Computer-Assisted Mass Appraisal (CAMA) data are distributed as a zipped file containing two items: CT parcels in geodatabases organized by COG, and the associated CAMA files. The parcel information covers all 169 towns, organized into geodatabases for each of the 9 Councils of Governments (COGs). Most of the parcel datasets can be linked to the CAMA data, which hold attribute information about real property (e.g. value of house, number of bedrooms). The parcel features for each town are in shapefiles, feature classes, or within a geodatabase; most parcels are organized by town and COG and placed within a geodatabase. The CAMA datasets contain information about real property within the towns of CT and may be linked to the parcels using a join in a GIS package such as ArcGIS Pro or QGIS. 154 of the 169 towns have complete CAMA information. Of the remaining 15 towns, four have no information and the rest have limited information mixed into the parcel attribute tables. These files were gathered from the CT towns by the COGs and then submitted to CT OPM. Town data are organized by COG. Attribute names, primary keys, secondary keys, naming conventions, and file formats are not fully consistent, but some cleaning and reorganization was conducted to improve quality. This file was created on 03/08/2023 from data collected in 2021-2022.
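The parcel-to-CAMA link is an attribute join on a shared key. The same logic can be sketched in plain Python as a left join. The field name "parcel_id" and the records are hypothetical, key values are normalized because naming conventions vary between towns, and if a key repeats in the CAMA data the last record wins.

```python
def join_parcels_cama(parcels, cama, parcel_key="parcel_id", cama_key="parcel_id"):
    """Left-join CAMA attributes onto parcel records by a shared key.

    Parcels without a matching CAMA record are kept unchanged.
    """
    def norm(value):
        return str(value).strip().upper()  # tolerate case and whitespace differences

    lookup = {norm(row[cama_key]): row for row in cama}  # last duplicate wins
    joined = []
    for parcel in parcels:
        merged = dict(parcel)
        match = lookup.get(norm(parcel[parcel_key]))
        if match is not None:
            merged.update({k: v for k, v in match.items() if k != cama_key})
        joined.append(merged)
    return joined


# Hypothetical records
parcels = [{"parcel_id": "ab-1", "town": "Hartford"},
           {"parcel_id": "ZZ-9", "town": "Hartford"}]
cama = [{"parcel_id": " AB-1 ", "bedrooms": 3}]
print(join_parcels_cama(parcels, cama)[0])
# {'parcel_id': 'ab-1', 'town': 'Hartford', 'bedrooms': 3}
```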
Background: Standard archival sequence databases have not been designed as tools for genome annotation and are far from optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. Results: A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 proteins from A. pernix that could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an approximately 50% increase compared to the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two species of Pyrococci, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix.
Conclusions: Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and predicted protein functions provide for a significant improvement in genome annotation. A differential genome display approach helps in a systematic investigation of common and distinct features of gene repertoires and in some cases reveals unexpected connections that may be indicative of functional similarities between phylogenetically distant organisms and of lateral gene exchange.
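Distances "based on the co-occurrence of genomes in COGs" can be illustrated with a minimal sketch. The abstract does not state which distance measure was used, so the Jaccard distance over each genome's set of COGs is an assumed choice here, and the genome abbreviations and COG memberships are hypothetical. The resulting pairwise distances could then feed a clustering or tree-building step, which is not shown.

```python
from itertools import combinations


def genome_distances(cog_members):
    """Jaccard distance between genomes based on the COGs they share.

    `cog_members` maps COG ID -> set of genomes represented in that COG.
    """
    genomes = sorted(set().union(*cog_members.values()))
    cogs_of = {g: {c for c, members in cog_members.items() if g in members} for g in genomes}
    dist = {}
    for a, b in combinations(genomes, 2):
        shared = len(cogs_of[a] & cogs_of[b])
        either = len(cogs_of[a] | cogs_of[b])
        dist[(a, b)] = 1 - shared / either if either else 0.0
    return dist


# Hypothetical COG memberships for three archaeal genomes
cog_members = {
    "COG1": {"Ape", "Pab", "Pho"},
    "COG2": {"Pab", "Pho"},
    "COG3": {"Ape"},
    "COG4": {"Ape", "Pab"},
}
for pair, d in genome_distances(cog_members).items():
    print(pair, round(d, 3))
# ('Ape', 'Pab') 0.5
# ('Ape', 'Pho') 0.75
# ('Pab', 'Pho') 0.333
```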
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Worksheet includes the mapping of both tRNA modification-relevant and -irrelevant KEGG K numbers to their respective overlapping COGs. Representative sequences from Object 4-S2 determined overlap at the sequence level, in keeping with the principle that all data were generated and curated with support from published data. Additional tabs include the same data with expanded names, as well as other data sourced from the KEGG K number and representative sequence entries (e.g., EC numbers).
Data elements available in COG and PHIS.
Non-traditional data signals from social media and employment platforms for COG stock analysis.
Annotation of Acipenser sinensis unigenes in the NR, NT, SwissProt, KEGG, COG, InterPro and GO databases.
Cog Services Trans Export Import Data. Follow the Eximpedia platform for HS codes, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Representative list of 124 genomes sampled from the 711 genomes of the current COG database release [2]. Table S2. Representative list of 27 eukaryotic genomes sampled manually. Table S3. Results of the similarity assessment for the homologs of the catalytic β-subunit of the bacterial FOF1-type ATP synthase, obtained by applying the HHpred algorithm [19]. The top hits for the α- and β-subunits of the F-type ATP synthase of E. coli and the B- and A-subunits of the A-type ATP synthase of Methanosarcina mazei (cf. Table 1) are colored red. (XLSX 29 kb)
https://publichealthscotland.scot/services/data-research-and-innovation-services/electronic-data-research-and-innovation-service-edris/services-we-offer/
File contains basic public metadata, including sequence_name, location, date, pangolin lineage assignment, version and associated scores, scorpio VOC/VUI constellation call and associated scores, key spike protein mutations calls and a list of all nucleotide mutations found.