33 datasets found
  1. f

    Data_Sheet_1_Contamination in Reference Sequence Databases: Time for...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Jun 9, 2023
    Cite
    Valérian Lupo; Mick Van Vlierberghe; Hervé Vanderschuren; Frédéric Kerff; Denis Baurain; Luc Cornet (2023). Data_Sheet_1_Contamination in Reference Sequence Databases: Time for Divide-and-Rule Tactics.pdf [Dataset]. http://doi.org/10.3389/fmicb.2021.755101.s001
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Frontiers
    Authors
    Valérian Lupo; Mick Van Vlierberghe; Hervé Vanderschuren; Frédéric Kerff; Denis Baurain; Luc Cornet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single method of detection, which represents a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple methods of detection while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.

  2. f

    The GenBank Non-Redundant Protein Sequence Database (NRDB)

    • fungidb.org
    • piroplasmadb.org
    Updated Aug 16, 2019
    Cite
    (2019). The GenBank Non-Redundant Protein Sequence Database (NRDB) [Dataset]. https://fungidb.org/fungidb/app/record/dataset/DS_a7163a9f0d
    Dataset updated
    Aug 16, 2019
    Description

    The GenBank non-redundant protein sequence database (NRDB) is a component of the NCBI BLAST databases and contains entries from GenPept, Swiss-Prot, PIR, PRF, PDB and NCBI RefSeq.

  3. f

    Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and...

    • frontiersin.figshare.com
    pdf
    Updated Jun 6, 2023
    Cite
    Md Shamimuzzaman; Jack M. Gardiner; Amy T. Walsh; Deborah A. Triant; Justin J. Le Tourneau; Aditi Tayal; Deepak R. Unni; Hung N. Nguyen; John L. Portwood; Ethalinda K. S. Cannon; Carson M. Andorf; Christine G. Elsik (2023). Data_Sheet_2_MaizeMine: A Data Mining Warehouse for the Maize Genetics and Genomics Database.PDF [Dataset]. http://doi.org/10.3389/fpls.2020.592730.s002
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Md Shamimuzzaman; Jack M. Gardiner; Amy T. Walsh; Deborah A. Triant; Justin J. Le Tourneau; Aditi Tayal; Deepak R. Unni; Hung N. Nguyen; John L. Portwood; Ethalinda K. S. Cannon; Carson M. Andorf; Christine G. Elsik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools, including a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates, and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. With the ability to save query outputs as lists that can be input to new queries, MaizeMine provides limitless possibilities for data integration and meta-analysis.
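The set operations mentioned for the List tool can be illustrated outside MaizeMine with plain Python sets. This is only a sketch of the idea, not MaizeMine's API: the function name and the gene identifiers are placeholders invented for the example.

```python
def combine_gene_lists(first, second):
    """Return the union, intersection and difference of two identifier lists.

    Both arguments are iterables of gene identifiers (hypothetical values here).
    Results are sorted so the output is stable.
    """
    a, b = set(first), set(second)
    return {
        "union": sorted(a | b),          # identifiers found in either list
        "intersection": sorted(a & b),   # identifiers found in both lists
        "only_first": sorted(a - b),     # identifiers unique to the first list
    }


# Placeholder identifiers, not real MaizeGDB gene models.
result = combine_gene_lists(["gene_A", "gene_B", "gene_C"], ["gene_B", "gene_D"])
print(result["intersection"])
```

The intersection of two saved lists is the kind of result that could then be fed into a further template query.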

  4. r

    DNA sequence data collected during the SIPEX II voyage of the Aurora...

    • researchdata.edu.au
    • data.aad.gov.au
    • +2more
    Updated Mar 20, 2014
    Cite
    MARTIN, ANDREW; BOWMAN, JOHN; MCMINN, ANDREW (2014). DNA sequence data collected during the SIPEX II voyage of the Aurora Australis, 2012 [Dataset]. http://doi.org/10.4225/15/59a4ecb67a216
    Dataset updated
    Mar 20, 2014
    Dataset provided by
    Australian Antarctic Data Centre
    Authors
    MARTIN, ANDREW; BOWMAN, JOHN; MCMINN, ANDREW
    Time period covered
    Sep 14, 2012 - Nov 16, 2012
    Area covered
    Description

    Purpose of experiments:

    Sequence data were obtained to determine the community structure of pack sea-ice microbial communities and whether it is affected by exposure to elevated CO2 levels.

    Summary of Methods:

    Cells in sea-ice brines were filtered onto 0.2 micron filters and material extracted using the MoBio Water DNA extraction kit. The DNA was analysed by Research and Testing Laboratories Inc. (Lubbock, Texas, USA) via 454 pyrosequencing. The bacteria were analysed using primer set 10F-519R, which targets 16S rRNA genes. 16S rRNA genes associated with chloroplasts and mitochondria are included in this dataset but represent a minority of sequences in most samples. Eukaryotes were analysed using primer set 550F-1055R, which targets 18S rRNA genes. The 454 pyrosequencing analysis with the Titanium GS FLX+ kit generates on average 3000 reads incorporating custom pyrotags for later stages of the data analysis. The specific steps used for subsequent data analysis are described in the attached PDF file (Data_Analysis_Methodology.PDF). This output was further refined by first determining consensus sequences at the 98% similarity level using Weizhong Li’s online software site CD-HIT (http://weizhongli-lab.org/cd-hit/). Reference: Niu B, Fu L, Sun S, Li W. 2010. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11:187. doi:10.1186/1471-2105-11-187. The consensus sequences were then checked for errors, manually curated, and aligned against the closest matching sequences obtained from the NCBI database (www.ncbi.nlm.nih.gov) to finally obtain a list of consensus operational taxonomic units (OTUs) and the number of reads obtained for each sample analysed.

    File: SIPEXII_DNA_Sample_information.xlsx provides sampling and analysis information for the detailed results in the other two files.

    File: SCIPEXII_sea_ice_bacteria_OTUs.xlsx contains information on the number of 16S rRNA reads in bacteria Phylum/Class and OTUs.

    File: SCIPEXII_sea_ice_brines_eukaryote_community_OTU_data.xlsx contains information on the number of 16S rRNA reads in eukaryotic microbes: Phylum/Order/Closest taxon and OTUs.

  5. B

    Data from: Morphological identification and single-cell genomics of marine...

    • borealisdata.ca
    • open.library.ubc.ca
    • +3more
    Updated May 19, 2021
    Cite
    Ryan M. R. Gawryluk; Javier del Campo; Noriko Okamoto; Jurgen F. H. Strassert; Julius Lukes; Thomas A. Richards; Alexandra Z. Worden; Alyson E. Santoro; Patrick J. Keeling (2021). Data from: Morphological identification and single-cell genomics of marine diplonemids [Dataset]. http://doi.org/10.5683/SP2/00MM7G
    Dataset updated
    May 19, 2021
    Dataset provided by
    Borealis
    Authors
    Ryan M. R. Gawryluk; Javier del Campo; Noriko Okamoto; Jurgen F. H. Strassert; Julius Lukes; Thomas A. Richards; Alexandra Z. Worden; Alyson E. Santoro; Patrick J. Keeling
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Off of California coastline
    Description

    Abstract

    Recent global surveys of marine biodiversity have revealed that a group of organisms known as “marine diplonemids” constitutes one of the most abundant and diverse planktonic lineages [1]. Though discovered over a decade ago [2 and 3], their potential importance was unrecognized, and our knowledge remains restricted to a single gene amplified from environmental DNA, the 18S rRNA gene (small subunit [SSU]). Here, we use single-cell genomics (SCG) and microscopy to characterize ten marine diplonemids, isolated from a range of depths in the eastern North Pacific Ocean. Phylogenetic analysis confirms that the isolates reflect the entire range of marine diplonemid diversity, and comparisons to environmental SSU surveys show that sequences from the isolates range from rare to superabundant, including the single most common marine diplonemid known. SCG generated a total of ∼915 Mbp of assembled sequence across all ten cells and ∼4,000 protein-coding genes with homologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology database, distributed across categories expected for heterotrophic protists. Models of highly conserved genes indicate a high density of non-canonical introns, lacking conventional GT-AG splice sites. Mapping metagenomic datasets [4] to SCG assemblies reveals virtually no overlap, suggesting that nuclear genomic diversity is too great for representative SCG data to provide meaningful phylogenetic context to metagenomic datasets. This work provides an entry point to the future identification, isolation, and cultivation of these elusive yet ecologically important cells. The high density of nonconventional introns, however, also portends difficulty in generating accurate gene models and highlights the need for the establishment of stable cultures and transcriptomic analyses.

    Usage notes

    Single-cell genomic scaffolds from 10 'wild-caught' marine diplonemids. FASTA format single-cell genomic scaffolds of 10 marine diplonemid (protist) cells are presented. Scaffolds were generated with the SPAdes assembler; contaminating sequences were removed, as described in the publication. Each FASTA file is derived from a single cell. Cells are referred to by the numbers used in the publication (i.e., cells 3, 13, 21, 27, 37, 47, 1sb, 4sb, 9sb, 21sb) as no species names exist.

    marine_diplonemid_SAGs.zip

    Figure S1 (related to Figure 1). Taxon-annotated GC plots demonstrate the effectiveness of our decontamination procedure. Plots were generated using blobtools (https://github.com/DRL/blobtools) for each SCG assembly before and after decontamination using the megablast/blastx protocol described in Experimental Procedures. Plots are based on megablast queries of the NCBI nt database according to taxonomic Order.

    FigS1.pdf

  6. Z

    Dominant contribution of Asgard archaea to eukaryogenesis (2024) Tobiasson,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 22, 2025
    Cite
    Tobiasson, Victor (2025). Dominant contribution of Asgard archaea to eukaryogenesis (2024) Tobiasson, V., Koonin, E. PROCESSED DATA AND METADATA [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14002644
    Dataset updated
    Mar 22, 2025
    Dataset authored and provided by
    Tobiasson, Victor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Main data deposit for "Dominant contribution of Asgard archaea to eukaryogenesis".

    Victor Tobiasson, Jacob Luo, Yuri I Wolf, Eugene V Koonin

    Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

    The origin of eukaryotes is one of the key problems in evolutionary biology. The demonstration that the Last Eukaryotic Common Ancestor (LECA) already contained the mitochondrion, an endosymbiotic organelle derived from an alphaproteobacterium, and the discovery of Asgard archaea, the closest archaeal relatives of eukaryotes, inform and constrain evolutionary scenarios of eukaryogenesis. We undertook a comprehensive analysis of the origins of the core eukaryotic genes tracing to the LECA within a rigorous statistical framework centered around evolutionary hypothesis testing using constrained phylogenetic trees. The results reveal dominant contributions of Asgard archaea to the origin of most of the conserved eukaryotic functional systems and pathways. A limited contribution from Alphaproteobacteria was identified, primarily relating to the energy transformation systems and Fe-S cluster biogenesis, whereas ancestry from other bacterial phyla was scattered across the eukaryotic functional landscape, without consistent trends. These findings suggest a model of eukaryogenesis in which key features of eukaryotic cell organization evolved in the Asgard ancestor, followed by the capture of the alphaproteobacterial endosymbiont, and augmented by numerous but sporadic horizontal acquisitions of genes from other bacteria both before and after endosymbiosis.

    Version 0.3, updated 180325

    Main data repository for:

    Dominant contribution of Asgard archaea to eukaryogenesis (2024)

    Tobiasson, V., Koonin, E.

    Contains all final parsed data from the main Eukaryogenesis project

    investigating the evolutionary ancestries of eukaryotic protein families.

    Currently (non-static) available at:

    https://www.biorxiv.org/content/10.1101/2024.10.14.618318v2

    https://assets-eu.researchsquare.com/files/rs-5352492/v1/2f9c68ae-cf3e-420a-8d29-867b6fb1a878.pdf

    All code used to generate the data present within this repository available at:

    https://github.com/VictorTobiasson/eukgen

    General information

    To identify associations between prokaryotic and eukaryotic protein families, separate

    hidden Markov model (HMM) databases for prokaryotes and eukaryotes were constructed

    using a custom, cascaded, sequence-to-profile clustering pipeline, implemented using

    mmseqs2, followed by a multistep data-reduction and multiple sequence alignment (MSA)

    procedure to generate HMM profiles using hhsuite.

    A prokaryotic database of 37 million protein sequences was curated from prokaryotic

    genomes obtained from the NCBI GenBank in November 2023 and supplemented with proteins

    extracted from 146 Asgard genome assemblies. To avoid inclusion of genes present only

    within a narrow subset of species, possibly resulting from horizontal transfer from

    eukaryotes post LECA, we reconstructed the “soft-core” pangenome for each of the 26

    curated prokaryotic taxonomic classes. These pangenomes include only those genes that

    are present in at least 67% of the families within each class of Bacteria and Archaea.

    The initial eukaryotic database consisted of 30 million protein sequences from 993

    species taken from EukprotV3 and cleaned using mmseqs2 to remove likely prokaryotic

    contaminants.
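The “soft-core” criterion described above (genes present in at least 67% of the families within a class) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the function name, the input dictionaries, and the rule that a gene counts as present in a family when any genome of that family carries it are all assumptions made for the example.

```python
from collections import defaultdict


def soft_core_genes(presence, family_of, threshold=0.67):
    """Return gene clusters present in at least `threshold` of the families.

    presence:  dict mapping genome id -> set of gene cluster ids
    family_of: dict mapping genome id -> family label (genomes of one class)
    Assumption: a gene is "present" in a family if any genome in it has the gene.
    """
    families_with_gene = defaultdict(set)
    all_families = set()
    for genome, genes in presence.items():
        family = family_of[genome]
        all_families.add(family)
        for gene in genes:
            families_with_gene[gene].add(family)
    if not all_families:
        return set()
    return {
        gene
        for gene, families in families_with_gene.items()
        if len(families) / len(all_families) >= threshold
    }


# Toy class with three families: cluster "A" is in all three, "B" in two.
presence = {"g1": {"A", "B"}, "g2": {"A"}, "g3": {"A", "B"}}
family_of = {"g1": "F1", "g2": "F2", "g3": "F3"}
print(soft_core_genes(presence, family_of))
```

In this toy case cluster "B" occurs in two of three families (about 66.7%), which falls just below a 0.67 cutoff, so only "A" is retained.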

    Both databases were clustered and MSAs constructed for all non-singleton clusters

    and HMM profiles created. The resulting eukaryotic HMM dataset was queried against

    the prokaryotic dataset using hhblits to identify sets of homologous protein sequences.

    Each eukaryotic cluster and all its significant prokaryotic hits constituted an individual

    sequence set, hereinafter referred to as a Eukaryotic/Prokaryotic Orthologous Cluster

    (EPOC). The EPOCs constitute groups of homologous proteins from eukaryotes and prokaryotes

    (each EPOC contains a unique set of eukaryotic proteins, but some clusters of prokaryotic

    proteins can be present in multiple EPOCs) that were used for phylogenetic tree

    construction, annotation, and evolutionary hypothesis testing.

    To infer the most likely prokaryotic ancestry of the eukaryotic proteins in each EPOC,

    rather than relying on the tree topology directly, we employed a probabilistic approach

    for evolutionary hypothesis testing using constraint trees. We exhaustively sampled all

    arrangements of likely sister clades and obtained Expected Likelihood Weights (ELW) for

    the set of possible sister clade models. As the ELW metric is analogous to model selection

    confidence, here we take it to be proportional to the probability of a sampled prokaryotic

    clade to be the true sister group of the given eukaryotic clade among a set of competing

    sister clades. For each EPOC, our analysis dynamically accounts for long branch outliers

    and is robust to phylogenetically non-homogenous clades. This analysis is further capable

    of resolving eukaryotic paraphyly, treating each eukaryotic clade within an EPOC as a

    single datapoint for downstream analysis. Our resulting data contains EPOCs annotated

    using profiles generated from KEGG Orthology Groups (KOGs), each with an MSA generated

    using muscle5, a maximum likelihood tree inferred using IQtree2 and associated ELW values

    for all candidate prokaryotic sister phyla. The analysis of prokaryotic ancestry was

    performed only for those eukaryotic clades that included more than 5 distinct taxonomic

    labels, with at least one coming from Amorphea and one from Diaphoretickes, the two

    expansive eukaryotic clades considered to represent either the first or the second

    bifurcation in the evolution of eukaryotes. Thus, these clades likely represent genes

    mapping back to the LECA.
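To make the ELW idea concrete, the sketch below follows the usual definition of Expected Likelihood Weights: within each bootstrap replicate the likelihoods of the competing models are normalized to sum to one, and the normalized values are then averaged over replicates. It is not the authors' implementation; the function name and the input layout (one list of log-likelihoods per replicate, one entry per candidate sister-clade model) are assumptions.

```python
import math


def expected_likelihood_weights(replicate_log_likelihoods):
    """Average normalized model likelihoods over bootstrap replicates.

    replicate_log_likelihoods: list of lists; each inner list holds one
    log-likelihood per competing model for a single bootstrap replicate.
    Returns one weight per model; the weights sum to 1.
    """
    n_models = len(replicate_log_likelihoods[0])
    totals = [0.0] * n_models
    for log_likelihoods in replicate_log_likelihoods:
        best = max(log_likelihoods)
        # Subtract the maximum before exponentiating to avoid underflow.
        relative = [math.exp(value - best) for value in log_likelihoods]
        normalizer = sum(relative)
        for index, value in enumerate(relative):
            totals[index] += value / normalizer
    n_replicates = len(replicate_log_likelihoods)
    return [total / n_replicates for total in totals]


# Two competing sister-clade models, one replicate; the second model is
# three times less likely than the first.
weights = expected_likelihood_weights([[-100.0, -100.0 - math.log(3)]])
print(weights)
```

Read in the way the text describes, a larger weight means the corresponding prokaryotic clade is the better supported sister group among the candidates tested.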

    For further details please see main publication or contact

    victor.tobiasson@nih.gov

    eugene.koonin@nih.gov

    Included files

    Unless otherwise stated all files contained are tab separated and utf-8 encoded

    with the first row containing header information.

    All data entries encoding lists are “|” (pipe) separated.

    Fields without data values are filled with string entries of “none”.
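The file conventions stated above (tab-separated, UTF-8, header row, pipe-separated lists, “none” for missing values) could be handled with a small reader such as the following. This is a sketch only: the function name and the column names in the example are hypothetical, and which columns hold lists has to be supplied by the caller.

```python
import csv
import io


def read_deposit_tsv(handle, list_fields=()):
    """Parse a tab-separated table with a header row into a list of dicts.

    Fields equal to the string "none" become None; fields named in
    `list_fields` are split on "|" into Python lists.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if value == "none":
                row[key] = None
            elif key in list_fields:
                row[key] = value.split("|")
            else:
                row[key] = value
        rows.append(row)
    return rows


# Hypothetical columns, used only to exercise the conventions.
sample = "epoc_id\tmembers\tlabel\nE1\tp1|p2\tnone\n"
rows = read_deposit_tsv(io.StringIO(sample), list_fields=("members",))
print(rows)
```

For a real file, the handle would be opened with `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.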

    --- Databases ---

    euk72_ep.tar.gz

    prok2311_as.tar.gz

    Prok2311As_final_clusters.tsv

    Euk72Ep_final_clusters.tsv

    prok2311_as.hmmDB.tar.gz

    euk72_ep.hmmDB.tar.gz

    --- Annotation and Curation ---

    NCBI_taxonomy_species_addendum.tsv

    NCBI_taxonomy_class_addendum.tsv

    Euk72Ep_Prok2311As_final_classes.tsv

    Euk72Ep_Prok2311As_final_classes.GTDB.tsv

    KEGG_category_mapping.tsv

    KEGG_metadata.tsv

    --- EPOC data ---

    EPOC_data.tar.gz

    EPOC_annotation_KEGG.tsv

    EPOC_data.tsv

    EPOC_data.pangenomes_s10.tsv

    EPOC_data.pangenomes_s25.tsv

    EPOC_data.pangenomes_s67.tsv

    EPOC_data.GTDB.tsv

    euk72_ep.tar.gz

    Gzipped .tar archive containing a single directory with 10 files

    constituting the initial eukaryotic mmseqs2 database with taxonomy annotation.

    Constructed from a pre-selected list of 72 eukaryotic proteomes downloaded from

    NCBI as well as a “clean” version of Eukprot, lacking highly prokaryotic-like

    contaminant sequences.

    prok2311_as.tar.gz

    Gzipped .tar archive containing a single directory with 10 files constituting the

    initial prokaryotic mmseqs2 database with taxonomy annotation. Constructed from

    47545 complete genomes retrieved from NCBI in November 2023.

    prok2311_as.hmmDB.tar.gz

    Gzipped .tar archive containing 6 files. Comprises an HHSuite Database formatted

    from prok2311_as non-singleton clusters, contains 26286 profiles.

    euk72_ep.hmmDB.tar.gz

    Gzipped .tar archive containing 6 files. Comprises an HHSuite Database formatted

    from euk72_ep non-singleton clusters, contains 1631704 profiles.

    NCBI_taxonomy_species_addendum.tsv

    Taxonomy mapping file with manually curated ‘class’ level annotation for poorly

    annotated species.

    taxid: NCBI taxid

    proposed_class_id: Manually assigned NCBI taxid

    proposed_class_label: NCBI class name

    org_name: NCBI organism name

    NCBI_taxonomy_class_addendum.tsv

    Class revision file mapping poorly populated class level entries to higher order

    manually curated labels. Also includes information for small classes with shallow

    taxonomy which are deleted from the EPOC analysis at the level of tree construction.

    taxid: NCBI taxid

    ncbi_class: NCBI taxid of rank corresponding to ‘class’ following manual

    amendment as per NCBI_taxonomy_species_addendum.tsv

    revised_class_id: Manually assigned NCBI taxid of rank corresponding to ‘class’

    revised_class_label: Proposed cleartext name of manually revised revised_class_id

    Euk72Ep_Prok2311As_final_classes.tsv

    Final taxonomy at NCBI rank ‘class’ following revisions for all sequences in Euk72Ep or

    Prok2311As. These taxonomic labels are used for EPOC tree annotation.

    acc: mmseqs database header in either prok2311_as or euk72_ep databases

    taxid: NCBI taxid for organism

    superkingdom: Top level NCBI taxonomy classification Bacteria, Archaea or Eukarya,

    used to define Eukaryotic outgroups in EPOC analysis

    class: Cleartext name of manually revised NCBI rank ‘class’ identifier for annotation

    Euk72Ep_Prok2311As_final_classes.GTDB.tsv

    Final taxonomy at GTDB rank ‘phylum’ transferred using marker genes from GTDB release 220

    acc: mmseqs database header in either prok2311_as or euk72_ep databases

    taxid: NCBI taxid for organism

    superkingdom: Top level NCBI taxonomy classification Bacteria, Archaea or Eukarya,

    used to define Eukaryotic outgroups in EPOC analysis

    class: Cleartext name of assigned GTDB phylum

    Prok2311As_final_clusters.tsv

    Cluster mapping file for accessions within the initial Prok2311As database to the

    final clusters used for HMM creation

  7. DNA loss model explains the evolution of the neuropeptide LWamide,...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Apr 24, 2025
    Cite
    Cristian E. Cadena-Caballero; Nestor Munive-Argüelles; Lina M. Vera-Cala; Carlos Barrios-Hernandez; Ruben O. Duarte-Bernal; Viviana L. Ayus-Ortiz; Luis A. Pardo-Diaz; Mayra Agudelo-Rodríguez; Lola X. Bautista-Rozo; Laura R. Jimenez-Gutierrez; Francisco Martinez-Perez (2025). DNA loss model explains the evolution of the neuropeptide LWamide, APGWamide, APGW/AKH, RPCH, AKH, ACP, CRZ, and GnRH families [Dataset]. http://doi.org/10.5281/zenodo.8092804
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cristian E. Cadena-Caballero; Nestor Munive-Argüelles; Lina M. Vera-Cala; Carlos Barrios-Hernandez; Ruben O. Duarte-Bernal; Viviana L. Ayus-Ortiz; Luis A. Pardo-Diaz; Mayra Agudelo-Rodríguez; Lola X. Bautista-Rozo; Laura R. Jimenez-Gutierrez; Francisco Martinez-Perez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R1: Establishment and purification of neuropeptide sequences

    The LW, APGW, RPCH, AKH, CRZ, and GnRH neuropeptide families were searched in the GenBank database using 10 keywords: the neuropeptide name, the precursor abbreviation, the full name of the precursor, the full name of the precursor with the word “prepropeptide,” and the combinations of these terms. The candidate sequences were downloaded in FASTA format using the appropriate commands in the GenBank database. The AKH neuropeptide family was classified according to the groups published in the literature, as well as the amino acid number and sequence. Furthermore, the ACP hybrid family was identified in the GenBank database using BLAST alignments.

    C00: Neuropeptide Precursor. Eight folders were named with the initials of each neuropeptide family. The AKH family folder was the only one containing four subfolders. All of the folders contained the same type of files: three text files named after the neuropeptide initials and the obtained result. The files identified with the words “with codes” contained the sequences with the codes generated for this study, whereas the documents with the word “Full” contained the GenBank database search results obtained with the 10 aforementioned keywords. These files were located in a folder named “Fasta Keywords.” Each file contained the results from each respective keyword. The files with the words “selected EA” contained the sequences that were selected for evolutionary analyses.

    C01: BLAST ACP. The text file named “00 BLAST ACP” contains the BLAST alignment results obtained from the NCBI database generated with the Adipokinetic Hormone/Corazonin-related peptide from the transcriptome of Callinectes toxotes. The file named “01 ACP Selected” contains the precursors selected for this study. All sequences were in FASTA format and contained the codes summarized in Supplementary Material 3 “Database Sequences.”

    The file named “02 ACP selected EA” contains the ACP precursors of other species, which were used for the evolutionary analyses of C. toxotes ACP. The PDF file titled “03 ACP ProP 1.0 Serv” contains the results of the proteolytic cleavage sites of the precursors indicated in the file named “02 ACP selected EA,” which were generated using the aforementioned software.

    C02: BLAST VP. The folder contains the results of the BLAST alignment against the NCBI database, which were generated with the virtual peptide sequences reported by Martinez-Perez et al. (2007). This folder contains seven text files. The name of each file corresponds to the precursor and species in which it was identified. Moreover, the PDF document named “Virtual peptides ProP 1.0 Serv” contains the results of the proteolytic cleavage sites generated with the aforementioned software.

    C03: Debugging sequences with software. This folder contains three subfolders containing the results obtained with each software used in this study for the detection of each of the neuropeptide sequences using the appropriate keywords.

    The folder named “BioDataToolKit” contains six subfolders with the abbreviated name of each neuropeptide. Additionally, there is a file containing the sequences downloaded from the GenBank database, as well as a Microsoft Excel file containing the details generated by the software. The name of each file corresponds to the keywords used for each search. The software used in this study can be found in the following repository: https://github.com/rduarte24/BiodataToolkit.

    The folder named “Pro1.0Server” was organized in the same way as the results derived for the “BioDataToolKit” for each neuropeptide family. However, each of the neuropeptide folders contained a file with the pertinent sequences whereas another file contained the endoproteolytic cleavage sites of the neuropeptide precursors obtained with the software.

    The folder named “Proteios” contains seven files. The file names indicate the precursor analyzed with the software and the identified sequences in FASTA format. The Proteios software is available in the following repository: https://github.com/Martin-Munive/Proteios.

    C04: Neuropeptide precursors for evolutionary analysis. Files with the sequences of the neuropeptide precursors used for the generation of the phylogenetic trees in Supplementary Materials 4 and 7. The name of each file corresponds to the name of each of the analyzed neuropeptides.

    R2: Transcriptome BLAST

    Microsoft Excel file containing the BLAST alignments conducted using the sequences of the AKH/CRZ-related peptide (ACP) from C. toxotes and Corazonin (CRZ) from C. arcuatus. The following information is summarized in the spreadsheets named C. toxotes and C. arcuatus: Column A, neuropeptide name; Column B, species name; Columns C–G, BLAST alignment results; Column H, GenBank protein accession number; Column I, precursor sequence.

    R3: Construction of neuropeptide database

    Microsoft Excel file with information pertaining to the database and a detailed description of each of the neuropeptide precursors analyzed in this study. The Excel file contains seven spreadsheet tabs. Each of the tabs contains the following columns:

    Neuropeptides. Column A, sequence numbering in descending order; Column B, neuropeptide name; Column C, identification code used in this study; Column D, accession number; Columns E–G, species taxonomy; Columns H–L, GenBank sequence description; Columns M–N, literature reference and link.

    Taxonomy. Taxonomic description of each of the examined species derived from the NCBI database.

    Sequences evolutionary anal. This tab contains the code developed for this work in Column C; the GenBank accession codes of each neuropeptide are summarized in Column D and species taxonomy details are summarized in Columns E and F.

    Table of differences. Column B shows the codes of identical sequences and Column C shows the code of the sequence selected for this study.

    Codes deleted. This tab contains the accession codes of the species and the species name but contains no details on the properties of the neuropeptide precursors.

    Sequences Paper. Neuropeptide sequences reported in previous studies that were later reported in the GenBank database. The sequences marked with asterisks have not been previously reported in public databases. The codes used in this study to designate the sequences are also included.

    Keywords. Keywords used to conduct the GenBank database searches to obtain the members of each neuropeptide family.

    R4: In silico validation, alignments, and phylogenetic relationships

    Generated phylogenetic trees and results obtained from individual runs for each of the neuropeptide families with the DNA-LM and Kalign parameters using the IQ-TREE software.

    The folder named “RUN” contains the “DNALM and kalign 2.0 default parameters” subfolder. Both folders contain 11 subfolders with the names of each of the neuropeptide families, as well as the results obtained with the IQ-TREE software. The folder named “Trees” contains the folder “DNALM and kalign 2.0 default parameters” containing the phylogenetic trees for each of the neuropeptide families, which were created with the iTOL software.

    R5: BLAST alignment of the virtual peptide precursors

    Results of the BLAST alignment of the virtual peptides described by Martinez-Perez et al. (2007) with respect to the sequences in the GenBank database. The files follow the same nomenclature as in the folder named “Carpeta 02 BLAST VP” in Repository 1.

    R6: Alignment of neuropeptide precursors

    DNALM and Kalign 2.0 default parameter” folders. Each of these folders contains the alignments of the examined neuropeptide precursors from each family and each folder is named after the corresponding neuropeptide. The remaining files contain the alignments in ascending order in the evolutionary scale and are appropriately named after the corresponding neuropeptide. The file named “All Sequence FASTA” contains the sequences used in our study in FASTA format.

    R7: Phylogenetic clustering of the precursors

    DNALM and Kalign 2.0 default parameter” folders. Both folders contain the phylogenetic tree clustering results from Supplementary Material 6, which were obtained using the DNA-LM and Kalign parameters and the IQ-TREE software. All analyses were conducted using the GUANE-1 supercomputer (Universidad Industrial de Santander). The phylogenetic clustering results of all of the precursors are contained in the folders with the respective precursor name. The folder also contains Figure 6, which was included in our main manuscript.

    Additionally, a folder entitled "Orthofinder and Robinson-Foulds" is included, which contains the analyses carried out with the Robinson-Foulds metric and the OrthoFinder software.

  8. Data from: Genomic evidence for the parallel regression of melatonin...

    • zenodo.org
    bin, pdf, txt
    Updated Jul 19, 2024
    Christopher A. Emerling; Mark S. Springer; John Gatesy; Zachary Jones; Deana Hamilton; David Xia-Zhu; Matthew A. Collin; Frédéric Delsuc (2024). Genomic evidence for the parallel regression of melatonin synthesis and signaling pathways in placental mammals [Dataset]. http://doi.org/10.5281/zenodo.4894212
    Available download formats: pdf, bin, txt
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher A. Emerling; Mark S. Springer; John Gatesy; Zachary Jones; Deana Hamilton; David Xia-Zhu; Matthew A. Collin; Frédéric Delsuc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Material for:

    Emerling C.A., Springer M.S., Gatesy J., Jones Z., Hamilton D., Xia-Zhu D., Collin M.A., and Delsuc F. (2021). Genomic evidence for the parallel regression of melatonin synthesis and signaling pathways in placental mammals. Open Research Europe.

    Supplementary File Legends:

    - Supplementary_Figure_S1.pdf: RAxML AANAT gene tree.

    - Supplementary_Figure_S2.pdf: RAxML ASMT gene tree.

    - Supplementary_Figure_S3.pdf: RAxML MTNR1A+MTNR1B tree.

    - Supplementary_Figure_S4.pdf: PAML AANAT results, model 1 (see Supplementary Table S7).

    - Supplementary_Figure_S5.pdf: PAML ASMT results, model 2 (see Supplementary Table S8).

    - Supplementary_Figure_S6.pdf: PAML MTNR1A results, model 1 (see Supplementary Table S9).

    - Supplementary_Figure_S7.pdf: PAML MTNR1B results, model 1 (see Supplementary Table S10).

    - Supplementary_Table_S1.xlsx: List of species examined in this study and the sources of the genes. Source key: WGS: Sequences derived from NCBI's Whole Genome Shotgun database; Whole Genome Sequencing of Short Reads: whole genomes were sequenced using short-read technologies. The methodologies varied for the species, and will be published with other projects, so please contact the author(s) for information on the specific methodology and samples used; SRA: sequences derived from NCBI's Sequence Read Archive; GenBank: sequences derived from NCBI's nucleotide collection; Bowhead Whale Genome Resource: sequences derived from http://www.bowhead-whale.org; Ensembl: sequences derived from Ensembl genome browser (www.ensembl.org); Discovar de novo: sequences derived from genomes assembled via Discovar de novo (https://software.broadinstitute.org/software/discovar/blog/).

    - Supplementary_Table_S2.xlsx: Accession numbers and functionality of AANAT in species examined. Parentheses after an accession number indicate the coordinates of the sequence on the contig / scaffold. Exon colors code for the following: green = putatively functional; yellow = missing; pink = one or more inactivating mutations found. Abbreviations for mutations are as follows: del = deletion; ins = insertion; start = start codon mutation; stop = premature stop codon; ? = ambiguity whether the mutation is shared among all members of the clade. Abbreviations in brackets following an inactivating mutation indicate a shared inactivating mutation. Key for each abbreviation follows: Bacu = Balaenoptera acutorostrata; BALA = Balaenidae; BALAEN = Balaenopteridae; Bbon = Balaenoptera bonaerensis; CAB = Cabassous; Ccap = Cebus capucinus; CETA = Cetacea; CHLAM = Chlamyphoridae; CHOL = Choloepus; Cjac = Callithrix jacchus; CING = Cingulata; DASY = Dasypodidae; DELP = Delphinidae; DERM = Dermoptera; Erob = Eschrichtius robustus; INIA = Inia; FOLI = Folivora; GALE = Galeopterus; LIPO = Lipotes; Lobl = Lagenorhynchus obliquidens; MANI = Manidae; MONO = Monodontidae; MYRM = Myrmecophagidae; MYST = Mysticeti; NPP = Not present in Platanista or Physeteroidea, but present in other Odontocetes; NPZ = Not present in Ziphiidae, but present in other Odontocetes; Oorc = Orcinus orca; PEUT = Tolypeutinae; PHOC = Phocoenidae; PHOL = Pholidota; PHOR = Chlamyphorinae; PILO = Pilosa; PHYS = Physeteroidea; PONT = Pontoporia; Schi = Sousa chinensis; SIRE = Sirenia; Tadu = Tursiops aduncus; TOLY = Tolypeutes; VERM = Vermilingua; XEN = Xenarthra.


    - Supplementary_Table_S3.xlsx: Accession numbers and functionality of ASMT in species examined. See Table S2 caption for details.

    - Supplementary_Table_S4.xlsx: Accession numbers and functionality of MTNR1A in species examined. See Table S2 caption for details.

    - Supplementary_Table_S5.xlsx: Accession numbers and functionality of MTNR1B in species examined. See Table S2 caption for details.

    - Supplementary_Table_S6.xlsx: Codon frequency model selection. These are the results from one ratio dN/dS analyses using different codon frequency models.

    - Supplementary_Table_S7.xlsx: Results of AANAT PAML dN/dS analyses. Model: BG = branch(es) grouped with background; fixed 1 = branch(es) fixed at 1. p-value: specific p-value only shown if lower than 0.05. Model Comparison: if model comparison yields statistically significant differences (p < 0.05), model comparison bolded and given green background. For most models, w only shown for branch(es) of interest.

    - Supplementary_Table_S8.xlsx: Results of ASMT PAML dN/dS analyses. Refer to Table S7 caption for additional details.

    - Supplementary_Table_S9.xlsx: Results of MTNR1A PAML dN/dS analyses. Refer to Table S7 caption for additional details.

    - Supplementary_Table_S10.xlsx: Results of MTNR1B PAML dN/dS analyses. Refer to Table S7 caption for additional details.

    - Supplementary_Table_S11.xlsx: Results of BLASTing and mapping short reads from Alligator mississippiensis RNA sequencing experiments.

    - Supplementary_Dataset_S1_all_ali_fasta.txt: Genomic alignments in fasta format used to determine the pseudogene/functional status of the different genes in different taxonomic groups.

    - Supplementary_Dataset_S2_AANAT_RAxML_ali.phy: Alignment of AANAT in phylip format used in maximum likelihood phylogenetic reconstruction with RAxML.

    - Supplementary_Dataset_S3_ASMT_RAxML_ali.phy: Alignment of ASMT in phylip format used in maximum likelihood phylogenetic reconstruction with RAxML.

    - Supplementary_Dataset_S4_MTNR1A_MTNR1B_RAxML_ali.phy: Alignment of MTNR1A and MTNR1B in phylip format used in maximum likelihood phylogenetic reconstruction with RAxML.

    - Supplementary_Dataset_S5_AANAT_PAML_alig.fasta: Codon alignment of AANAT in fasta format used in selection pressure analyses with PAML.

    - Supplementary_Dataset_S6_ASMT_PAML_ali.fasta: Codon alignment of ASMT in fasta format used in selection pressure analyses with PAML.

    - Supplementary_Dataset_S7_MTNR1A_PAML_ali.fasta: Codon alignment of MTNR1A in fasta format used in selection pressure analyses with PAML.

    - Supplementary_Dataset_S8_MTNR1B_PAML_ali.fasta: Codon alignment of MTNR1B in fasta format used in selection pressure analyses with PAML.

    - Supplementary_Dataset_S9_PAML_topology.tre: Tree topology in newick format used in selection pressure analyses with PAML.

  9. Dataset for: Pre-pandemic artificial MERS analog of polyfunctional...

    • zenodo.org
    bin, pdf, txt
    Updated Nov 28, 2024
    Andreas Martin Lisewski (2024). Dataset for: Pre-pandemic artificial MERS analog of polyfunctional SARS-CoV-2 S1/S2 furin cleavage site domain is unique among spike proteins of genus Betacoronavirus [Dataset]. http://doi.org/10.5281/zenodo.13148895
    Available download formats: bin, txt, pdf
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andreas Martin Lisewski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 1, 2024
    Description

    Data File Descriptions and Methods

    1. [betacov_matching_IPR042578.fasta]: Representative set of 2,465 betacoronavirus S protein overlapping homologous superfamily sequences retrieved in fasta format on 4 December 2022 from the InterPro repository at https://www.ebi.ac.uk/interpro/entry/InterPro/IPR042578/.

    2. [betacov_matching_IPR042578_motif.fasta]: Extracted 98,122 furin cleavage site (FCS) motifs of 20 amino acid length, including overlapping sequences, using the FindFur algorithm as described by Gu (2020) and deposited on 15 December 2020 at the GitHub software repository at https://github.com/chwisteeng/FindFur. These sequences were individually checked for Thr/Ser O-glycosite residue pairs with the standard prediction software NetOGlyc4.0 (Steentoft et al., 2013) as available at https://services.healthtech.dtu.dk/services/NetOGlyc-4.0/. The bioinformatics nuclear localization signal (NLS) predictions, specifically including the positive hits for pat7 in SARS-CoV-2 and in MERS_MA30 CoV, used the PSORT algorithm available as a webservice at https://wolfpsort.hgc.jp/, which is based on the work of Nakai and Horton (1999).

    3. [betacov_s1s2_nls_pat7_furin_blastp.txt]: Comprehensive sequence database searches were performed using the NCBI protein BLAST (BLASTP) algorithm with webservice available at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins. The following BLASTP search parameters and settings were used: Word size=2; Expect value=200000; Hitlist size=500; Gapcosts=9,1; Matrix=PAM30; Filter string=F; Genetic Code=1; Window Size=40; Threshold=11; Composition-based stats=0; Database Posted date=Jan 19, 2023 2:59 AM; Number of letters=17,117,563; Number of sequences=10,766; Entrez query: Includes: Betacoronavirus (taxid:694002); Excludes: SARS-CoV-2 (taxid:2697049). The six polyfunctional input query consensus motif sequences were TXXPR(K/H/R)XRSX and TXXPRX(K/H/R)RSX.

    4. [table_s1s2_hits_betacov_polyf.pdf]: Compiled summary table of hits (PDF) representing S1/S2 spike domains across genus Betacoronavirus.

    5. [table_s1s2_hits_betacov_polyf.xlsx]: Compiled summary table of hits (MS Excel) representing S1/S2 spike domains across genus Betacoronavirus.
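
    Item 3 above states that the two consensus patterns TXXPR(K/H/R)XRSX and TXXPRX(K/H/R)RSX yield six query sequences. The sketch below is only an illustration of that count, not the author's code: it expands each (K/H/R) group into its three alternatives and, assuming that X stands for any residue, also turns a pattern into a regular expression. The function names and the toy sequence are hypothetical.

```python
import re
from itertools import product


def expand_consensus(pattern):
    """Expand every group like (K/H/R) into its concrete alternatives."""
    # re.split with a capturing group keeps the parenthesised groups in the result.
    parts = re.split(r"(\([^)]*\))", pattern)
    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            choices.append(part[1:-1].split("/"))  # e.g. ["K", "H", "R"]
        elif part:
            choices.append([part])                 # literal run, kept as-is
    return ["".join(combo) for combo in product(*choices)]


def consensus_to_regex(pattern):
    """Build a regex from a consensus pattern (assumption: X = any residue)."""
    body = re.sub(r"\(([^)]*)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  pattern)
    return re.compile(body.replace("X", "."))


patterns = ["TXXPR(K/H/R)XRSX", "TXXPRX(K/H/R)RSX"]
queries = [q for p in patterns for q in expand_consensus(p)]
for q in queries:
    print(q)

# Made-up toy sequence, used only to show how the regex form behaves.
toy = "AATQQPRKARSGG"
rx1, rx2 = (consensus_to_regex(p) for p in patterns)
print(rx1.search(toy), rx2.search(toy))
```

    Each pattern contributes three sequences, one per basic residue in the (K/H/R) position, which gives the six queries mentioned in the description.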

    References

    Gu, C., 2020. FindFur: A Tool for Predicting Furin Cleavage Sites of Viral Envelope Substrates. Master’s Thesis, San Jose State University, CA, USA. doi: 10.31979/etd.4ahv-9jya

    Nakai, K., Horton, P., 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24, 34–36. doi: 10.1016/s0968-0004(98)01336-x

    Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T.-B.G., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., Gupta, R., Bennett, E.P., Mandel, U., Brunak, S., Wandall, H.H., Levery, S.B., Clausen, H., 2013. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J 32, 1478–1488. doi: 10.1038/emboj.2013.79

  10. Data_Sheet_1_Heuristic and Hierarchical-Based Population Mining of...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Joao Carlos Gomes-Neto; Natasha Pavlovikj; Carmen Cano; Baha Abdalhamid; Gabriel Asad Al-Ghalith; John Dustin Loy; Dan Knights; Peter C. Iwen; Byron D. Chaves; Andrew K. Benson (2023). Data_Sheet_1_Heuristic and Hierarchical-Based Population Mining of Salmonella enterica Lineage I Pan-Genomes as a Platform to Enhance Food Safety.PDF [Dataset]. http://doi.org/10.3389/fsufs.2021.725791.s001
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Joao Carlos Gomes-Neto; Natasha Pavlovikj; Carmen Cano; Baha Abdalhamid; Gabriel Asad Al-Ghalith; John Dustin Loy; Dan Knights; Peter C. Iwen; Byron D. Chaves; Andrew K. Benson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The recent incorporation of bacterial whole-genome sequencing (WGS) into Public Health laboratories has enhanced foodborne outbreak detection and source attribution. As a result, large volumes of publicly available datasets can be used to study the biology of foodborne pathogen populations at an unprecedented scale. To demonstrate the application of a heuristic and agnostic hierarchical population structure guided pan-genome enrichment analysis (PANGEA), we used populations of S. enterica lineage I to achieve two main objectives: (i) show how hierarchical population inquiry at different scales of resolution can enhance ecological and epidemiological inquiries; and (ii) identify population-specific inferable traits that could provide selective advantages in food production environments. Publicly available WGS data were obtained from the NCBI database for three serovars of Salmonella enterica subsp. enterica lineage I (S. Typhimurium, S. Newport, and S. Infantis). Using the hierarchical genotypic classifications (Serovar, BAPS1, ST, cgMLST), datasets from each of the three serovars showed varying degrees of clonal structuring. When the accessory genome (PANGEA) was mapped onto these hierarchical structures, accessory loci could be linked with specific genotypes. A large heavy-metal resistance mobile element was found in the Monophasic ST34 lineage of S. Typhimurium, and laboratory testing showed that Monophasic isolates have on average a higher degree of copper resistance than the Biphasic ones. In S. Newport, an extra sugE gene copy was found among most isolates of the ST45 lineage, and laboratory testing of multiple isolates confirmed that isolates of S. Newport ST45 were on average less sensitive to the disinfectant cetylpyridinium chloride than non-ST45 isolates. Lastly, data-mining of the accessory genomic content of S. Infantis revealed two cryptic Ecotypes with distinct accessory genomic content and distinct ecological patterns. Poultry appears to be the major reservoir for Ecotype 1, and temporal analysis further suggested a recent ecological succession, with Ecotype 2 apparently being displaced by Ecotype 1. Altogether, the use of a heuristic hierarchical-based population structure analysis that includes bacterial pan-genomes (core and accessory genomes) can (1) improve genomic resolution for mapping populations and accessing epidemiological patterns; and (2) define lineage-specific informative loci that may be associated with survival in the food chain.

  11. Data Sheet 1_Evaluation of 16S rRNA genes sequences and genome-based...

    • frontiersin.figshare.com
    pdf
    Updated Jan 7, 2025
    Angelina A. Kislichkina; Angelika A. Sizova; Yury P. Skryabin; Svetlana V. Dentovskaya; Andrey P. Anisimov (2025). Data Sheet 1_Evaluation of 16S rRNA genes sequences and genome-based analysis for identification of non-pathogenic Yersinia.pdf [Dataset]. http://doi.org/10.3389/fmicb.2024.1519733.s005
    Available download formats: pdf
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Frontiers
    Authors
    Angelina A. Kislichkina; Angelika A. Sizova; Yury P. Skryabin; Svetlana V. Dentovskaya; Andrey P. Anisimov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    16S rRNA gene sequencing has been used for routine species identification and phylogenetic studies of bacteria. However, the high sequence similarity between some species and heterogeneity among copies at the intragenomic level could limit its discriminatory ability. In this study, we aimed to compare 16S rRNA gene sequences and genome-based analysis (core SNPs and ANI) for identification of non-pathogenic Yersinia. We used complete and draft genomes of 373 Yersinia strains from the NCBI Genome database. The taxonomic affiliations of 34 genomes based on core SNPs and the ANI results did not match those specified in the GenBank database (NCBI). The intragenic homology of the 16S rRNA gene copies exceeded 99.5% in complete genomes, but more than 50% of genomes have four or more variants of the 16S rRNA gene. Among 327 draft genomes of non-pathogenic Yersinia, 11% did not have a full-length 16S rRNA gene. Most draft genomes have a single copy of the gene, so their intragenomic heterogeneity cannot be determined. The average homology of the 16S rRNA gene was 98.76%, and the maximum variability was 2.85%. A low degree of genetic heterogeneity of the gene (0.36%) was found in the group Y. pekkanenii/Y. proxima/Y. aldovae/Y. intermedia/Y. kristensenii/Y. rochesterensis. Identical gene sequences were found in the genomes of the Y. intermedia and Y. rochesterensis strains identified using ANI and core SNP analyses. The phylogenetic tree based on 16S rRNA genes differed from the tree based on core SNPs of the genomes and did not represent the phylogenetic relationships between the Yersinia species. These findings will help to fill the data gaps in genome characteristics of poorly studied non-pathogenic Yersinia.

  12. DataSheet3_Identification of GGT5 as a Novel Prognostic Biomarker for...

    • frontiersin.figshare.com
    pdf
    Updated Jun 16, 2023
    Yuli Wang; Yuan Fang; Fanchen Zhao; Jiefei Gu; Xiang Lv; Rongzhong Xu; Bo Zhang; Zhihong Fang; Yan Li (2023). DataSheet3_Identification of GGT5 as a Novel Prognostic Biomarker for Gastric Cancer and its Correlation With Immune Cell Infiltration.PDF [Dataset]. http://doi.org/10.3389/fgene.2022.810292.s004
    Available download formats: pdf
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Frontiers
    Authors
    Yuli Wang; Yuan Fang; Fanchen Zhao; Jiefei Gu; Xiang Lv; Rongzhong Xu; Bo Zhang; Zhihong Fang; Yan Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gastric cancer (GC) is a common malignant tumor of the digestive system. Recent studies revealed that high gamma-glutamyl-transferase 5 (GGT5) expression was associated with a poor prognosis of gastric cancer patients. In the present study, we aimed to confirm the expression and prognostic value of GGT5 and its correlation with immune cell infiltration in gastric cancer. First, we compared the differential expression of GGT5 between gastric cancer tissues and normal gastric mucosa in The Cancer Genome Atlas (TCGA) and GEO NCBI databases using the most widely available data. Then, the Kaplan-Meier method, Cox regression, and univariate logistic regression were applied to explore the relationships between GGT5 and clinical characteristics. We also investigated the correlation of GGT5 with immune cell infiltration, immune-related genes, and immune checkpoint genes. Finally, we estimated enrichment of gene ontology categories and relevant signaling pathways using GO annotations, KEGG, and GSEA pathway data. The results showed that GGT5 was upregulated in gastric cancer tissues compared to normal tissues. High GGT5 expression was significantly associated with T stage, histological type, and histologic grade (p < 0.05). Moreover, gastric cancer patients with high GGT5 expression showed worse 10-year overall survival (p = 0.008) and progression-free intervals (p = 0.006) than those with low GGT5 expression. Multivariate analysis suggested that high expression of GGT5 was an independent risk factor related to the worse overall survival of gastric cancer patients. A nomogram model for predicting the overall survival of GC was constructed and computationally validated. GGT5 expression was positively correlated with the infiltration of natural killer cells, macrophages, and dendritic cells but negatively correlated with Th17 infiltration. Additionally, we found that GGT5 was positively co-expressed with immune-related genes and immune checkpoint genes. Functional analysis revealed that differentially expressed genes relative to GGT5 were mainly involved in the biological processes of immune and inflammatory responses. In conclusion, GGT5 may serve as a promising prognostic biomarker and a potential immunological therapeutic target for GC, since it is associated with immune cell infiltration in the tumor microenvironment.

  13. oggmap/orthomap - example data

    • zenodo.org
    • data.niaid.nih.gov
    tsv, zip
    Updated Jan 23, 2024
    Kristian K Ullrich (2024). oggmap/orthomap - example data [Dataset]. http://doi.org/10.5281/zenodo.10556444
    Available download formats: zip, tsv
    Dataset updated
    Jan 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kristian K Ullrich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data for python package oggmap/orthomap.

    OrthoFinder: Ensembl release-105 (-S diamond_ultra_sens)

    Includes OrthoFinder results (-S diamond_ultra_sens) for all translated coding sequences (CDS) from Ensembl release-105 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org:

    Includes a table specifying the OrthoFinder species file names and its corresponding NCBI taxonomic IDs:

    Includes NCBI taxonomic tree for Ensembl release-105 species analysed:

    OrthoFinder: Ensembl release-110 (-S last)

    Includes OrthoFinder results (-S last) for all translated coding sequences (CDS) from Ensembl release-110 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org:

    Includes a table specifying the OrthoFinder species file names and its corresponding NCBI taxonomic IDs:

    Includes NCBI taxonomic tree for Ensembl release-110 species analysed:

    OrthoFinder: Ensembl release-111 (-S last)

    Includes OrthoFinder results (-S last) for all translated coding sequences (CDS) from Ensembl release-111 (keeping only longest isoforms) and Xtropicalisv9.0.Named.primaryTrs.pep.fa from www.xenbase.org:

    Includes a table specifying the OrthoFinder species file names and its corresponding NCBI taxonomic IDs:

    OrthoFinder: WormBase release-WS288 + WormBase ParaSite release-WBPS18 (-S last)

    Includes OrthoFinder results (-S last) for all translated coding sequences (CDS) from WormBase release-WS288, WormBase ParaSite release-WBPS18 (keeping only longest isoforms) and dd_Smed_v6.pcf.contigs.fasta (transdecoder and miniprothint peptides) from https://planmine.mpibpc.mpg.de:

    Includes a table specifying the OrthoFinder species file names and its corresponding NCBI taxonomic IDs:

    Includes NCBI taxonomic tree for WormBase release-WS288 and WormBase ParaSite release-WBPS18 species analysed:

    Pre-calculated orthomaps:

    Includes pre-calculated gene age assignments for C. elegans (Sun et al. 2021), H. vulgaris (Cazet et al. 2022) and D. rerio (Ensembl-105; Ensembl-110):

    Pre-calculated evolutionary indices:

    Includes pre-calculated TajimaD, NormalizedPi, FayWu, Fst for C. elegans (Ma et al. 2021):

    eggNOG database version 6.0 orthomaps:

    Includes extracted orthomaps for all Eukaryota from eggNOG database version 6.0 (Hernández-Plaza et al. 2022):

    myTAI example data:

    Includes example data from the myTAI R package (Drost et al. 2018)

    PLAZA database version 5.0 orthomaps:

    Includes extracted orthomaps for either HOMFAM or ORTHOFAM groups of plants from PLAZA database version 5.0 (Van Bel et al. 2022):

    Mouse synonyms:

    Table of Mus musculus gene synonyms obtained from here https://github.com/mustafapir/geneName/blob/master/data/mouse_synonyms1.rda and converted into a table.

  14. STAT1 transcription factor in Human HeLa S3

    • omicsdi.org
    xml
    Joel Rozowsky, Mark Gerstein, Michael P Wilson, Ghia Euskirchen, Michael Snyder. STAT1 transcription factor in Human HeLa S3 [Dataset]. https://www.omicsdi.org/dataset/arrayexpress-repository/E-GEOD-12782
    Available download formats: xml
    Authors
    Joel Rozowsky, Mark Gerstein, Michael P Wilson, Ghia Euskirchen, Michael Snyder
    Variables measured
    Genomics
    Description

    We report the results of chromatin immunoprecipitation followed by high-throughput tag sequencing (ChIP-Seq) using the GA II platform from Illumina for the human transcription factor STAT1 in HeLa S3 cells. The STAT1 ChIP was performed using HeLa S3 cells that were stimulated using gamma-interferon. We have also generated a sequenced input DNA dataset for gamma-interferon stimulated HeLa S3 cells. Raw data for this study is available for download from the Short Read Archive database at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP000703. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Examination of the STAT1 transcription factor in Human HeLa S3.

  15. DataSheet5_Identification of GGT5 as a Novel Prognostic Biomarker for...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Yuli Wang; Yuan Fang; Fanchen Zhao; Jiefei Gu; Xiang Lv; Rongzhong Xu; Bo Zhang; Zhihong Fang; Yan Li (2023). DataSheet5_Identification of GGT5 as a Novel Prognostic Biomarker for Gastric Cancer and its Correlation With Immune Cell Infiltration.PDF [Dataset]. http://doi.org/10.3389/fgene.2022.810292.s006
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Yuli Wang; Yuan Fang; Fanchen Zhao; Jiefei Gu; Xiang Lv; Rongzhong Xu; Bo Zhang; Zhihong Fang; Yan Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gastric cancer (GC) is a common malignant tumor of the digestive system. Recent studies revealed that high gamma-glutamyl-transferase 5 (GGT5) expression was associated with a poor prognosis of gastric cancer patients. In the present study, we aimed to confirm the expression and prognostic value of GGT5 and its correlation with immune cell infiltration in gastric cancer. First, we compared the differential expression of GGT5 between gastric cancer tissues and normal gastric mucosa in The Cancer Genome Atlas (TCGA) and GEO NCBI databases using the most widely available data. Then, the Kaplan-Meier method, Cox regression, and univariate logistic regression were applied to explore the relationships between GGT5 and clinical characteristics. We also investigated the correlation of GGT5 with immune cell infiltration, immune-related genes, and immune checkpoint genes. Finally, we estimated enrichment of gene ontology categories and relevant signaling pathways using GO annotations, KEGG, and GSEA pathway data. The results showed that GGT5 was upregulated in gastric cancer tissues compared to normal tissues. High GGT5 expression was significantly associated with T stage, histological type, and histologic grade (p < 0.05). Moreover, gastric cancer patients with high GGT5 expression showed worse 10-year overall survival (p = 0.008) and progression-free intervals (p = 0.006) than those with low GGT5 expression. Multivariate analysis suggested that high expression of GGT5 was an independent risk factor related to the worse overall survival of gastric cancer patients. A nomogram model for predicting the overall survival of GC was constructed and computationally validated. GGT5 expression was positively correlated with the infiltration of natural killer cells, macrophages, and dendritic cells but negatively correlated with Th17 infiltration. Additionally, we found that GGT5 was positively co-expressed with immune-related genes and immune checkpoint genes. Functional analysis revealed that differentially expressed genes relative to GGT5 were mainly involved in the biological processes of immune and inflammatory responses. In conclusion, GGT5 may serve as a promising prognostic biomarker and a potential immunological therapeutic target for GC, since it is associated with immune cell infiltration in the tumor microenvironment.

  16. Data_Sheet_1_Characterization of a Novel Chromosomal Class C β-Lactamase,...

    • frontiersin.figshare.com
    pdf
    Updated Jun 7, 2023
    Cite
    Danying Zhou; Zhewei Sun; Junwan Lu; Hongmao Liu; Wei Lu; Hailong Lin; Xueya Zhang; Qiaoling Li; Wangxiao Zhou; Xinyi Zhu; Haili Xu; Xi Lin; Hailin Zhang; Teng Xu; Kewei Li; Qiyu Bao (2023). Data_Sheet_1_Characterization of a Novel Chromosomal Class C β-Lactamase, YOC-1, and Comparative Genomics Analysis of a Multidrug Resistance Plasmid in Yokenella regensburgei W13.PDF [Dataset]. http://doi.org/10.3389/fmicb.2020.02021.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Frontiers
    Authors
    Danying Zhou; Zhewei Sun; Junwan Lu; Hongmao Liu; Wei Lu; Hailong Lin; Xueya Zhang; Qiaoling Li; Wangxiao Zhou; Xinyi Zhu; Haili Xu; Xi Lin; Hailin Zhang; Teng Xu; Kewei Li; Qiyu Bao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Yokenella regensburgei, a member of the family Enterobacteriaceae, is usually isolated from environmental samples and is generally resistant to early generations of cephalosporins. To characterize the resistance mechanism of Y. regensburgei strain W13 isolated from the sewage of an animal farm, whole genome sequencing, comparative genomics analysis and molecular cloning were performed. The results showed that a novel chromosomally encoded class C β-lactamase gene with the ability to confer resistance to β-lactam antibiotics, designated blaYOC-1, was identified in the genome of Y. regensburgei W13. Kinetic analysis revealed that the β-lactamase YOC-1 has a broad spectrum of substrates, including penicillins, cefazolin, cefoxitin and cefotaxime. The two functionally characterized β-lactamases with the highest amino acid identities to YOC-1 were CDA-1 (71.69%) and CMY-2 (70.65%). The genetic context of the blaYOC-1-ampR-encoding region was unique compared with the sequences in the NCBI nucleotide database. The plasmid pRYW13-125 of Y. regensburgei W13 harbored 11 resistance genes (blaOXA-10, blaLAP-2, dfrA14, tetA, tetR, cmlA5, floR, sul2, ant(3″)-IIa, arr-2 and qnrS1) within an ∼34 kb multidrug resistance region; these genes were all related to mobile genetic elements. The multidrug resistance region of pYRW13-125 shared the highest identities with those of two plasmids from clinical Klebsiella pneumoniae isolates, indicating the possibility of horizontal transfer of these resistance genes between bacteria of various origins.

  17. f

    Data_Sheet_4_New Insights on Streptococcus dysgalactiae subsp. dysgalactiae...

    • frontiersin.figshare.com
    pdf
    Updated Jun 10, 2023
    + more versions
    Cite
    Cinthia Alves-Barroco; João Caço; Catarina Roma-Rodrigues; Alexandra R. Fernandes; Ricardo Bexiga; Manuela Oliveira; Lélia Chambel; Rogério Tenreiro; Rosario Mato; Ilda Santos-Sanches (2023). Data_Sheet_4_New Insights on Streptococcus dysgalactiae subsp. dysgalactiae Isolates.PDF [Dataset]. http://doi.org/10.3389/fmicb.2021.686413.s004
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Cinthia Alves-Barroco; João Caço; Catarina Roma-Rodrigues; Alexandra R. Fernandes; Ricardo Bexiga; Manuela Oliveira; Lélia Chambel; Rogério Tenreiro; Rosario Mato; Ilda Santos-Sanches
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) has been considered a strict animal pathogen. Nevertheless, the recent reports of human infections suggest a niche expansion for this subspecies, which may be a consequence of virulence gene acquisition that increases its pathogenicity. Previous studies reported the presence of virulence genes of Streptococcus pyogenes phages among bovine SDSD (collected in 2002–2003); however, the identity of these mobile genetic elements remains to be clarified. Thus, this study aimed to characterize the SDSD isolates collected in 2011–2013 and compare them with SDSD isolates collected in 2002–2003 and pyogenic streptococcus genomes available at the National Center for Biotechnology Information (NCBI) database, including human SDSD and S. dysgalactiae subsp. equisimilis (SDSE) strains, to track temporal shifts in bovine SDSD genotypes. The very close genetic relationships between human SDSD and SDSE were evident from the analysis of housekeeping genes, while bovine SDSD isolates seem more divergent. The results showed that all bovine SDSD harbor the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas IIA system. The widespread presence of this system among bovine SDSD isolates, the high conservation of repeat sequences, and the polymorphism observed in spacers can be considered indicators of the system's activity. Overall, comparative analysis shows that bovine SDSD isolates carry the speK, speC, speL, speM, spd1, and sdn virulence genes of S. pyogenes prophages. Our data suggest that these genes are maintained over time and seem to be exclusively a property of bovine SDSD strains. Although the bovine SDSD genomes characterized in the present study were not sequenced, the data set, including the high homology of superantigen (SAg) genes between bovine SDSD and S. pyogenes strains, may indicate that events of horizontal genetic transfer occurred before habitat separation. All bovine SDSD isolates were negative for genes of the operon encoding streptolysin S, except for the sagA gene, while the presence of this operon was detected in all SDSE and human SDSD strains. The data set of this study suggests that the separation between the subspecies “dysgalactiae” and “equisimilis” should be reconsidered. However, a study including a more comprehensive collection of strains from different environments would be required for definitive conclusions regarding the two taxa.

  18. f

    DataSheet_1_Genomic features, antimicrobial susceptibility, and...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    + more versions
    Cite
    Tanu Saroha; Prashant P. Patil; Rekha Rana; Rajesh Kumar; Sanjeet Kumar; Lipika Singhal; Vikas Gautam; Prabhu B. Patil (2023). DataSheet_1_Genomic features, antimicrobial susceptibility, and epidemiological insights into Burkholderia cenocepacia clonal complex 31 isolates from bloodstream infections in India.pdf [Dataset]. http://doi.org/10.3389/fcimb.2023.1151594.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Tanu Saroha; Prashant P. Patil; Rekha Rana; Rajesh Kumar; Sanjeet Kumar; Lipika Singhal; Vikas Gautam; Prabhu B. Patil
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Introduction: Burkholderia cepacia complex (Bcc) clonal complex (CC) 31, the predominant lineage causing devastating outbreaks globally, has been a growing concern of infections in non-cystic fibrosis (NCF) patients in India. B. cenocepacia is very challenging to treat owing to its virulence determinants and antibiotic resistance. Improving the management of these infections requires a better knowledge of their resistance patterns and mechanisms. Methods: Whole-genome sequences of 35 CC31 isolates obtained from patient samples were analyzed against 210 available CC31 genomes in the NCBI database to glean details of resistance, virulence, mobile elements, and phylogenetic markers to study genomic diversity and evolution of the CC31 lineage in India. Results: Genomic analysis revealed that 35 isolates belonging to CC31 were categorized into 11 sequence types (ST), of which five STs were reported exclusively from India. Phylogenetic analysis classified 245 CC31 isolates into eight distinct clades (I-VIII) and unveiled that NCF isolates are evolving independently from the global cystic fibrosis (CF) isolates, forming a distinct clade. The detection rate of seven classes of antibiotic-related genes in 35 isolates was 35 (100%) for tetracyclines, aminoglycosides, and fluoroquinolones; 26 (74.2%) for sulphonamides and phenicols; 7 (20%) for beta-lactamases; and 1 (2.8%) for trimethoprim resistance genes. Additionally, 3 (8.5%) NCF isolates were resistant to disinfecting agents and antiseptics. Antimicrobial susceptibility testing revealed that the majority of NCF isolates were resistant to chloramphenicol (77%) and levofloxacin (34%). NCF isolates have a comparable number of virulence genes to CF isolates. A well-studied pathogenicity island of B. cenocepacia, GI11, is present in ST628 and ST709 isolates from the Indian Bcc population. In contrast, genomic island GI15 (highly similar to the island found in B. pseudomallei strain EY1) is exclusively reported in ST839 and ST824 isolates from two different locations in India. Horizontal acquisition of lytic phage ST79 of pathogenic B. pseudomallei is demonstrated in ST628 isolates Bcc1463, Bcc29163, and BccR4654 amongst the CC31 lineage. Discussion: The study reveals a high diversity of CC31 lineages among B. cenocepacia isolates from India. The extensive information from this study will facilitate the development of rapid diagnostic and novel therapeutic approaches to manage B. cenocepacia infections.

  19. f

    DataSheet2_Comparative de novo transcriptome analysis of flower and root of...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    + more versions
    Cite
    Amir Khodavirdipour; Reza Safaralizadeh; Mehdi Haghi; Mohammad Ali Hosseinpourfeizi (2023). DataSheet2_Comparative de novo transcriptome analysis of flower and root of Oliveria decumbens Vent. to identify putative genes in terpenes biosynthesis pathway.PDF [Dataset]. http://doi.org/10.3389/fgene.2022.916183.s002
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Amir Khodavirdipour; Reza Safaralizadeh; Mehdi Haghi; Mohammad Ali Hosseinpourfeizi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Oliveria decumbens Vent. is a wild, rare, annual medicinal plant endemic to Iran whose metabolites (mostly terpenes) make it a valued plant in Persian Traditional Medicine and a potential chemotherapeutic agent. The lack of genetic resources has slowed the discovery of genes involved in the terpenes biosynthesis pathway. It is a wild relative of Daucus carota. In this research, we examined the transcriptomic differences between two samples, flower and root of Oliveria decumbens, and analyzed the expression of the genes involved in terpenoid biosynthesis by RNA-seq; the phytochemicals of its essential oil were analyzed by GC/MS. In total, 136,031,188 reads were produced from the flower and root samples. The results show that the MEP pathway is mostly active in the flower and the MVA pathway in the root. Three genes of GPP, FPPS, and GGPP, which are the precursors in the synthesis of mono-, di-, and triterpenes, are upregulated in the root, and 23 key genes involved in the biosynthesis of terpenes were identified. Three genes had the highest upregulation in the root, whereas another three genes were expressed only in the flower. Meanwhile, 191 and 185 upregulated genes in the flower and root of the plant, respectively, were selected for gene ontology analysis and reconstruction of co-expression networks. The current research is the first of its kind on the Oliveria decumbens transcriptome and discusses 67 genes that have been deposited in the NCBI database. Collectively, the information obtained in this study provides new insights into the genetic blueprint of Oliveria decumbens Vent., which may pave the way for applications in medical/plant biotechnology and the pharmaceutical industry.

  20. f

    Data_Sheet_1_Virulent Epidemic Pneumonia in Sheep Caused by the Human...

    • frontiersin.figshare.com
    pdf
    Updated Jun 5, 2023
    + more versions
    Cite
    Bodo Linz; Nadia Mukhtar; Muhammad Zubair Shabbir; Israel Rivera; Yury V. Ivanov; Zarfishan Tahir; Tahir Yaqub; Eric T. Harvill (2023). Data_Sheet_1_Virulent Epidemic Pneumonia in Sheep Caused by the Human Pathogen Acinetobacter baumannii.PDF [Dataset]. http://doi.org/10.3389/fmicb.2018.02616.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Bodo Linz; Nadia Mukhtar; Muhammad Zubair Shabbir; Israel Rivera; Yury V. Ivanov; Zarfishan Tahir; Tahir Yaqub; Eric T. Harvill
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The human pathogen Acinetobacter baumannii has emerged as a frequent cause of hospital-acquired infections, but infection of animals has rarely been observed. Here we analyzed an outbreak of epidemic pneumonia killing hundreds of sheep on a farm in Pakistan and identified A. baumannii as the infecting agent. A pure culture of strain AbPK1 isolated from lungs of sick animals was inoculated into healthy sheep, which subsequently developed similar disease symptoms. Bacteria re-isolated from the infected animals were shown to be identical to the inoculum, fulfilling Koch’s postulates. Comparison of the AbPK1 genome against 2283 A. baumannii genomes from the NCBI database revealed that AbPK1 carries genes for unusual surface structures, including a unique composition of iron acquisition genes, genes for O-antigen synthesis and sialic acid-specific acetylases of cell-surface carbohydrates that could enable immune evasion. Several of these unusual and otherwise rarely present genes were also identified in genomes of phylogenetically unrelated A. baumannii isolates from combat-wounded US military from Afghanistan indicating a common gene pool in this geographical region. Based on core genome MLST this virulent isolate represents a newly emerging lineage of Global Clone 2, suggesting a human source for this disease outbreak. The observed epidemic, direct transmission from sheep to sheep, which is highly unusual for A. baumannii, has important consequences for human and animal health. First, direct animal-to-animal transmission facilitates fast spread of pathogen and disease in the flock. Second, it may establish a stable ecological niche and subsequent spread in a new host. And third, it constitutes a serious risk of transmission of this hyper-virulent clone from sheep back to humans, which may result in emergence of contagious disease amongst humans.

