The GenBank non-redundant protein sequence database (NRDB) is a component of the NCBI BLAST databases and contains entries from GenPept, Swiss-Prot, PIR, PRF, PDB, and NCBI RefSeq.
16S rRNA gene sequencing has been used for routine species identification and phylogenetic studies of bacteria. However, the high sequence similarity between some species and the heterogeneity among gene copies within a genome can limit its discriminatory ability. In this study, we aimed to compare 16S rRNA gene sequences and genome-based analyses (core SNPs and ANI) for the identification of non-pathogenic Yersinia. We used complete and draft genomes of 373 Yersinia strains from the NCBI Genome database. The taxonomic affiliations of 34 genomes based on core SNPs and the ANI results did not match those specified in the GenBank database (NCBI). The intragenomic homology of the 16S rRNA gene copies exceeded 99.5% in complete genomes, but more than 50% of the genomes had four or more variants of the 16S rRNA gene. Among 327 draft genomes of non-pathogenic Yersinia, 11% did not have a full-length 16S rRNA gene. Most draft genomes had a single copy of the gene, so their intragenomic heterogeneity could not be determined. The average homology of the 16S rRNA gene was 98.76%, and the maximum variability was 2.85%. A low degree of genetic heterogeneity of the gene (0.36%) was found in the group Y. pekkanenii/Y. proxima/Y. aldovae/Y. intermedia/Y. kristensenii/Y. rochesterensis. Identical gene sequences were found in the genomes of Y. intermedia and Y. rochesterensis strains identified using ANI and core SNP analyses. The phylogenetic tree based on 16S rRNA genes differed from the tree based on core SNPs of the genomes and did not represent the phylogenetic relationships between the Yersinia species. These findings will help to fill the data gaps in the genome characteristics of poorly studied non-pathogenic Yersinia.
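The homology and variability figures quoted above are pairwise percent identities between aligned 16S rRNA gene sequences. The sketch below shows one plausible way to compute such values; it assumes the sequences are already aligned to equal length, and the three toy "copies" are invented for illustration rather than taken from the Yersinia dataset. The study does not state which tool it used, so this is not a reproduction of its method.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns; columns that are gaps in both are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def pairwise_summary(aligned: dict) -> tuple:
    """Return (mean identity, lowest identity) over all sequence pairs."""
    values = [
        percent_identity(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    ]
    return sum(values) / len(values), min(values)


# Hypothetical aligned gene copies (toy data, 10 columns each).
toy_copies = {
    "copy1": "ACGTACGTAC",
    "copy2": "ACGTACGTAC",
    "copy3": "ACGTACGTAT",
}
mean_identity, lowest_identity = pairwise_summary(toy_copies)
print(f"mean {mean_identity:.2f}%, lowest {lowest_identity:.2f}%")
```

The maximum variability in a set of sequences is then 100 minus the lowest pairwise identity.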
MaizeMine is the data mining resource of the Maize Genetics and Genome Database (MaizeGDB; http://maizemine.maizegdb.org). It enables researchers to create and export customized annotation datasets that can be merged with their own research data for use in downstream analyses. MaizeMine uses the InterMine data warehousing system to integrate genomic sequences and gene annotations from the Zea mays B73 RefGen_v3 and B73 RefGen_v4 genome assemblies, Gene Ontology annotations, single nucleotide polymorphisms, protein annotations, homologs, pathways, and precomputed gene expression levels based on RNA-seq data from the Z. mays B73 Gene Expression Atlas. MaizeMine also provides database cross-references between genes of alternative gene sets from Gramene and NCBI RefSeq. MaizeMine includes several search tools: a keyword search, built-in template queries with intuitive search menus, and a QueryBuilder tool for creating custom queries. The Genomic Regions search tool executes queries based on lists of genome coordinates and supports both the B73 RefGen_v3 and B73 RefGen_v4 assemblies. The List tool allows you to upload identifiers to create custom lists, perform set operations such as unions and intersections, and execute template queries with lists. When used with gene identifiers, the List tool automatically provides gene set enrichment for Gene Ontology (GO) and pathways, with a choice of statistical parameters and background gene sets. Because query outputs can be saved as lists and used as input to new queries, MaizeMine supports iterative data integration and meta-analysis.
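The set operations offered by the List tool correspond to ordinary set algebra on identifier lists. The sketch below illustrates that idea offline with plain Python sets; it does not call MaizeMine or InterMine, and the identifiers are hypothetical placeholders, not real MaizeGDB gene IDs.

```python
def list_operations(list_a, list_b):
    """Combine two identifier lists the way a union/intersection tool would."""
    a, b = set(list_a), set(list_b)
    return {
        "union": sorted(a | b),          # identifiers in either list
        "intersection": sorted(a & b),   # identifiers in both lists
        "only_in_first": sorted(a - b),  # identifiers unique to the first list
    }


# Hypothetical gene lists, e.g. exported from two separate queries.
drought_genes = ["gene_A", "gene_B", "gene_C"]
heat_genes = ["gene_B", "gene_C", "gene_D"]

result = list_operations(drought_genes, heat_genes)
print(result["intersection"])
```

Sorting the results keeps the output stable, which helps when a derived list is saved and compared across runs.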
Abstract: Recent global surveys of marine biodiversity have revealed that a group of organisms known as “marine diplonemids” constitutes one of the most abundant and diverse planktonic lineages [1]. Though discovered over a decade ago [2, 3], their potential importance was unrecognized, and our knowledge remains restricted to a single gene amplified from environmental DNA, the 18S rRNA gene (small subunit [SSU]). Here, we use single-cell genomics (SCG) and microscopy to characterize ten marine diplonemids, isolated from a range of depths in the eastern North Pacific Ocean. Phylogenetic analysis confirms that the isolates reflect the entire range of marine diplonemid diversity, and comparisons to environmental SSU surveys show that sequences from the isolates range from rare to superabundant, including the single most common marine diplonemid known. SCG generated a total of ∼915 Mbp of assembled sequence across all ten cells and ∼4,000 protein-coding genes with homologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology database, distributed across categories expected for heterotrophic protists. Models of highly conserved genes indicate a high density of non-canonical introns, lacking conventional GT-AG splice sites. Mapping metagenomic datasets [4] to SCG assemblies reveals virtually no overlap, suggesting that nuclear genomic diversity is too great for representative SCG data to provide meaningful phylogenetic context to metagenomic datasets. This work provides an entry point to the future identification, isolation, and cultivation of these elusive yet ecologically important cells.
The high density of nonconventional introns, however, also portends difficulty in generating accurate gene models and highlights the need for the establishment of stable cultures and transcriptomic analyses.
Usage notes
Single-cell genomic scaffolds from 10 'wild-caught' marine diplonemids: FASTA format single-cell genomic scaffolds of 10 marine diplonemid (protist) cells are presented. Scaffolds were generated with the SPAdes assembler; contaminating sequences were removed, as described in the publication. Each FASTA file is derived from a single cell. Cells are referred to by the numbers used in the publication (i.e., cells 3, 13, 21, 27, 37, 47, 1sb, 4sb, 9sb, 21sb) as no species names exist. File: marine_diplonemid_SAGs.zip
Figure S1 (related to Figure 1). Taxon-annotated GC plots demonstrate the effectiveness of our decontamination procedure. Plots were generated using blobtools (https://github.com/DRL/blobtools) for each SCG assembly before and after decontamination using the megablast/blastx protocol described in Experimental Procedures. Plots are based on megablast queries of the NCBI nt database according to taxonomic Order. File: FigS1.pdf
Purpose of experiments:
Sequence data were obtained to determine the community structure of pack sea-ice microbial communities and whether it is affected by exposure to elevated CO2 levels.
Summary of Methods:
Cells in sea-ice brines were filtered onto 0.2 micron filters, and material was extracted using the MoBio Water DNA extraction kit. The DNA was analysed by Research and Testing Laboratories Inc. (Lubbock, Texas, USA) via 454 pyrosequencing. Bacteria were analysed using primer set 10F-519R, which targets 16S rRNA genes. 16S rRNA genes associated with chloroplasts and mitochondria are included in this dataset but represent a minority of sequences in most samples. Eukaryotes were analysed using primer set 550F-1055R, which targets 18S rRNA genes. The 454 pyrosequencing analysis with the Titanium GS FLX+ kit generates on average 3000 reads, which incorporate custom pyrotags for later stages of the data analysis. The specific steps used for subsequent data analysis are described in the attached PDF file (Data_Analysis_Methodology.PDF). This output was further refined by first determining consensus sequences at the 98% similarity level using CD-HIT from Weizhong Li’s online software site (http://weizhongli-lab.org/cd-hit/). Reference: Niu B, Fu L, Sun S, Li W. 2010. Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11:187. doi:10.1186/1471-2105-11-187. The consensus sequences were then checked for errors, manually curated, and aligned against the closest matching sequences obtained from the NCBI database (www.ncbi.nlm.nih.gov) to obtain a final list of consensus operational taxonomic entities and the number of reads obtained for each sample analysed.
File: SIPEXII_DNA_Sample_information.xlsx provides sampling and analysis information for the detailed results in the other two files.
File: SCIPEXII_sea_ice_bacteria_OTUs.xlsx contains information on the number of 16S rRNA reads in bacteria: Phylum/Class and OTUs.
File: SCIPEXII_sea_ice_brines_eukaryote_community_OTU_data.xlsx contains information on the number of 18S rRNA reads in eukaryotic microbes: Phylum/Order/Closest taxon and OTUs.
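Grouping reads into consensus sequences at a 98% similarity level, as described in the methods above, is commonly done with a greedy incremental scheme: reads are processed longest first, and each read joins the first existing representative it matches at or above the threshold. The sketch below illustrates that idea only. It is a simplification, not the CD-HIT implementation: it compares reads position by position without alignment, and the toy reads are invented.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions, relative to the shorter sequence.

    Assumption: reads are compared position by position with no alignment,
    which is only reasonable for this toy illustration.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(reads, threshold=0.98):
    """Greedy incremental clustering; returns (representative, read_count) pairs."""
    clusters = []  # each entry is [representative, count]
    for read in sorted(reads, key=len, reverse=True):
        for cluster in clusters:
            if identity(cluster[0], read) >= threshold:
                cluster[1] += 1
                break
        else:
            clusters.append([read, 1])  # no match: start a new cluster
    return [(rep, count) for rep, count in clusters]


# Toy reads of 100 bases: one exact copy, one 99% variant, one 97% variant.
base = "ACGT" * 25
one_mismatch = "T" + base[1:]
three_mismatches = "TTT" + base[3:]

otus = greedy_cluster([base, one_mismatch, three_mismatches, base])
print([count for _, count in otus])
```

The read counts per cluster correspond to the per-OTU read numbers reported in the spreadsheets; the real pipeline also involved error checking and manual curation.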
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The recent incorporation of bacterial whole-genome sequencing (WGS) into public health laboratories has enhanced foodborne outbreak detection and source attribution. As a result, large volumes of publicly available datasets can be used to study the biology of foodborne pathogen populations at an unprecedented scale. To demonstrate the application of a heuristic and agnostic hierarchical population structure guided pan-genome enrichment analysis (PANGEA), we used populations of S. enterica lineage I to achieve two main objectives: (i) show how hierarchical population inquiry at different scales of resolution can enhance ecological and epidemiological inquiries; and (ii) identify population-specific inferable traits that could provide selective advantages in food production environments. Publicly available WGS data were obtained from the NCBI database for three serovars of Salmonella enterica subsp. enterica lineage I (S. Typhimurium, S. Newport, and S. Infantis). Using the hierarchical genotypic classifications (Serovar, BAPS1, ST, cgMLST), datasets from each of the three serovars showed varying degrees of clonal structuring. When the accessory genome (PANGEA) was mapped onto these hierarchical structures, accessory loci could be linked with specific genotypes. A large heavy-metal resistance mobile element was found in the Monophasic ST34 lineage of S. Typhimurium, and laboratory testing showed that Monophasic isolates have on average a higher degree of copper resistance than the Biphasic ones. In S. Newport, an extra sugE gene copy was found among most isolates of the ST45 lineage, and laboratory testing of multiple isolates confirmed that isolates of S. Newport ST45 were on average less sensitive to the disinfectant cetylpyridinium chloride than non-ST45 isolates. Lastly, data mining of the accessory genomic content of S. Infantis revealed two cryptic Ecotypes with distinct accessory genomic content and distinct ecological patterns.
Poultry appears to be the major reservoir for Ecotype 1, and temporal analysis further suggested a recent ecological succession, with Ecotype 2 apparently being displaced by Ecotype 1. Altogether, the use of a heuristic hierarchical-based population structure analysis that includes bacterial pan-genomes (core and accessory genomes) can (1) improve genomic resolution for mapping populations and assessing epidemiological patterns; and (2) define lineage-specific informative loci that may be associated with survival in the food chain.
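Linking an accessory locus to a specific genotype amounts to asking whether the locus occurs in a lineage more often than expected by chance. The abstract does not say which statistic PANGEA uses, so the sketch below substitutes a standard one-sided hypergeometric (Fisher-type) enrichment test purely as an illustration. The function name and the toy counts are assumptions, not values from the study.

```python
from math import comb


def enrichment_p_value(n_total, n_with_locus, n_lineage, n_lineage_with_locus):
    """One-sided hypergeometric P(X >= observed).

    n_total              genomes in the whole dataset
    n_with_locus         genomes carrying the accessory locus
    n_lineage            genomes belonging to the lineage of interest
    n_lineage_with_locus lineage genomes that carry the locus (observed)
    """
    max_k = min(n_with_locus, n_lineage)
    denominator = comb(n_total, n_lineage)
    numerator = sum(
        comb(n_with_locus, k) * comb(n_total - n_with_locus, n_lineage - k)
        for k in range(n_lineage_with_locus, max_k + 1)
    )
    return numerator / denominator


# Toy example: 10 genomes, 4 carry the locus, and all 4 sit in a 4-genome lineage.
p = enrichment_p_value(n_total=10, n_with_locus=4, n_lineage=4, n_lineage_with_locus=4)
print(f"p = {p:.4f}")
```

In practice such a test would be repeated for every accessory locus at every level of the hierarchy (serovar, BAPS1, ST, cgMLST), so a multiple-testing correction would be needed.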
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gastric cancer (GC) is a common malignant tumor of the digestive system. Recent studies revealed that high gamma-glutamyl-transferase 5 (GGT5) expression was associated with a poor prognosis in gastric cancer patients. In the present study, we aimed to confirm the expression and prognostic value of GGT5 and its correlation with immune cell infiltration in gastric cancer. First, we compared the differential expression of GGT5 between gastric cancer tissues and normal gastric mucosa in The Cancer Genome Atlas (TCGA) and NCBI GEO databases using the most widely available data. Then, the Kaplan-Meier method, Cox regression, and univariate logistic regression were applied to explore the relationships between GGT5 and clinical characteristics. We also investigated the correlation of GGT5 with immune cell infiltration, immune-related genes, and immune checkpoint genes. Finally, we estimated the enrichment of Gene Ontology categories and relevant signaling pathways using GO annotations, KEGG, and GSEA pathway data. The results showed that GGT5 was upregulated in gastric cancer tissues compared to normal tissues. High GGT5 expression was significantly associated with T stage, histological type, and histologic grade (p < 0.05). Moreover, gastric cancer patients with high GGT5 expression showed worse 10-year overall survival (p = 0.008) and progression-free intervals (p = 0.006) than those with low GGT5 expression. Multivariate analysis suggested that high expression of GGT5 was an independent risk factor related to worse overall survival in gastric cancer patients. A nomogram model for predicting the overall survival of GC was constructed and computationally validated. GGT5 expression was positively correlated with the infiltration of natural killer cells, macrophages, and dendritic cells but negatively correlated with Th17 infiltration. Additionally, we found that GGT5 was positively co-expressed with immune-related genes and immune checkpoint genes.
Functional analysis revealed that differentially expressed genes relative to GGT5 were mainly involved in the biological processes of immune and inflammatory responses. In conclusion, GGT5 may serve as a promising prognostic biomarker and a potential immunological therapeutic target for GC, since it is associated with immune cell infiltration in the tumor microenvironment.
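The survival comparison above relies on the Kaplan-Meier method. As a minimal sketch of how that estimator works, the code below computes the product-limit survival curve for one group. The follow-up times and event flags are invented toy values, not TCGA or GEO data, and the study would have used an established statistics package rather than code like this.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns (time, survival) at each time with an event.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up (arbitrary time units) for five patients.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 4))
```

Comparing two such curves (for example, high versus low expression groups) requires a separate test such as the log-rank test, which is not shown here.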
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R1: Establishment and purification of neuropeptide sequences
The LW, APGW, RPCH, AKH, CRZ, and GnRH neuropeptide families were searched in the GenBank database using 10 keywords: the neuropeptide name, the precursor abbreviation, the full name of the precursor, the full name of the precursor with the word “prepropeptide,” and the combinations of these terms. The candidate sequences were downloaded in FASTA format using the appropriate commands in the GenBank database. The AKH neuropeptide family was classified according to the groups published in the literature, as well as the amino acid number and sequence. Furthermore, the ACP hybrid family was identified in the GenBank database using BLAST alignments.
C00: Neuropeptide Precursor. Eight folders were named with the initials of each neuropeptide family. The AKH family folder was the only one containing four subfolders. All of the folders contained the same type of files: three text files named after the neuropeptide initials and the obtained result. The files identified with the words “with codes” contained the sequences with the codes generated for this study, whereas the documents with the word “Full” contained the GenBank database search results obtained with the 10 aforementioned keywords. These files were located in a folder named “Fasta Keywords.” Each file contained the results from each respective keyword. The files with the words “selected EA” contained the sequences that were selected for evolutionary analyses.
C01: BLAST ACP. The text file named “00 BLAST ACP” contains the BLAST alignment results obtained from the NCBI database generated with the Adipokinetic Hormone/Corazonin-related peptide from the transcriptome of Callinectes toxotes. The file named “01 ACP Selected” contains the precursors selected for this study. All sequences were in FASTA format and contained the codes summarized in Supplementary Material 3 “Database Sequences.”
The file named “02 ACP selected EA” contains the ACP precursors of other species, which were used for the evolutionary analyses of C. toxotes ACP. The PDF file titled “03 ACP ProP 1.0 Serv” contains the results of the proteolytic cleavage sites of the precursors indicated in the file named “02 ACP selected EA,” which were generated using the aforementioned software.
C02: BLAST VP. The folder contains the results of the BLAST alignment against the NCBI database, which were generated with the virtual peptide sequences reported by Martinez-Perez et al. (2007). This folder contains seven text files. The name of each file corresponds to the precursor and species in which it was identified. Moreover, the PDF document named “Virtual peptides ProP 1.0 Serv” contains the results of the proteolytic cleavage sites generated with the aforementioned software.
C03: Debugging sequences with software. This folder contains three subfolders with the results obtained from each software tool used in this study to detect the neuropeptide sequences using the appropriate keywords.
The folder named “BioDataToolKit” contains six subfolders with the abbreviated name of each neuropeptide. Additionally, there is a file containing the sequences downloaded from the GenBank database, as well as a Microsoft Excel file containing the details generated by the software. The name of each file corresponds to the keywords used for each search. The software used in this study can be found in the following repository: https://github.com/rduarte24/BiodataToolkit.
The folder named “Pro1.0Server” was organized in the same way as the results derived for the “BioDataToolKit” for each neuropeptide family. However, each of the neuropeptide folders contained a file with the pertinent sequences whereas another file contained the endoproteolytic cleavage sites of the neuropeptide precursors obtained with the software.
The folder named “Proteios” contains seven files. The file names indicate the precursor analyzed with the software and the identified sequences in FASTA format. The Proteios software is available at the following repository: https://github.com/Martin-Munive/Proteios.
C04: Neuropeptide precursors for evolutionary analysis. Files with the sequences of the neuropeptide precursors used for the generation of the phylogenetic trees in Supplementary Materials 4 and 7. The name of each file corresponds to the name of each of the analyzed neuropeptides.
R2: Transcriptome BLAST
Microsoft Excel file containing the BLAST alignments conducted using the sequences of the AKH/CRZ-related peptide (ACP) from C. toxotes and Corazonin (CRZ) from C. arcuatus. The following information is summarized in the spreadsheets named C. toxotes and C. arcuatus: Column A, neuropeptide name; Column B, species name; Columns C–G, BLAST alignment results; Column H, GenBank protein accession number; Column I, precursor sequence.
R3: Construction of neuropeptide database
Microsoft Excel file with information pertaining to the database and a detailed description of each of the neuropeptide precursors analyzed in this study. The Excel file contains seven spreadsheet tabs. Each of the tabs contains the following columns:
Neuropeptides. Column A, sequence numbering in descending order; Column B, neuropeptide name; Column C, identification code used in this study; Column D, accession number; Columns E–G, species taxonomy; Columns H–L, GenBank sequence description; Columns M–N, literature reference and link.
Taxonomy. Taxonomic description of each of the examined species derived from the NCBI database.
Sequences evolutionary anal. This tab contains the code developed for this work in Column C; the GenBank accession codes of each neuropeptide are summarized in Column D and species taxonomy details are summarized in Columns E and F.
Table of differences. Column B shows the codes of identical sequences and Column C shows the code of the sequence selected for this study.
Codes deleted. This tab contains the accession codes of the species and the species name but contains no details on the properties of the neuropeptide precursors.
Sequences Paper. Neuropeptide sequences reported in previous studies that were later reported in the GenBank database. The sequences marked with asterisks have not been previously reported in public databases. The codes used in this study to designate the sequences are also included.
Keywords. Keywords used to conduct the GenBank database searches to obtain the members of each neuropeptide family.
R4: In silico validation, alignments, and phylogenetic relationships
Generated phylogenetic trees and results obtained from individual runs for each of the neuropeptide families with the DNA-LM and Kalign parameters using the IQ-TREE software.
The folder named “RUN” contains the “DNALM and kalign 2.0 default parameters” subfolder. Both folders contain 11 subfolders with the names of each of the neuropeptide families, as well as the results obtained with the IQ-TREE software. The folder named “Trees” contains the folder “DNALM and kalign 2.0 default parameters” containing the phylogenetic trees for each of the neuropeptide families, which were created with the iTOL software.
R5: BLAST alignment of the virtual peptide precursors
Results of the BLAST alignment of the virtual peptides described by Martinez-Perez et al. (2007) with respect to the sequences in the GenBank database. The files follow the same nomenclature as in the folder named “Carpeta 02 BLAST VP” in Repository 1.
R6: Alignment of neuropeptide precursors
“DNALM and Kalign 2.0 default parameter” folders. Each of these folders contains the alignments of the examined neuropeptide precursors from each family and each folder is named after the corresponding neuropeptide. The remaining files contain the alignments in ascending order in the evolutionary scale and are appropriately named after the corresponding neuropeptide. The file named “All Sequence FASTA” contains the sequences used in our study in FASTA format.
R7: Phylogenetic clustering of the precursors
“DNALM and Kalign 2.0 default parameter” folders. Both folders contain the phylogenetic tree clustering results from Supplementary Material 6, which were obtained using the DNA-LM and Kalign parameters and the IQ-TREE software. All analyses were conducted using the GUANE-1 supercomputer (Universidad Industrial de Santander). The phylogenetic clustering results of all of the precursors are contained in the folders with the respective precursor name. The folder also contains Figure 6, which was included in our main manuscript.
Streptococcus dysgalactiae subsp. dysgalactiae (SDSD) has been considered a strict animal pathogen. Nevertheless, recent reports of human infections suggest a niche expansion for this subspecies, which may be a consequence of virulence gene acquisition that increases its pathogenicity. Previous studies reported the presence of virulence genes of Streptococcus pyogenes phages among bovine SDSD (collected in 2002–2003); however, the identity of these mobile genetic elements remains to be clarified. Thus, this study aimed to characterize the SDSD isolates collected in 2011–2013 and compare them with SDSD isolates collected in 2002–2003 and pyogenic streptococcus genomes available at the National Center for Biotechnology Information (NCBI) database, including human SDSD and S. dysgalactiae subsp. equisimilis (SDSE) strains, to track temporal shifts in bovine SDSD genotypes. The very close genetic relationships between human SDSD and SDSE were evident from the analysis of housekeeping genes, while bovine SDSD isolates seem more divergent. The results showed that all bovine SDSD harbor a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas IIA system. The widespread presence of this system among bovine SDSD isolates, the high conservation of repeat sequences, and the polymorphism observed in spacers can be considered indicators of the system's activity. Overall, comparative analysis shows that bovine SDSD isolates carry the speK, speC, speL, speM, spd1, and sdn virulence genes of S. pyogenes prophages. Our data suggest that these genes are maintained over time and seem to be exclusively a property of bovine SDSD strains. Although the bovine SDSD genomes characterized in the present study were not sequenced, the data set, including the high homology of superantigen (SAg) genes between bovine SDSD and S. pyogenes strains, may indicate that horizontal gene transfer events occurred before habitat separation.
All bovine SDSD isolates were negative for the genes of the operon encoding streptolysin S, except for the sagA gene, while the presence of this operon was detected in all SDSE and human SDSD strains. The data set of this study suggests that the separation between the subspecies “dysgalactiae” and “equisimilis” should be reconsidered. However, a study including a more comprehensive collection of strains from different environments would be required for definitive conclusions regarding the two taxa.
We report the results of chromatin immunoprecipitation followed by high-throughput tag sequencing (ChIP-Seq) using the GA II platform from Illumina for the human transcription factor STAT1 in HeLa S3 cells. The STAT1 ChIP was performed using HeLa S3 cells stimulated with gamma-interferon. We have also generated a sequenced input DNA dataset for gamma-interferon stimulated HeLa S3 cells. Raw data for this study are available for download from the Short Read Archive database at: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP000703. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Examination of the STAT1 transcription factor in human HeLa S3 cells.
Trueperella pyogenes (T. pyogenes) is an important opportunistic animal pathogen that causes huge economic losses to the animal husbandry industry. The emergence of bacterial resistance and the unsatisfactory effect of the vaccine have prompted investigators to explore alternative strategies for controlling T. pyogenes infection. Because phages can kill multidrug-resistant bacteria, phage therapy has attracted attention as a way to combat multidrug-resistant bacterial infections. In this study, a T. pyogenes phage, vB-ApyS-JF1 (JF1), was isolated from sewage samples, and its whole genome and biological characteristics were elucidated. Moreover, the protective effect of phage JF1 in a mouse model of bacteremia caused by T. pyogenes was studied. JF1 harbors a double-stranded DNA genome with a length of 90,130 bp (30.57% G + C). The genome of JF1 lacked bacterial virulence-, antibiotic resistance-, and lysogenesis-related genes. Moreover, the genome sequence of JF1 exhibited low coverage (<6%) with all published phages in the NCBI database, and a phylogenetic analysis of the terminase large subunits and capsid indicated that JF1 was evolutionarily distinct from known phages. In addition, JF1 was stable over a wide range of pH values (3 to 11) and temperatures (4 to 50°C) and exhibited strong lytic activity against T. pyogenes in vitro. In murine experiments, a single intraperitoneal administration of JF1 30 min post-inoculation provided 100% protection for mice against T. pyogenes infection. Compared to the phosphate-buffered saline (PBS) treatment group, JF1 significantly (P < 0.01) reduced the bacterial load in the blood and tissues of infected mice. Meanwhile, treatment with phage JF1 relieved the pathological symptoms observed in each tissue.
Furthermore, the levels of the inflammatory cytokines tumour necrosis factor-α (TNF-α), interferon-γ (IFN-γ), and interleukin-6 (IL-6) in the blood of infected mice were significantly (P < 0.01) decreased in the phage-treated group. Taken together, these results indicate that phage JF1 demonstrated great potential as an alternative therapeutic treatment against T. pyogenes infection.
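The G + C percentage reported for the JF1 genome is a simple base-composition statistic. The sketch below shows how such a value can be computed from a nucleotide string; the example sequences are made up and are not part of the JF1 genome. Treating ambiguous bases such as N as uncounted is a choice made for this sketch.

```python
def gc_content(sequence: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Made-up fragment used only to exercise the function.
print(round(gc_content("ATGCATTTAANN"), 2))
```

For a full genome the same calculation would be applied to the concatenated sequence read from its FASTA file.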
Sequence-based phylogenetic analyses of 34 vertebrate gene families identified in an analysis of conserved synteny in chromosome regions containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage gated calcium channels (CACNA1-L). For each gene family, amino acid sequences were predicted from the Ensembl genome browser (http://www.ensembl.org) and used to create sequence alignments and phylogenetic trees. Vertebrate gene families were defined based on Ensembl protein family predictions. Database identifiers, location data, genome assembly information and annotation notes for all identified protein families and sequences are included in 'Supplemental Table 705852.xlsx' (Excel spreadsheet). This spreadsheet also includes information on 7 gene families that were discarded from the analyses. Gene families are identified by unique abbreviations based on approved HUGO Gene Nomenclature Committee (HGNC) gene symbols, or known aliases from the NCBI Entrez Gene database.
File information: For each gene family an alignment file '...align.fasta', a neighbor joining tree '...NJ.phb' and a phylogenetic maximum likelihood tree '...PhyML.phb' are included. Alignments are included in FASTA format with the extension '.fasta'. This file format can be opened by most sequence analysis applications as well as text editors. Phylogenetic tree files are included in Phylip/Newick format with the extension '.phb'. This file format can be opened by freely available phylogenetic tree viewers such as FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and TreeView (http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/). Corresponding figures for all phylogenetic trees are also included as PDF files.
Sequence names/leaf names include species abbreviations (see below) as well as chromosome/linkage group/genomic scaffold numbers, with lowercase letters to distinguish sequences located on the same chromosome, linkage group or scaffold. For the human sequences the full HGNC gene symbol is included. The species included in these analyses were (abbreviations and common names in parentheses): Homo sapiens (Hsa, human), Mus musculus (Mmu, mouse), Monodelphis domestica (Mdo, grey short-tailed opossum), Gallus gallus (Gga, chicken), Danio rerio (Dre, zebrafish), Oryzias latipes (Ola, medaka), Gasterosteus aculeatus (Gac, three-spined stickleback), Tetraodon nigroviridis (Tni, green spotted pufferfish), Ciona intestinalis (Cin, tunicate), Ciona savignyi (Csa, tunicate) and Drosophila melanogaster (Dme, fruit fly). In some analyses the following additional species were used: Sarcophilus harrisii (Sha, Tasmanian devil), Taeniopygia guttata (Tgu, zebra finch), Anolis carolinensis (Aca, Carolina anole lizard), Xenopus (Silurana) tropicalis (Xtr, Western clawed frog), Takifugu rubripes (Tru, Japanese pufferfish), Branchiostoma floridae (Bfl, Florida lancelet) and Caenorhabditis elegans (Cel, nematode).
The following vertebrate gene families are included in this file set:
ATP2B: ATPase, Ca++ transporting, plasma membrane
B4GALNT: Beta-1,4-N-acetyl-galactosaminyl transferase
CACNA2D: Calcium channel, voltage-dependent, alpha 2/delta subunit
CAMK1: Calcium/calmodulin dependent protein kinase
CDK: Cyclin-dependent kinase, members 16, 17 and 18
CELSR: Cadherin, EGF LAG seven-pass G-type receptor (flamingo homolog, Drosophila)
CNTN: Contactin precursor
COPG: Coatomer protein complex, subunit gamma
ERC: ELKS/RAB6-interacting/CAST family
FLN: Filamin
GXYLT: Glucoside xylosyltransferase
IKBKE: Kinase epsilon and TANK-binding kinase
IQSEC: IQ motif and Sec7 domain containing
KDM: Lysine specific demethylase 5
KLHDC: Kelch domain containing 8
L1CAM: L1 cell adhesion molecule
LRRN: Leucine rich repeat neuronal
MAGI: Membrane associated guanylate kinase, WW and PDZ domain containing
PHTF: Putative homeodomain transcription factor
PLG: Plasminogen ortholog
PLXNA: Plexin A
PPM1: Protein phosphatase, Mg2+/Mn2+ dependent
PRICKLE: Prickle homolog
PTPN: Protein tyrosine phosphatase, non-receptor type
RBM: RNA binding motif protein
RSBN: Round spermatid basic protein
SEMA3: Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin)
SRGAP: SLIT-ROBO Rho GTPase activating protein
SYP: Synaptophysin
TIMM: Translocase of inner mitochondrial membrane 17
TWF: Twinfilin
UBA: Ubiquitin-like modifier activating enzyme, members 1 and 7
USP: Ubiquitin specific peptidase, members 4, 11, 15 and 19
WNK: WNK lysine deficient protein kinase
Method details: Alignments were created using the ClustalW sequence alignment algorithm with the following settings: Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20. Phylogenetic analyses were carried out based on the included alignments using bootstrap-supported neighbor joining (NJ) as well as phylogenetic maximum likelihood (PhyML) methods supported by approximate likelihood ratio tests (aLRT).
Phylogenetic trees are rooted with identified Drosophila melanogaster (fruit fly) sequences, if possible. Alternatively, some phylogenetic trees are rooted with other identified invertebrate sequences (see Supplemental Table 1). The B4GALNT, PLG, PTPN, RBM, SEMA3 and USP trees are presented as midpoint-rooted trees in the figures (PDF); however, the phylogenetic tree files (.phb) are unrooted. NJ trees were made using standard settings in ClustalX 2.0.12 (http://www.clustal.org/clustal2/), supported by a non-parametric bootstrap analysis with 1000 replicates. PhyML trees were made using the PhyML3.0 algorithm (http://www.atgc-montpellier.fr/phyml/) with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites (with optimised p-invar) and gamma shape parameters were estimated from the alignments, the number of substitution rate categories was set to 8, BIONJ was chosen to create the starting tree, both NNI and SPR tree optimization methods were considered and both tree topology and branch length optimization were chosen. The amino acid substitution model was chosen based on ProtTest3.2 (http://code.google.com/p/prottest3/) results. The JTT model was applied for all gene families except B4GALNT, CACNA2D, COL, L1CAM, PLG, PPP, QSOX and UBA where the WAG model was chosen, and RPL and TWF where the LG model was chosen. PhyML trees are supported by approximate likelihood ratio tests (aLRT) with SH-like branch supports applied through PhyML. For the CAMK and GXYLT gene families the PhyML trees were repeated (same settings) using a non-parametric bootstrap analysis with 100 replicates rather than aLRT in PhyML. These trees did not improve on the aLRT-supported tree topologies.
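The non-parametric bootstrap mentioned above resamples alignment columns with replacement and rebuilds a tree from each pseudo-replicate. The following is a minimal Python sketch of the resampling step only, assuming the alignment is held as a dict of equal-length strings. The function name `bootstrap_alignment`, the leaf names and the residues are hypothetical, and this is not how ClustalX or PhyML implement the step internally.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Sample column indices with replacement; the same indices are applied
    # to every sequence so that each alignment column stays intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy alignment with made-up leaf names and residues (illustration only).
toy = {"Hsa_example": "MKT-LV", "Dre_example": "MRTALV", "Dme_example": "MKSAL-"}
rng = random.Random(0)
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-building program, and the support for a branch is the fraction of replicate trees that contain it.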
Background: The pks island and its production of the bacterial secondary metabolite genotoxin, colibactin, have attracted increasing attention. However, genomic articles focusing on pks islands in Klebsiella pneumoniae, as well as comparative genomic studies of mobile genetic elements, such as prophages, plasmids, and insertion sequences, are lacking. In this study, a large-scale analysis was conducted to understand the prevalence and evolution of pks islands, differences in mobile genetic elements between pks-negative and pks-positive K. pneumoniae, and clinical characteristics of infection caused by pks-positive K. pneumoniae.
Methods: The genomes of 2,709 K. pneumoniae were downloaded from public databases, among which, 1,422 were from NCBI and 1,287 were from the China National GeneBank DataBase (CNGBdb). Screening for virulence and resistance genes, phylogenetic tree construction, and pan-genome analysis were performed. Differences in mobile genetic elements between pks-positive and pks-negative strains were compared. The clinical characteristics of 157 pks-positive and 157 pks-negative K. pneumoniae infected patients were investigated.
Results: Of 2,709 K. pneumoniae genomes, 245 pks-positive genomes were screened. The four siderophores, type VI secretion system, and nutritional factor genes were present in at least 77.9% (191/245), 66.9% (164/245), and 63.3% (155/245) of pks-positive strains, respectively. The number and fragment length of prophage were lower in pks-positive strains than in pks-negative strains (p < 0.05). The prevalence of the IS6 family was higher in pks-negative strains than in pks-positive strains, and the prevalence of multiple plasmid replicon types differed between the pks-positive and pks-negative strains (p < 0.05). The detection rate of pks-positive K. pneumoniae in abscess samples was higher than that of pks-negative K. pneumoniae (p < 0.05).
Conclusion: The pks-positive strains had abundant virulence genes. There were differences in the distribution of mobile genetic elements between pks-positive and pks-negative isolates. Further analysis of the evolutionary pattern of pks island and epidemiological surveillance in different populations are needed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Burkholderia cepacia complex (Bcc) clonal complex (CC) 31, the predominant lineage causing devastating outbreaks globally, has become a growing cause of concern for infections in non-cystic fibrosis (NCF) patients in India. B. cenocepacia is very challenging to treat owing to its virulence determinants and antibiotic resistance. Improving the management of these infections requires a better knowledge of their resistance patterns and mechanisms.
Methods: Whole-genome sequences of 35 CC31 isolates obtained from patient samples were analyzed against 210 available CC31 genomes in the NCBI database to glean details of resistance, virulence, mobile elements, and phylogenetic markers and to study the genomic diversity and evolution of the CC31 lineage in India.
Results: Genomic analysis revealed that 35 isolates belonging to CC31 were categorized into 11 sequence types (ST), of which five STs were reported exclusively from India. Phylogenetic analysis classified 245 CC31 isolates into eight distinct clades (I-VIII) and unveiled that NCF isolates are evolving independently from the global cystic fibrosis (CF) isolates, forming a distinct clade. The detection rate of seven classes of antibiotic-related genes in 35 isolates was 35 (100%) for tetracyclines, aminoglycosides, and fluoroquinolones; 26 (74.2%) for sulphonamides and phenicols; 7 (20%) for beta-lactamases; and 1 (2.8%) for trimethoprim resistance genes. Additionally, 3 (8.5%) NCF isolates were resistant to disinfecting agents and antiseptics. Antimicrobial susceptibility testing revealed that the majority of NCF isolates were resistant to chloramphenicol (77%) and levofloxacin (34%). NCF isolates have a comparable number of virulence genes to CF isolates. A well-studied pathogenicity island of B. cenocepacia, GI11, is present in ST628 and ST709 isolates from the Indian Bcc population. In contrast, genomic island GI15 (highly similar to the island found in B. pseudomallei strain EY1) is exclusively reported in ST839 and ST824 isolates from two different locations in India. Horizontal acquisition of lytic phage ST79 of pathogenic B. pseudomallei is demonstrated in ST628 isolates Bcc1463, Bcc29163, and BccR4654 amongst CC31 lineage.
Discussion: The study reveals a high diversity of CC31 lineages among B. cenocepacia isolates from India. The extensive information from this study will facilitate the development of rapid diagnostic and novel therapeutic approaches to manage B. cenocepacia infections.
Some Brucella spp. are important pathogens. According to the latest prokaryotic taxonomy, the Brucella genus consists of facultative intracellular parasitic Brucella species and extracellular opportunistic or environmental Brucella species. Intracellular Brucella species include classical and nonclassical types, with different species generally exhibiting host preferences. Some classical intracellular Brucella species can cause zoonotic brucellosis, including B. melitensis, B. abortus, B. suis, and B. canis. Extracellular Brucella species comprise opportunistic or environmental species that formerly belonged to the genus Ochrobactrum and have therefore been renamed, for example, Brucella intermedia and Brucella anthropi, which are the most frequent opportunistic human pathogens within the recently expanded genus Brucella. The cause of the diverse phenotypic characteristics of different Brucella species is still unclear. To further investigate the genetic evolutionary characteristics of the Brucella genus and elucidate the relationship between its genomic composition and prediction of phenotypic traits, we collected the genomic data of Brucella from the NCBI Genome database and conducted a comparative genomics study. We found that classical and nonclassical intracellular Brucella species and extracellular Brucella species exhibited differences in phylogenetic relationships, horizontal gene transfer and distribution patterns of mobile genetic elements, virulence factor genes, and antibiotic resistance genes, showing the close relationship between the genetic variations and prediction of phenotypic traits of different Brucella species.
Furthermore, we found significant differences in horizontal gene transfer and the distribution patterns of mobile genetic elements, virulence factor genes, and antibiotic resistance genes between the two chromosomes of Brucella, indicating that the two chromosomes had distinct dynamics and plasticity and played different roles in the survival and evolution of Brucella. These findings provide new directions for exploring the genetic evolutionary characteristics of the Brucella genus and could offer new clues to elucidate the factors influencing the phenotypic diversity of the Brucella genus.
Laser capture microdissection (LCM) coupled with RNA-seq is a powerful tool to identify genes that are differentially expressed in specific histological tumor subtypes. To better understand the role of single tumor cell populations in the complex heterogeneity of glioblastoma, we paired microdissection and NGS technology to study intra-tumoral differences in specific histological regions and cells of human GBM FFPE tumors. We isolated astrocytes, neurons and endothelial cells in 6 different histological contexts: tumor core astrocytes, pseudopalisading astrocytes, perineuronal astrocytes in satellitosis, neurons with satellitosis, tumor blood vessels, and normal blood vessels. A customized protocol was developed for RNA amplification, library construction, and whole transcriptome analysis of each single portion. We first validated our protocol by comparing the obtained RNA expression pattern with the gene expression levels of RNA-seq raw data experiments from the BioProject NCBI database, using Spearman's correlation coefficients. We found a good concordance for pseudopalisading and tumor core astrocytes compartments (0.5 Spearman correlation) and a high concordance for perineuronal astrocytes, neurons, normal, and tumor endothelial cells compartments (0.7 Spearman correlation). Then, Principal Component Analysis and differential expression analysis were employed to find differences between tumor compartments and control tissue and between the same cell types in distinct tumor contexts. Data consistent with the literature emerged, in which multiple therapeutic targets significant for glioblastoma (such as Integrins, Extracellular Matrix, transmembrane transport, and metabolic processes) play a fundamental role in the disease progression. Moreover, specific cellular processes have been associated with certain cellular subtypes within the tumor.
Our results are promising and suggest a compelling method for studying glioblastoma heterogeneity in FFPE samples and its application in both prospective and retrospective studies.
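The validation above rests on Spearman's correlation, which is the Pearson correlation of the ranks of two measurement vectors. A minimal pure-Python sketch follows, assuming tied values receive their average rank. The function names and the toy expression values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5

# Made-up expression levels for five genes in two experiments.
own_data = [1.0, 2.0, 3.0, 4.0, 5.0]
public_data = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(own_data, public_data)
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic differences in scale between the two experiments, which is one reason it suits comparisons across protocols.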
Background: Spontaneous preterm birth (sPTB) is a global disease that is a leading cause of death in neonates and children younger than 5 years of age. However, the etiology of sPTB remains poorly understood. Recent evidence has shown a strong association between metabolic disorders and sPTB. To determine the metabolic alterations in sPTB patients, we used various bioinformatics methods to analyze the abnormal changes in metabolic pathways in the preterm placenta via existing datasets.
Methods: In this study, we integrated two datasets (GSE203507 and GSE174415) from the NCBI GEO database for the following analysis. We utilized the “DESeq2” R package and WGCNA for differentially expressed genes (DEGs) analysis; the identified DEGs were subsequently compared with metabolism-related genes. To identify the altered metabolism-related pathways and hub genes in sPTB patients, we performed multiple functional enrichment analyses and applied three machine learning algorithms, LASSO, SVM-RFE, and RF, with the hub genes that were verified by immunohistochemistry. Additionally, we conducted single-sample gene set enrichment analysis to assess immune infiltration in the placenta.
Results: We identified 228 sPTB-related DEGs that were enriched in pathways such as arachidonic acid and glutathione metabolism. A total of 3 metabolism-related hub genes, namely, ANPEP, CKMT1B, and PLA2G4A, were identified and validated in external datasets and experiments. A nomogram model was developed and evaluated with 3 hub genes; the model could reliably distinguish sPTB patients and term labor patients with an area under the curve (AUC) > 0.75 for both the training and validation sets. Immune infiltration analysis revealed immune dysregulation in sPTB patients.
Conclusion: Three potential hub genes that influence the occurrence of sPTB through shadow participation in placental metabolism were identified; these results provide a new perspective for the development and targeting of treatments for sPTB.
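The area under the ROC curve (AUC) reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal pure-Python sketch of that pairwise definition follows, with ties counted as half a win. The function name `roc_auc`, the labels and the scores are hypothetical, and this is not the authors' R workflow.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. sPTB), 0 for the negative class.
    scores: model outputs, higher meaning more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))

# Made-up predictions for four samples.
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value more efficiently on large sample sets.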
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The human pathogen Acinetobacter baumannii has emerged as a frequent cause of hospital-acquired infections, but infection of animals has rarely been observed. Here we analyzed an outbreak of epidemic pneumonia killing hundreds of sheep on a farm in Pakistan and identified A. baumannii as the infecting agent. A pure culture of strain AbPK1 isolated from lungs of sick animals was inoculated into healthy sheep, which subsequently developed similar disease symptoms. Bacteria re-isolated from the infected animals were shown to be identical to the inoculum, fulfilling Koch’s postulates. Comparison of the AbPK1 genome against 2283 A. baumannii genomes from the NCBI database revealed that AbPK1 carries genes for unusual surface structures, including a unique composition of iron acquisition genes, genes for O-antigen synthesis and sialic acid-specific acetylases of cell-surface carbohydrates that could enable immune evasion. Several of these unusual and otherwise rarely present genes were also identified in genomes of phylogenetically unrelated A. baumannii isolates from combat-wounded US military from Afghanistan indicating a common gene pool in this geographical region. Based on core genome MLST this virulent isolate represents a newly emerging lineage of Global Clone 2, suggesting a human source for this disease outbreak. The observed epidemic, direct transmission from sheep to sheep, which is highly unusual for A. baumannii, has important consequences for human and animal health. First, direct animal-to-animal transmission facilitates fast spread of pathogen and disease in the flock. Second, it may establish a stable ecological niche and subsequent spread in a new host. And third, it constitutes a serious risk of transmission of this hyper-virulent clone from sheep back to humans, which may result in emergence of contagious disease amongst humans.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Oliveria decumbens Vent. is a wild, rare, annual medicinal plant endemic to Iran whose metabolites (mostly terpenes) make it a precious plant in Persian Traditional Medicine and a potential chemotherapeutic agent. The lack of genetic resources has slowed the discovery of genes involved in the terpene biosynthesis pathway. It is a wild relative of Daucus carota. In this research, we compared the transcriptomes of two samples, flower and root of Oliveria decumbens, analyzed the expression values of the genes involved in terpenoid biosynthesis by RNA-seq, and analyzed the phytochemicals of its essential oil by GC/MS. In total, 136,031,188 reads were produced from the flower and root samples. The results show that the MEP pathway is mostly active in the flower and the MVA pathway in the root. Three genes of GPP, FPPS, and GGPP that are the precursors in the synthesis of mono, di, and triterpenes are upregulated in the root, and 23 key genes involved in the biosynthesis of terpenes were identified. Three genes had the highest upregulation in the root, whereas another three genes were expressed only in the flower. Meanwhile, 191 and 185 upregulated genes in the flower and root of the plant, respectively, were selected for gene ontology analysis and reconstruction of co-expression networks. The current research is the first of its kind on the Oliveria decumbens transcriptome and discusses 67 genes that have been deposited into the NCBI database. Collectively, the information obtained in this study offers new insights into the genetic blueprint of Oliveria decumbens Vent., which may pave the way for medical/plant biotechnology and pharmaceutical applications in the future.
Background: Bladder cancer continues to pose a substantial global health challenge, marked by a high mortality rate despite advances in treatment options. Therefore, in-depth understanding of molecular mechanisms related to disease onset, progression, and patient survival is of utmost importance in bladder cancer research. Here, we aimed to investigate the underlying mechanisms using a stringent differential expression and survival analyses-based pipeline.
Methods: Gene and miRNA expression data from TCGA and NCBI GEO databases were analyzed. Differentially expressed genes between normal vs tumor, among tumor aggressiveness groups and between early vs advanced stage were identified using Student's t-test and ANOVA. Kaplan-Meier survival analyses were conducted using R. Functional annotation, miRNA target and transcription factor prediction, network construction, random walk analysis and gene set enrichment analyses were performed using DAVID, miRDIP, TransmiR, Cytoscape, Java and GSEA, respectively.
Results: We identified an elevated endoplasmic reticulum (ER) stress response as a key culprit, as an eight-gene unfolded protein response (UPR)-related gene signature (UPR-GS) drives aggressive disease and poor survival in bladder cancer patients. This elevated UPR-GS is linked to the downregulation of two miRNAs from the miR-29 family (miR-29b-2-5p and miR-29c-5p), which can limit UPR-driven tumor aggressiveness and improve patient survival. Further upstream, the inflammation-related NFKB transcription factor inhibits miR-29b/c expression, driving UPR-related tumor progression and determining poor survival in bladder cancer patients.
Conclusion: These findings highlight that the aberrantly activated UPR, regulated by the NFKB-miR-29b/c axis, plays a crucial role in tumor aggressiveness and disease progression in bladder cancer, pointing to potential targets for therapeutic interventions and prognostic markers in bladder cancer management.
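The Kaplan-Meier survival analysis mentioned in the methods estimates survival as a running product over event times of (1 - deaths / number at risk), with censored patients leaving the risk set without counting as events. The study used R for this step; the following is only a minimal Python sketch of the estimator itself, and the function name `kaplan_meier` and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time for each patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve

# Made-up follow-up times (months) and event indicators for five patients.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Comparing two such curves, for example high versus low signature expression, additionally requires a test such as the log-rank test, which this sketch does not cover.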