Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DNA sequences used to identify fungi cultured from human faeces.
The ITS1-5.8S-ITS2 region of the rDNA extracted from the fungal isolates was chosen for amplification because it has been used successfully to identify a wide range of fungal species [53]. For DNA amplification, 10.0 µL of REDExtract-N-Amp™ PCR Ready Mix; 7.8 µL of PCR-grade H2O; 0.8 µL of 10 µM forward primer (ITS1, sequence TCCGTAGGTGAACCTGCGG); 0.8 µL of 10 µM reverse primer (ITS4, sequence TCCTCCGCTTATTGATATGC); and 1.0 µL of extracted fungal DNA sample were added to a 200 µL Eppendorf PCR tube. The same method was used to prepare the negative control. PCR amplification was performed on the Eppendorf vapo.protect™ Mastercycler® Pro S with a preliminary polymerase activation step at 94 °C for 2 minutes; 35 cycles of denaturation at 94 °C for 30 seconds, annealing at 51 °C for 20 seconds, and extension at 77 °C for 1 minute; and a final extension step at 72 °C for 8 minutes.
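As a rough planning aid, the cycling programme described above can be written down as data and its programmed hold time summed. This is only a sketch: the variable and function names are illustrative, and the time the thermal cycler spends ramping between temperatures is ignored.

```python
# Hold times (in seconds) transcribed from the PCR programme described above.
INITIAL_ACTIVATION_S = 2 * 60
CYCLE_STEPS_S = {"denaturation": 30, "annealing": 20, "extension": 60}
N_CYCLES = 35
FINAL_EXTENSION_S = 8 * 60


def total_hold_time_s(initial_s, cycle_steps_s, n_cycles, final_s):
    """Sum the programmed hold times; ramping between temperatures is ignored."""
    return initial_s + n_cycles * sum(cycle_steps_s.values()) + final_s


total_s = total_hold_time_s(
    INITIAL_ACTIVATION_S, CYCLE_STEPS_S, N_CYCLES, FINAL_EXTENSION_S
)
minutes, seconds = divmod(total_s, 60)
print(f"Programmed hold time: {minutes} min {seconds} s")
```

The actual run takes longer than this figure because ramp rates depend on the instrument.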
To confirm successful fungal DNA extraction and amplification, 4 µL of the amplified fungal rDNA PCR product was loaded onto a 1 % (w/v) agarose gel in 1x Tris/Borate/EDTA (TBE) buffer, and 1 µL of the cyanine dye SYBR® DNA gel stain was added for visualisation. A 1 kb Plus DNA ladder (5 µL) and 5 µL of the negative control were also loaded onto the gel. After gel electrophoresis, PCR products were visualised with the GelDoc™ XR Plus System (BIO-RAD, USA). The size of the amplified fungal DNA fragments was determined against the 1 kb Plus DNA ladder using the GelAnalyzer 2010a quantification programme. The fungal rDNA fragments of the ITS1-5.8S-ITS2 region obtained from PCR were then transferred to the Centre of Genomics, Proteomics and Metabolomics DNA sequencing facility for sequencing.
Capillary electrophoresis DNA sequencing (Sanger sequencing) was used to obtain the DNA sequences of the amplified ITS1-5.8S-ITS2 region. For each sample containing fungal DNA template, two reactions were performed, one for each primer. Each reaction was mixed with the ABI PRISM™ BigDye™ Terminator Sequencing Kit version 3.1 (ThermoFisher Scientific), which contains DNA polymerase, a buffer, the four DNA nucleotides and four chain-terminating dideoxynucleotides labelled with fluorescent dyes. The samples were then subjected to cycle sequencing on the Applied Biosystems GeneAmp® PCR System 9700 thermal cycler using standard cycling conditions: a preliminary polymerase activation step at 96 °C for 1 minute, followed by 25 cycles of denaturation at 96 °C for 10 seconds, annealing at 50 °C for 5 seconds, and extension at 60 °C for 4 minutes. After cycle sequencing, the samples were purified with Agencourt® CleanSEQ® magnetic beads to remove excess fluorescent dyes, nucleotides, salts and other contaminants. The purified DNA fragments were then separated by size by capillary electrophoresis on the ABI PRISM™ 3130XL Genetic Analyzer using 50 cm capillaries and POP7 polymer. The final data output of the ITS1-5.8S-ITS2 region DNA sequences was based on detection of the attached fluorescent dyes after excitation by a laser.
Geneious version 11.1.5 (www.geneious.com) was used to analyse the raw data [54]. The data included both forward and reverse rDNA sequences for each fungal isolate. These sequences were aligned, and ends showing poor-quality reads were trimmed, to obtain a consensus sequence. BLAST (Basic Local Alignment Search Tool), developed by Altschul et al. [55] and available as a tool within Geneious, was run in its MegaBLAST version, which is optimised for fast searches of highly similar sequences. It was used to compare the consensus query sequence with known DNA sequences in GenBank (NCBI genetic sequence database), EMBL (European Molecular Biology Laboratory), DDBJ (DNA DataBank of Japan) and PDB (Protein Data Bank, Worldwide). The search results included: a grade percentage score, which combines the query coverage, expectation value (e-value) and identity value of each hit against the database; the identities match and percentage score, which indicate the extent to which the query DNA sequence matched the database nucleotide sequence; and the bit-score, which reflects the quality of the alignment and measures sequence similarity [56]. The higher each score, the higher the certainty of identification of the fungal species. A grade percentage score of >98 % was considered a correct genomic identification.
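The acceptance rule described above (grade above 98 %) can be sketched as a small filter over a table of hits. This is a minimal illustration, not the Geneious implementation: the `BlastHit` record, the function name and the example hits are all hypothetical, and breaking ties on bit-score is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlastHit:
    description: str     # database entry matched by the query
    grade_pct: float     # grade percentage score reported for the hit
    identity_pct: float  # percentage identity of the alignment
    bit_score: float     # bit-score of the alignment


def accepted_identification(
    hits: List[BlastHit], min_grade_pct: float = 98.0
) -> Optional[BlastHit]:
    """Return the best hit whose grade exceeds the threshold, or None."""
    passing = [h for h in hits if h.grade_pct > min_grade_pct]
    if not passing:
        return None
    # Assumption: prefer the highest grade, then the highest bit-score.
    return max(passing, key=lambda h: (h.grade_pct, h.bit_score))


# Made-up hits, for illustration only.
example_hits = [
    BlastHit("Candida albicans (hypothetical record A)", 99.6, 99.8, 950.0),
    BlastHit("Candida dubliniensis (hypothetical record B)", 98.4, 97.9, 905.0),
    BlastHit("Saccharomyces cerevisiae (hypothetical record C)", 91.2, 88.0, 610.0),
]
best = accepted_identification(example_hits)
print(best.description if best else "no hit above the grade threshold")
```

Isolates for which the function returns `None` would be left unidentified under this rule rather than assigned to the nearest hit.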
CFRAS-DB is a relational database with dynamic querying and data integration that researchers can use to identify genetic sequences with a high probability of being associated with aflatoxin accumulation resistance, according to multiple lines of evidence. It integrates genomic, proteomic, and genetic data from multiple maize studies dealing with aflatoxin accumulation or Aspergillus flavus resistance. THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 16, 2025.
Fungal genomes available from the Sanger Institute. Data are accessible in a number of ways: for each organism there is a BLAST server that allows the sequences to be searched, and sequences can also be downloaded directly by FTP. In addition, for organisms being sequenced using a cosmid approach, finished and annotated cosmids are submitted to EMBL and other public databases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
UNITE is an rDNA sequence database designed to provide a stable and reliable platform for sequence-borne identification of all fungal species. UNITE provides a unified way of delimiting, identifying, communicating, and working with DNA-based Species Hypotheses (SHs). All fungal ITS sequences in the International Nucleotide Sequence Databases (INSD: GenBank, ENA, DDBJ) are clustered to approximately the species level by applying a set of dynamic distance values (0.5-3.0%). All species hypotheses are given a unique, stable name in the form of a DOI, and their taxonomic and ecological annotations are verified through distributed, web-based third-party annotation efforts. SHs are connected to a taxon name and its classification as far as possible (phylum, class, order, etc.) by taking into account identifications for all sequences in the SH. An automatically or manually designated sequence is chosen to represent each SH. These sequences are released (https://unite.ut.ee/repository.php) for use by the scientific community in, for example, local sequence similarity searches and next-generation sequencing analysis pipelines. The system and the data are updated automatically as the number of public fungal ITS sequences grows.
https://www.rioxx.net/licenses/all-rights-reserved
Expressed sequence tags (ESTs) have been obtained from eighteen species of plant pathogenic fungi, two species of phytopathogenic oomycete and three species of saprophytic fungi. Hierarchical clustering software was used to classify together ESTs representing the same gene and produce a single contig, or consensus sequence. The unisequence set for each pathogen therefore represents a set of unique gene sequences, each one consisting of either a single EST or a contig sequence made from a group of ESTs. Unisequences were annotated based on top hits against the NCBI non-redundant protein database using blastx.
UNITE is a fungal rDNA internal transcribed spacer (ITS) sequence database. It focuses on high-quality ITS sequences generated from fruiting bodies collected and identified by experts and deposited in public herbaria. Entries may be supplemented with metadata describing locality, habitat, soil, climate, and interacting taxa.
https://spdx.org/licenses/CC0-1.0.html
• Incompleteness of reference sequence databases and unresolved taxonomic relationships complicate taxonomic placement of fungal sequences. We developed PROTAX-fungi, a general tool for taxonomic placement of fungal ITS sequences, and implemented it in the PlutoF platform of the UNITE database for molecular identification of fungi.
• PROTAX-fungi outperformed the SINTAX and RDP classifiers in terms of increased accuracy and decreased calibration error when applied to data on mock communities representing species groups with poor sequence database coverage.
• With empirical data on root- and wood-associated fungi, PROTAX-fungi reliably identified (with at least 90% identification probability) the majority of sequences to the order level but only ca. one fifth of them to the species level, reflecting the current limited coverage of the databases.
• When applied to examine the internal consistencies of the Index Fungorum and UNITE databases, PROTAX-fungi revealed inconsistencies in the taxonomy database as well as mislabelling and sequence quality problems in the reference database. The corresponding improvements were implemented in both databases.
• PROTAX-fungi provides a robust tool for performing statistically reliable identifications of fungi in spite of the incompleteness of extant reference sequence databases and unresolved taxonomic relationships.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 18S ribosomal RNA targeted loci project is a RefSeq curated data set sourced from INSDC records. At a minimum, each sequence contains most of the variable V4 region and part of the V5 region, and each record contains a collection identifier (predominantly type material) from a public collection. The presence of the 18S signature has been verified by the ribovore pipeline (https://github.com/nawrockie/ribovore) using hidden Markov and covariance models. Other verification steps are also included, for example checks for vector sequences, too many ambiguous nucleotides, and misassembled sequences. SSU RefSeq accessions (NG_ ) include sequences mostly obtained from type specimens and a few from reference specimens. Type and reference identifiers are curated by NCBI Taxonomy. The collection source of type material is indicated in each record, and collection acronyms follow the collection codes maintained at https://www.ncbi.nlm.nih.gov/biocollections/. All sequences have the same project ID and can be found under it. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA39195.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this study we demonstrate the utility of whole genome shotgun (WGS) metagenomics in study organisms with small genomes to improve upon amplicon-based estimates of biodiversity and microbial diversity in environmental samples for the purpose of understanding ecological and evolutionary processes. We generated a database of full-length and near-full-length ribosomal DNA sequence complexes from 273 lichenized fungal species and used this database to facilitate fungal species identification in the southern Appalachian Mountains using low coverage WGS at higher resolution and without the biases of amplicon-based approaches. Using this new database and methods herein developed, we detected between 2.8 and 11 times as many species from lichen fungal propagules by aligning reads from WGS-sequenced environmental samples compared to a traditional amplicon-based approach. We then conducted complete taxonomic diversity inventories of the lichens in each one-hectare plot to assess overlap between standing taxonomic diversity and diversity detected based on propagules present in environmental samples (i.e., the “potential” of diversity). From the environmental samples, we detected 94 species not observed in organism-level sampling in these ecosystems with high confidence using both WGS and amplicon-based methods. This study highlights the utility of WGS sequence-based approaches in detecting hidden species diversity and demonstrates that amplicon-based methods likely miss important components of fungal diversity. We suggest that the adoption of this method will not only improve understanding of biotic constraints on the distributions of biodiversity but will also help to inform important environmental policy.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data set consists of Illumina sequences derived from 48 sediment samples collected in 2015 from Lake Michigan and Lake Superior to inventory the fungal diversity in these two lakes. DNA was extracted from ca. 0.5 g of sediment using MoBio PowerSoil DNA isolation kits following the Earth Microbiome protocol. PCR was completed with the fungal primers ITS1F and fITS7 using the Fluidigm Access Array. The resulting amplicons were sequenced on the Illumina HiSeq 2500 platform in rapid mode with 2 x 250 nt paired-end reads. The enclosed data sets contain the forward read files for both primers, both fixed-header index files, and the associated map files needed for processing in QIIME. Also enclosed are two rarefied OTU files used to evaluate fungal diversity. Decimal latitude and longitude coordinates of all collecting sites are included.
File descriptions:
Great_lakes_Map_coordinates.xlsx = coordinates of sample sites
QIIME Processing ITS1 region: These are the raw files used to process the ITS1 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS1_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS1F_Miller_Fludigm_I1_fixedheader.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS1F_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS1 region.
QIIME Processing ITS2 region: These are the raw files used to process the ITS2 Illumina reads in QIIME. ***only forward reads were processed
GL_ITS2_HW_mapFile_meta.txt = This is the map file used in QIIME.
ITS7_Miller_Fludigm_I1_Fixedheaders.fastq = Index file from Illumina. Headers were fixed to match the forward reads (R1) file in order to process in QIIME
ITS7_Miller_Fludigm_R1.fastq = Forward Illumina reads for the ITS2 region.
Resulting OTU table and OTU table with taxonomy, ITS1 region:
wahl_ITS1_R1_otu_table.csv = File contains representative OTUs based on the ITS1 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS1_R1_otu_table_w_tax.csv = File contains representative OTUs based on the ITS1 region for all the R1 data and the number of each OTU found in each sample, along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Resulting OTU table and OTU table with taxonomy, ITS2 region:
wahl_ITS2_R1_otu_table.csv = File contains representative OTUs based on the ITS2 region for all the R1 data and the number of each OTU found in each sample.
wahl_ITS2_R1_otu_table_w_tax.csv = File contains representative OTUs based on the ITS2 region for all the R1 data and the number of each OTU found in each sample, along with taxonomic determination based on the following database: sh_taxonomy_qiime_ver7_97_s_31.01.2016_dev
Rarefied Illumina dataset for each ITS region:
ITS1_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS1 region.
ITS2_R1_nosing_rare_5000.csv = Environmental parameters and rarefied OTU dataset for ITS2 region.
Column headings:
#SampleID = code including researcher initials and sequential run number
BarcodeSequence =
LinkerPrimerSequence = two sequences used, CTTGGTCATTTAGAGGAAGTAA or GTGARTCATCGAATCTTTG
ReversePrimer = two sequences used, GCTGCGTTCTTCATCGATGC or TCCTCCGCTTATTGATATGC
run_prefix = initials of run operator
Sample = location code; see thesis figures 1 and 2 for mapped locations and Great_lakes_Map_coordinates.xlsx for exact coordinates
DepthGroup = S = shallow (50-100 m), MS = mid-shallow (101-150 m), MD = mid-deep (151-200 m), D = deep (>200 m)
Depth_Meters = depth in meters
Lake = lake name, Michigan or Superior
Nitrogen %
Carbon %
Date = mm/dd/yyyy
pH = acidity, potential of Hydrogen (pH) scale
SampleDescription = Sample or control
X = sequential run number
OTU ID = Operational taxonomic unit ID
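The DepthGroup codes can be re-derived from Depth_Meters with a small helper, which is useful for checking the map files. This is a sketch under stated assumptions: the function name is illustrative, fractional depths between the listed integer bounds are assigned to the deeper group, and depths shallower than 50 m (outside the listed bins) return `None`.

```python
from typing import Optional


def depth_group(depth_m: float) -> Optional[str]:
    """Map a depth in meters to the DepthGroup code described above."""
    if depth_m < 50:
        return None   # shallower than any listed bin
    if depth_m <= 100:
        return "S"    # shallow, 50-100 m
    if depth_m <= 150:
        return "MS"   # mid-shallow, 101-150 m
    if depth_m <= 200:
        return "MD"   # mid-deep, 151-200 m
    return "D"        # deep, >200 m


for depth in (75, 120, 180, 260):
    print(depth, depth_group(depth))
```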
UNITE is a fungal rDNA internal transcribed spacer (ITS) sequence database (although additional genes and genetic markers are also welcome) that facilitates identification of environmental samples of fungal DNA. Important additional features include user annotation of INSD sequences to add metadata on, e.g., locality, habitat, soil, climate, and interacting taxa. Users can also annotate INSD sequences with additional species identifications, which will appear in the results of any analyses. UNITE focuses on high-quality ITS sequences generated from fruiting bodies collected and identified by experts and deposited in public herbaria. It also holds all fungal ITS sequences in the International Nucleotide Sequence Databases (INSD: NCBI, EMBL, DDBJ). Both sets of sequences may be used in any analyses carried out. UNITE is accompanied by a project management system called PlutoF, where users can store field data, document sequencing lab procedures, manage sequences, and run analyses. PlutoF aims to give taxonomists, ecologists, and biogeographers a common platform for data storage, handling, and analysis, with the intent of facilitating integration of these disciplines. A user can have an unlimited number of projects and still run analyses across any project data available to them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Three versions of the database are provided:
MycoMobilome_v1.0-allConsensus_TE_library.fasta: All known and unknown TE consensus sequences detected across fungal diversity. The most useful version for most use cases.
MycoMobilome_v1.0-proteinEvidence_TE_library.fasta: All TE consensus sequences with ORF hits to known TE proteins. Note the evidence markers in sequence headers and that this subset will not contain any non-autonomous TEs (i.e. SINEs, MITEs, solo LTRs, etc).
MycoMobilome_v1.0-unknown_TE_library.fasta: All TE consensus sequences with NO protein evidence supporting their status as true TEs. These have the potential to be real given little existing knowledge of TE diversity across the kingdom. Many of these are likely non-autonomous elements, such as MITEs (non-autonomous DNA elements), solo LTRs, and SINEs, which will NOT be found in the proteinEvidence subset. However, some sequences are also likely to be erroneous, so use carefully.
In addition to these three database files, the following files are also provided:
MycoMobilome_v1.0_assemblyRecord.xlsx: A record of all publicly available genome assemblies used to generate MycoMobilome. Here, you will find information on assembly length, N50, L50, GC content, species phylogenetic information, genome assembly source and ID, publication, and BUSCO scores.
MycoMobilome-hitsToKnownTransposonProteins-repetPfam35.txt: A TAB-separated file showing hmmscan hits for each MycoMobilome consensus sequence open reading frame to TE domains from the REPET Pfam 35.0 and Gypsy DB curated TE domain dataset. Here, qseqid ends with _n, where n is the ORF number. The query sequence to match to MycoMobilome sequence headers can be found in the column named qseqid_noFrame.
MycoMobilome-hitsToKnownTransposonProteins-rmRepeatPeps.txt: A TAB-separated file showing BLASTp hits for each MycoMobilome consensus sequence open reading frame to TE domains from the RepeatMasker RepeatPeps.lib file supplied with RepeatMasker v4.1.9.
The genome assemblies used to generate MycoMobilome_v1.0 are recorded in the file MycoMobilome_v1.0_assemblyRecord.xlsx. Each genome was used to generate putative TE consensus sequences using earlGreyLibConstruct in Earl Grey (v4.4.0)[1], configured with Dfam curated elements (v3.7)[2], using default settings. All putative consensus sequences were combined into a single FASTA file containing 773,843 entries. A non-redundant TE library was constructed with a scalable cascaded clustering approach using MMseqs2[3] easy-cluster with --min-seq-id 0.8 -c 0.8 --cov-mode 1 --cluster-reassign, resulting in 354,315 non-redundant sequences. Representative sequences for each cluster were extracted and labelled with the species name from which the representative originated.
Open reading frames (ORFs) were detected in all six frames of each consensus sequence using transeq in EMBOSS (v6.6.0)[4] with -clean -frame 6. Matches to known host proteins were identified using the Fungi RefSeq[5] database (Release 228) and Diamond BLASTp[6] with --sensitive --matrix BLOSUM62 --evalue 1e-3. Potential hits were combined for each query sequence. Sequences with hits to RefSeq, and either no hits to known TE protein domains, or partial hits to known TE protein domains that do not overlap with RefSeq hit coordinates, were labelled as potential host genes and removed from the MycoMobilome dataset. Any hits to proteins labelled as uncharacterized|hypothetical|low quality|predicted protein were kept due to the potential to be TE-derived.
Matches to known TE proteins were identified using two complementary approaches: (i) Using HMMscan in HMMER (v3.4)[7] to detect homology to known TE protein domains curated by the REPET group. Matches were identified using hmmscan -E 10 --noali. Hits were filtered to retain those where fseq_evalue <=0.001 and fseq_bitscore >= 50. Hits were retained as potential TEs unless the query also matched RefSeq proteins, in which case they were removed to avoid including host genes or chimeric TE–host gene models.
(ii) Using BLASTp to detect homology to known TE protein domains supplied with RepeatMasker (v4.1.5) RepeatPeps.lib (repeatmasker.org). Matches were identified using blastp -evalue 1e-3. Nested hits were removed to retain the highest quality protein hit for each query, followed by combining adjacent and overlapping hits. Hits were retained as potential TEs unless non-overlapping hits to the same query were also found in the RefSeq hits set, in which case these were removed due to the potential that these hits could be host genes, or chimeric TE-host gene models.
A total of 24,571 consensus sequences were identified as putative host genes and removed from the database, resulting in a potential TE consensus set containing 329,744 sequences. This set was further filtered to remove all putative TE consensus sequences <120bp in length, as these are likely to be poor quality and incomplete. In addition, the base composition of each consensus was calculated using seqtk comp (https://github.com/lh3/seqtk) and all sequences with an N content >=5% were removed due to being poor quality, reducing the final MycoMobilome library to 276,641 sequences.
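The two final quality filters (minimum length and maximum N content) are simple enough to sketch in plain Python. The authors computed base composition with seqtk comp; the version below is only an illustrative stand-in, and the function names and example sequences are made up.

```python
MIN_LENGTH_BP = 120   # consensus sequences shorter than this are removed
MAX_N_PERCENT = 5     # sequences with N content >= this percentage are removed


def passes_quality_filters(seq: str) -> bool:
    """Apply the length and N-content filters described above."""
    if len(seq) < MIN_LENGTH_BP:
        return False
    n_count = seq.upper().count("N")
    # Integer arithmetic avoids floating-point edge cases at exactly 5 %.
    return n_count * 100 < MAX_N_PERCENT * len(seq)


def filter_library(records: dict) -> dict:
    """Keep only the named sequences that pass both filters."""
    return {name: seq for name, seq in records.items() if passes_quality_filters(seq)}


# Made-up sequences, for illustration only.
example = {
    "too_short": "ACGT" * 25,             # 100 bp
    "too_many_N": "A" * 190 + "N" * 10,   # 200 bp, exactly 5 % N
    "kept": "ACGT" * 50,                  # 200 bp, no N
}
print(sorted(filter_library(example)))

# Bookkeeping check on the counts stated in the text:
# 354,315 non-redundant sequences minus 24,571 putative host genes.
assert 354_315 - 24_571 == 329_744
```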
Each consensus sequence with hits to known TE protein domains was labelled as "supported". The identity of each protein domain hit was then evaluated to determine whether the consensus sequence classification is supported by protein hits from the REPET profiles bank or RepeatMasker RepeatPeps. If the identified domains support the consensus classification, the consensus sequence is labelled with _PE for protein evidence. If the identified domains conflict with the consensus classification, the consensus sequence is labelled with _DA for disagreement. If there are no identified domains, the consensus sequence is labelled with _NE for no evidence. The appropriate domains for each classification are defined in the table below:
| High level TE classification | Appropriate Domain Hits from REPET | RepeatMasker RepeatPeps |
|---|---|---|
| DNA | Tase,Tase*,DDE,HTH,[ATP,INT,AP for crypton,maverick] | DNA |
| RC | HEL,EN,RPA | RC |
| LTR | RT,INT,RH,GAG,AP,VirusRelated,LTRrelated,Caulimovirus,ClassIrelated,ENV | LTR |
| LINE | RT,EN,RH,GAG,ClassIrelated,LINErelated | LINE |
| PLE | RT,EN,ClassIrelated | PLE |
| Retroposon | RT,INT,RH,GAG,AP,VirusRelated,LTRrelated,Caulimovirus,ClassIrelated,ENV,EN,LINErelated | Retroposon |
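The labelling logic can be sketched as a lookup against the REPET column of the table. This is an illustration, not the authors' code: it assumes that a single matching domain is enough to count as support, it reads the bracketed DNA entries as applying only to Crypton and Maverick elements, and it leaves out the RepeatPeps column, where the expected hit is simply the same high-level class.

```python
from typing import Iterable, Optional

_LTR_DOMAINS = {
    "RT", "INT", "RH", "GAG", "AP", "VirusRelated", "LTRrelated",
    "Caulimovirus", "ClassIrelated", "ENV",
}

# Appropriate REPET domain hits per high-level classification (from the table).
REPET_DOMAINS = {
    "DNA": {"Tase", "Tase*", "DDE", "HTH"},
    "RC": {"HEL", "EN", "RPA"},
    "LTR": _LTR_DOMAINS,
    "LINE": {"RT", "EN", "RH", "GAG", "ClassIrelated", "LINErelated"},
    "PLE": {"RT", "EN", "ClassIrelated"},
    "Retroposon": _LTR_DOMAINS | {"EN", "LINErelated"},
}
# Extra domains accepted only for Crypton and Maverick DNA elements.
DNA_SUBCLASS_EXTRAS = {"ATP", "INT", "AP"}


def evidence_label(
    classification: str, domains: Iterable[str], subclass: Optional[str] = None
) -> str:
    """Return _PE, _DA or _NE for a consensus sequence."""
    found = set(domains)
    if not found:
        return "_NE"  # no identified domains
    allowed = set(REPET_DOMAINS.get(classification, set()))
    if classification == "DNA" and subclass and subclass.lower() in {"crypton", "maverick"}:
        allowed |= DNA_SUBCLASS_EXTRAS
    # Assumption: any overlap with the appropriate domains counts as support.
    return "_PE" if found & allowed else "_DA"


print(evidence_label("LTR", ["RT", "INT"]))  # domains agree with the class
print(evidence_label("DNA", ["RT"]))         # domains conflict with the class
print(evidence_label("LINE", []))            # no domains found
```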
Sequences are named with the convention MycMob1.0_family-[n]-[six digit species code]_[protein evidence]#[high level classification]/[sub level classification] @[genus species]. Protein hits to known TE proteins are provided with MycoMobilome to support further investigation in specific use cases. No changes were made to classifications assigned during automated curation, therefore this database should be treated as uncurated and caution should be used to check important or interesting TE loci on a case-by-case basis. Please note that all nonautonomous elements will have the label _NE as they do not contain any intact protein domains. This does not mean they are not real TEs. As such, for most use cases we suggest using the complete MycoMobilome v1.0 dataset, unless you are specifically interested in autonomous TEs only.
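Headers that follow this naming convention can be split into their fields with a regular expression. The sketch below is an assumption-laden reading of the convention: the species code is taken to be six letters or digits, the sub-level classification is treated as optional, and the example header is invented rather than taken from the library.

```python
import re
from typing import Optional

HEADER_RE = re.compile(
    r"MycMob1\.0_family-(?P<family>\d+)"
    r"-(?P<species_code>[A-Za-z0-9]{6})"   # assumed: six letters or digits
    r"_(?P<evidence>PE|DA|NE)"
    r"#(?P<classification>[^/\s]+)"
    r"(?:/(?P<subclass>\S+))?"             # assumed optional
    r"\s*@(?P<species>.+)"
)


def parse_header(header: str) -> Optional[dict]:
    """Split a MycoMobilome FASTA header into its named fields."""
    match = HEADER_RE.fullmatch(header.lstrip(">").strip())
    return match.groupdict() if match else None


# Invented header, for illustration only.
fields = parse_header(">MycMob1.0_family-42-abcdef_PE#LTR/Gypsy @Genus species")
print(fields)
```

Selecting on the `evidence` field gives a quick way to pull out, for example, only the sequences labelled _PE without relying on the separate proteinEvidence file.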
Bibliography
[1] Baril T, Galbraith J, Hayward A. Earl Grey: a fully automated user-friendly transposable element annotation and analysis pipeline. Molecular Biology and Evolution. 2024 Apr;41(4):msae068.
[2] Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, Smit AF, Wheeler TJ. The Dfam database of repetitive DNA families. Nucleic Acids Research. 2016 Jan 4;44(D1):D81-9.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Markers not separated by semi-colons are located on the same linkage group. The ∼ symbol indicates the presence of the allele in the genome at an unknown location and/or copy number.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In a recent manuscript, we report a draft genome of the ascomycete fungal species Pseudopithomyces maydicus (isolate name SBW1), obtained using a culture isolate from brewery wastewater. From a 22-contig assembly, we predict 13502 protein-coding gene models, of which 4389 (32.5%) were annotated to KEGG Orthology, and we identify 39 biosynthetic gene clusters. Here we provide supplementary data from our analysis:
Supplementary Figure 1 Sequence alignment between Sanger-sequenced partial 28S LSU-rRNA sequence and the top ranked BLASTN hit from NCBI nr/nt database.
Supplementary Figure 2 Pairs plot for contig GC-content, contig coverage and contig length from the P. maydicus assembly.
Supplementary Data File 1 Table listing properties of contigs from the P. maydicus assembly.
Supplementary Data File 2 Summary of taxonomic classification analysis of recovered 18S SSU-rRNA sequences to the SILVA 138 database.
Supplementary Data File 3 Alignment of Sanger-sequenced partial 28S LSU-rRNA sequence against three 28S LSU-rRNA gene sequences recovered from the P. maydicus long read genome assembly and a set of 62 28S LSU-rRNA sequences from members of genus Pseudopithomyces (NCBI Nucleotide searched for "Pseudopithomyces AND 28S" on 30th May 2022).
Supplementary Data File 4 MASH similarity statistics obtained by comparing the P. maydicus long read genome assembly sequence to 9563 fungal genomes obtained from NCBI. The reference genomes from NCBI were downloaded using the NCBI 'datasets' (version 13.6.0) command line tool (datasets_13.6.0 download genome taxon 4751 --filename fungi.zip --assembly-level complete_genome,chromosome,scaffold,contig --exclude-gff3 --exclude-protein --exclude-rna).
Supplementary Data File 5 BlastKOALA annotation data for all proteins predicted from P. maydicus long read assembly.
Supplementary Results Complete output from the antiSMASH6 analysis of the P. maydicus long read assembly.
The MIPS Ustilago maydis Genome Database aims to present information on the molecular structure and functional network of the entirely sequenced, filamentous fungus Ustilago maydis. The underlying sequence is the initial release of the high-quality draft sequence of the Broad Institute. The goal of the MIPS database is to provide a comprehensive genome database in the Genome Research Environment in parallel with other fungal genomes to enable in-depth fungal comparative analysis. The specific aims are to:
1. Generate and assemble Whole Genome Shotgun sequence reads yielding 10X coverage of the U. maydis genome
2. Integrate the genomic sequence assembly with physical maps generated by Bayer CropScience
3. Perform automated annotation of the sequence assembly
4. Align the strain 521 assembly with the FB1 assembly provided by Exelixis
5. Release the sequence assembly and results of our annotation and analysis to the public
Ustilago maydis is a basidiomycete fungal pathogen of maize and teosinte. The genome size is approximately 20 Mb. The fungus induces tumors on host plants and forms masses of diploid teliospores. These spores germinate and form haploid meiotic products that can be propagated in culture as yeast-like cells. Haploid strains of opposite mating type fuse and form a filamentous, dikaryotic cell type that invades plant tissue to reinitiate infection. Ustilago maydis is an important model system for studying pathogen-host interactions and has been studied for more than 100 years by plant pathologists. Molecular genetic research with U. maydis focuses on recombination, the role of mating in pathogenesis, and signaling pathways that influence virulence. Recently, the fungus has emerged as an excellent experimental model for the molecular genetic analysis of phytopathogenesis, particularly in the characterization of infection-specific morphogenesis in response to signals from host plants.
Ustilago maydis also serves as an important model for other basidiomycete plant pathogens that are more difficult to work with in the laboratory, such as the rust and bunt fungi. The genomic sequence of U. maydis will also be valuable for comparative analysis of other fungal genomes, especially with respect to understanding the host range of fungal phytopathogens. The analysis of U. maydis would provide a framework for studying the hundreds of other Ustilago species that attack important crops, such as barley, wheat, sorghum, and sugarcane. Comparisons would also be possible with other basidiomycete fungi, such as the important human pathogen C. neoformans. Commercially, U. maydis is an excellent model for the discovery of antifungal drugs. In addition, maize tumors caused by U. maydis are prized in Hispanic cuisine and there is interest in improving commercial production. The complete putative gene set of the Broad Institute's second release is loaded into the database, together with all deviating putative genes from a gene set produced by MIPS with different gene prediction parameters. The complete dataset will then be analysed, and gene predictions will be manually corrected using combined information derived from different gene prediction algorithms and, more importantly, protein and EST comparisons. Gene prediction will be restricted to ORFs larger than 50 codons; smaller ORFs will be included only if similarities to other proteins or EST matches confirm their existence or if a coding region was postulated by all prediction programs used. The resulting proteins will be annotated and classified according to the MIPS classification catalogue, receiving appropriate descriptions. All proteins with a known, characterized homolog will be automatically assigned to functional categories using the MIPS functional catalog. All extracted proteins are in addition automatically analysed and annotated by the PEDANT suite.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data were collected from fungal sequences in the NCBI IPG and the UniParc databases, and from numerous curated published fungal genomes.
fungal_seqs.tsv.gz contains mappings of the non-redundant sequence names to identifiers in original databases.
fungal_seqs_curated_genomes.tsv.gz contains details of which sequences came from the curated genomes.
fungal_seqs_uniparc_signatures.tsv.gz gives details about InterPro terms present for sequences in the UniParc dataset.
fungal_seqs_uniparc_xrefs.tsv.gz maps IDs from UniParc to references in other databases.
fungal_seqs_ipg.tsv gives details about sequences taken from the IPG database, mapping to other database identifiers.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This version 2 dataset contains 34 files in total, with one (1) additional file called "Culture-dependent Isolate table with taxonomic determination and sequence data.csv". The remaining 33 files are identical to version 1. The new file and its variables are described below.
Culture-dependent Isolate table with taxonomic determination and sequence data.csv: culture table with assigned taxonomy from NCBI. A single-direction sequence for each isolate is included if one could be obtained. Sequences are derived from ITS1F-ITS4 PCR amplicons, with Sanger sequencing in one direction using ITS5. The file contains 20 variables, explained below:
IsolateNumber: unique number identifying each isolate cultured
Time: season in which the sample was collected
Location: the specific name of the location
Habitat: type of habitat, either stream or peatland
State: state in the USA in which the specific location is located
Incubation_pH ID: pH of the medium during isolation of fungal cultures
Genus: phylogenetic genus of the fungal isolates (determined by sequence similarity)
Sequence_quality: base call quality of the entire sequence used for BLAST analysis, if known
%_coverage: sequence coverage reported from GenBank
%_ID: sequence similarity reported from GenBank
Life_style: ecological life style, if known
Phylum: phylogenetic phylum as indicated by Index Fungorum
Subphylum: phylogenetic subphylum as indicated by Index Fungorum
Class: phylogenetic class as indicated by Index Fungorum
Subclass: phylogenetic subclass as indicated by Index Fungorum
Order: phylogenetic order as indicated by Index Fungorum
Family: phylogenetic family as indicated by Index Fungorum
ITS5_Sequence: single-direction sequence (primer ITS5) used for the sequence similarity match using blastn
Fasta: sequence with nomenclature in FASTA format for easy cut and paste into phylogenetic software
Note: blank cells mean no data is available or the value is unknown.
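A minimal sketch of turning the isolate table into FASTA records is shown below. It assumes the CSV header uses the variable names exactly as listed (`IsolateNumber`, `ITS5_Sequence`); the `isolate_` prefix in the record names and the sample rows are invented for illustration. Blank sequence cells are skipped, following the note that blank cells mean no data.

```python
import csv
import io


def isolates_to_fasta(csv_text, id_col="IsolateNumber", seq_col="ITS5_Sequence"):
    """Build FASTA text from the isolate table, skipping isolates without a sequence."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seq = (row.get(seq_col) or "").strip()
        if not seq:
            continue  # blank cell: no sequence could be obtained
        records.append(f">isolate_{row[id_col].strip()}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


# Invented example rows; real data would come from the CSV file itself.
sample = (
    "IsolateNumber,Genus,ITS5_Sequence\n"
    "1,Penicillium,ACGT\n"
    "2,Mortierella,\n"
    "3,Mucor,GGCC\n"
)
print(isolates_to_fasta(sample), end="")
```

The table already provides a ready-made Fasta column, so a conversion like this is mainly useful when custom record names are wanted.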
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Draft genome sequences of fungi isolated from Mars 2020 spacecraft assembly facilities are reported. The fungal strains were isolated from samples collected from cleanroom surfaces of the Kennedy Space Center Payload Hazardous Servicing Facility and the Jet Propulsion Laboratory Spacecraft Assembly Facility. Whole genome sequencing (WGS) of these isolates was carried out.
Produces and analyzes sequence data from fungal organisms that are important to medicine, agriculture and industry. The FGI is a partnership between the Broad Institute and the wider fungal research community, with the selection of target genomes governed by a steering committee of fungal scientists. Organisms are selected for sequencing as part of a cohesive strategy that considers the value of data from each organism, given their role in basic research, health, agriculture and industry, as well as their value in comparative genomics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 28S ribosomal RNA targeted loci project is a RefSeq curated data set sourced from INSDC records. At a minimum, each sequence contains the hypervariable D1/D2 region, and each record contains a collection identifier (predominantly type material) from a public collection. The presence of the 28S signature has been verified by the ribovore pipeline (https://github.com/nawrockie/ribovore) using hidden Markov and covariance models. Other verification steps, for example checking for vector sequences, too many ambiguous nucleotides, and misassembled sequences, are also included. LSU RefSeq accessions (NG_ ) include sequences mostly obtained from type specimens and a few from reference specimens. Type and reference identifiers are curated by NCBI Taxonomy. The collection source of type material is indicated in each record, and collection acronyms follow the collection codes maintained at https://www.ncbi.nlm.nih.gov/biocollections/. All sequences have the same project ID and can be found as such. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA51803.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DNA sequences used to identify fungi cultured from human faeces. The ITS1‑5.8S‑ITS2 region of the extracted rDNA of fungal isolates was chosen for amplification based on its success in identifying a wide range of fungal species [53]. For DNA amplification, 10.0 µL of REDExtract-N-Amp™ PCR Ready Mix; 7.8 µL of PCR-grade H2O; 0.8 µL of 10 µM forward primer (ITS1, sequence TCCGTAGGTGAACCTGCGG); 0.8 µL of 10 µM reverse primer (ITS4, sequence TCCTCCGCTTATTGATATGC); and 1.0 µL of extracted fungal DNA sample were added to a 200 µL Eppendorf PCR tube. The same method was used to prepare the negative control. PCR amplification was performed with a preliminary step of polymerase activation at 94 °C for 2 minutes; 35 cycles of denaturation at 94 °C for 30 seconds, annealing at 51 °C for 20 seconds, and extension at 77 °C for 1 minute; and a final extension step at 72 °C for 8 minutes, using the Eppendorf Vapo.Protect™ Mastercycler® Pro S.
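The reaction set-up and cycling programme above can be checked with a little arithmetic. The sketch below assumes the volumes are in microlitres and the primer stocks are 10 µM; it computes the total reaction volume, the final concentration of each primer, and the programmed hold time. Ramp times between temperatures are instrument-dependent and are not included.

```python
# Component volumes in microlitres, as listed in the protocol
volumes_ul = {
    "REDExtract-N-Amp PCR Ready Mix": 10.0,
    "PCR-grade water": 7.8,
    "forward primer ITS1": 0.8,
    "reverse primer ITS4": 0.8,
    "template DNA": 1.0,
}
PRIMER_STOCK_UM = 10.0  # assumed stock concentration in micromolar

total_ul = sum(volumes_ul.values())

# Dilution: C_final = C_stock * V_primer / V_total (same for both primers)
final_primer_um = PRIMER_STOCK_UM * volumes_ul["forward primer ITS1"] / total_ul

# Programmed hold times in seconds (temperature ramping excluded)
activation_s = 2 * 60
cycle_s = 30 + 20 + 60  # denaturation + annealing + extension
final_extension_s = 8 * 60
hold_seconds = activation_s + 35 * cycle_s + final_extension_s

print(f"total volume: {total_ul:.1f} uL")
print(f"final primer concentration: {final_primer_um:.2f} uM each")
print(f"programmed hold time: {hold_seconds / 60:.1f} min")
```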
To confirm successful fungal DNA extraction and amplification, 4 µL of the amplified fungal rDNA product of the PCR reaction was loaded onto a 1 % (w/v) agarose gel in 1× Tris/Borate/EDTA (TBE) buffer, and 1 µL of the cyanine dye SYBR® DNA gel stain was added for visualisation. One kilobase (1 kb) plus DNA ladder (5 µL) and 5 µL of the negative control were also loaded onto the agarose gel. Following gel electrophoresis, PCR products were visualised with the Gel Doc™ XR Plus System (BIO‑RAD, USA). The 1 kb plus DNA ladder was used to determine the size of the amplified fungal DNA fragments using the GelAnalyzer 2010a quantification programme. The fungal rDNA fragments of the ITS1‑5.8S‑ITS2 region obtained from PCR were then transferred to the Centre of Genomics, Proteomics and Metabolomics DNA sequencing facility for sequencing.
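Sizing a band against a ladder is commonly approximated by interpolating the logarithm of fragment size against migration distance. The sketch below illustrates that general idea only; the text does not state which calibration GelAnalyzer applied, and the ladder distances used here are invented. It assumes all ladder bands have distinct migration distances.

```python
import math


def estimate_size_bp(distance, ladder):
    """Estimate fragment size by log-linear interpolation between ladder bands.

    `ladder` is a list of (migration_distance, size_bp) pairs; `distance` is
    the migration distance of the unknown band in the same units.
    """
    points = sorted(ladder)
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if d0 <= distance <= d1:
            fraction = (distance - d0) / (d1 - d0)
            log_size = math.log10(s0) + fraction * (math.log10(s1) - math.log10(s0))
            return 10 ** log_size
    raise ValueError("distance lies outside the range covered by the ladder")


# Invented ladder readings: (distance in mm, size in bp)
example_ladder = [(10.0, 1000), (14.0, 650), (18.0, 400), (26.0, 100)]
print(round(estimate_size_bp(16.0, example_ladder)))
```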
Capillary electrophoresis DNA sequencing (Sanger sequencing) was used to obtain the DNA sequences of the amplified ITS1‑5.8S‑ITS2 region. Two reactions were performed for each sample containing fungal DNA template, one for each primer, and each was mixed with the ABI PRISM™ BigDye Terminator Sequencing Kit version 3.1 (ThermoFisher Scientific), which contains DNA polymerase enzyme, a buffer, four DNA nucleotides and four chain-terminating dideoxy nucleotides with fluorescent dyes. The samples were then subjected to cycle sequencing on the Applied Biosystems GeneAmp® PCR System 9700 thermal cycler using standard cycling conditions: a preliminary step of polymerase activation at 96 °C for 1 minute, followed by 25 cycles of denaturation at 96 °C for 10 seconds, annealing at 50 °C for 5 seconds, and extension at 60 °C for 4 minutes. Following cycle sequencing, the samples were purified using Agencourt® CleanSEQ® magnetic beads to remove excess fluorescent dyes, nucleotides, salts and other contaminants. The purified DNA samples were then separated by size by capillary electrophoresis with the ABI PRISM™ 3130XL Genetic Analyzer using 50 cm capillaries and POP7 polymer. The final data output of the ITS1‑5.8S‑ITS2 region DNA sequences was based on the detection of the attached fluorescent dyes excited by a laser.
Geneious version 11.1.5 (www.geneious.com) was used to analyse the raw data [54]. The data included both forward and reverse rDNA sequences for each fungal isolate. These sequences were aligned, and ends showing poor-quality reads were trimmed, to obtain a consensus sequence. BLAST (Basic Local Alignment Search Tool), developed by Altschul et al. [55] and available as a tool within Geneious, was used in its MegaBLAST version, which is optimised for fast searches of highly similar sequences, to compare the consensus query sequence with known DNA sequences in GenBank (NCBI genetic sequence database), EMBL (European Molecular Biology Laboratory), DDBJ (DNA DataBank of Japan) and PDB (Protein Data Bank, Worldwide). The search results included: a grade (percentage) combining the query sequence coverage, expectation value (e-value) and identity value for each hit against the database; the number and percentage of identities, indicating the extent to which the query DNA sequence matched the database nucleotide sequence; and the bit-score, which reflects the quality of the alignment and measures sequence similarity [56]. The higher each score, the higher the certainty of identification of the fungal species. A grade of >98 % was considered a correct identification.
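The read-processing steps above can be illustrated with a few small helpers. This is a simplified sketch and not the algorithm Geneious uses: the reverse read is reverse-complemented so it can be compared with the forward read, low-quality ends are trimmed with a simple per-base threshold (the cut-off of 20 is an assumption), and a hit is accepted when its grade exceeds the 98 % cut-off stated in the text.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_low_quality_ends(seq, quals, min_q=20):
    """Strip bases from both ends until a base with quality >= min_q is reached."""
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def is_accepted_identification(grade_percent, threshold=98.0):
    """Apply the acceptance rule from the text: grade must be above 98 %."""
    return grade_percent > threshold


# The ITS4 primer from the protocol, shown as it would read on the forward strand
print(reverse_complement("TCCTCCGCTTATTGATATGC"))
# Invented read with poor-quality ends
print(trim_low_quality_ends("NNACGTNN", [5, 5, 30, 30, 30, 30, 5, 5]))
```

Building the consensus itself requires a pairwise alignment of the trimmed forward read and the reverse-complemented reverse read, which is left to dedicated tools.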