100+ datasets found
  1. SILVA v132 + v138, NR99, in ARB+UDB11 format

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Kasper Skytte Andersen; Morten Simonsen Dueholm (2023). SILVA v132 + v138, NR99, in ARB+UDB11 format [Dataset]. http://doi.org/10.6084/m9.figshare.9994568.v3
    Available download formats
    zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kasper Skytte Andersen; Morten Simonsen Dueholm
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SILVA releases 132 and 138, non-redundant (clustered at 99%) databases including type strains, in both ARB and UDB (usearch11) formats. For use with https://github.com/KasperSkytte/AutoTax

  2. metadata and silva classifier

    • figshare.com
    txt
    Updated Aug 10, 2022
    Cite
    Apiwat Sangphukieo (2022). metadata and silva classifier [Dataset]. http://doi.org/10.6084/m9.figshare.20430963.v5
    Available download formats
    txt
    Dataset updated
    Aug 10, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Apiwat Sangphukieo
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    metadata and silva classifier

  3. SILVA ver. 132 database

    • figshare.com
    bin
    Updated Jun 3, 2023
    Cite
    Genomica Microbiana; Bruno Gomez-Gil (2023). SILVA ver. 132 database [Dataset]. http://doi.org/10.6084/m9.figshare.6297371.v1
    Available download formats
    bin
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Genomica Microbiana; Bruno Gomez-Gil
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    SILVA ver. 132 16S rRNA database formatted for the mg_pipeline; it has been dereplicated and now has 649,326 sequences. It is compressed with MFCompress from the original 983 Mb file. To uncompress it, get the software (http://sweet.ua.pt/ap/software/mfcompress/MFCompress-src-1.01.tgz) and run: MFCompressD SILVA_132_derep.fasta.mfc

  4. (high-temp) No 1. DADA2 Workflow (16S rRNA/ITS) Output

    • smithsonian.figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Jarrod Scott (2023). (high-temp) No 1. DADA2 Workflow (16S rRNA/ITS) Output [Dataset]. http://doi.org/10.25573/data.14687184.v1
    Available download formats
    application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Smithsonian Tropical Research Institute
    Authors
    Jarrod Scott
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Output files from the No 1. DADA2 Workflow page of the SWELTR high-temp study.

    File names and descriptions:

    16S rRNA Data
    ssu18_seqtab.rds: Sequence table before chimera checking.
    ssu18_seqtab.nochim.rds: Sequence table after chimera checking.
    ssu18_tax_silva.rds: Silva (v138) taxonomy table of ssu18_seqtab.nochim.rds.
    ssu18_tax_id.rds: IDTAXA taxonomy table of ssu18_seqtab.nochim.rds.

    ITS Data
    its18_seqtab.rds: Sequence table before chimera checking.
    its18_seqtab.nochim.rds: Sequence table after chimera checking.
    its18_tax.rds: UNITE general FASTA release for Fungi (v. 04.02.2020) taxonomy table of its18_seqtab.nochim.rds.

    Source code for the workflow can be found here: https://github.com/sweltr/high-temp/blob/main/dada2.Rmd

  5. Phylogenetic tree

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 14, 2022
    Cite
    Le Moigne, Alizee (2022). Phylogenetic tree [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000387018
    Dataset updated
    Jul 14, 2022
    Authors
    Le Moigne, Alizee
    Description

    Maximum Likelihood phylogenetic tree generated from all the raw sequences with FastTree (Price et al., 2010, PLoS ONE). The sequences were aligned to the SILVA NR99 database with SINA v1.2.11 (Pruesse et al., 2012, Bioinformatics). This was done in the software SILVA ACT (Alignment, Classification and Tree Service). Model used: GTR; rates model for likelihoods: Gamma.

  6. Table_1_Comparison of Bioinformatics Pipelines and Operating Systems for the...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1more
    Updated Jun 17, 2020
    + more versions
    Cite
    Mirabelli, Peppino; Soricelli, Andrea; Mombelli, Elisa; Festari, Cristina; Greub, Gilbert; Gurry, Thomas; Mazzelli, Monica; Lopizzo, Nicola; Ribaldi, Federica; Cattaneo, Annamaria; Marizzoni, Moira; Frisoni, Giovanni B.; Provasi, Stefania; Salvatore, Marco; Franzese, Monica (2020). Table_1_Comparison of Bioinformatics Pipelines and Operating Systems for the Analyses of 16S rRNA Gene Amplicon Sequences in Human Fecal Samples.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000509442
    Dataset updated
    Jun 17, 2020
    Authors
    Mirabelli, Peppino; Soricelli, Andrea; Mombelli, Elisa; Festari, Cristina; Greub, Gilbert; Gurry, Thomas; Mazzelli, Monica; Lopizzo, Nicola; Ribaldi, Federica; Cattaneo, Annamaria; Marizzoni, Moira; Frisoni, Giovanni B.; Provasi, Stefania; Salvatore, Marco; Franzese, Monica
    Description

    Amplicon high-throughput sequencing of 16S ribosomal RNA (rRNA) gene is currently the most widely used technique to investigate complex gut microbial communities. Microbial identification might be influenced by several factors, including the choice of bioinformatic pipelines, making comparisons across studies difficult. Here, we compared four commonly used pipelines (QIIME2, Bioconductor, UPARSE and mothur) run on two operating systems (OS) (Linux and Mac), to evaluate the impact of bioinformatic pipeline and OS on the taxonomic classification of 40 human stool samples. We applied the SILVA 132 reference database for all the pipelines. We compared phyla and genera identification and relative abundances across the four pipelines using the Friedman rank sum test. QIIME2 and Bioconductor provided identical outputs on Linux and Mac OS, while UPARSE and mothur reported only minimal differences between OS. Taxa assignments were consistent at both phylum and genus level across all the pipelines. However, a difference in terms of relative abundance was identified for all phyla (p < 0.013) and for the majority of the most abundant genera (p < 0.028), such as Bacteroides (QIIME2: 24.5%, Bioconductor: 24.6%, UPARSE-linux: 23.6%, UPARSE-mac: 20.6%, mothur-linux: 22.2%, mothur-mac: 21.6%, p < 0.001). The use of different bioinformatic pipelines affects the estimation of the relative abundance of gut microbial community, indicating that studies using different pipelines cannot be directly compared. A harmonization procedure is needed to move the field forward.
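The description above names the Friedman rank sum test for comparing abundances across pipelines. As a rough illustration only, the sketch below computes the Friedman chi-square statistic in plain Python. The numbers in `toy` are made up, the three columns stand for hypothetical pipelines, and no tie correction or p-value is computed, so this is not the authors' analysis.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic without tie correction.

    Each row is one block (e.g. one sample); each column is one
    treatment (e.g. one pipeline).
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy, made-up relative abundances: rows are samples, columns are
# three hypothetical pipelines.
toy = [
    [24.5, 23.6, 22.2],
    [30.1, 29.0, 28.4],
    [18.2, 17.9, 17.1],
]
q = friedman_statistic(toy)
```

Under the null hypothesis the statistic is compared against a chi-square distribution with k - 1 degrees of freedom; a statistics library would normally handle that step together with the tie correction.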

  7. The use of Foundational Ontologies in Bioinformatics - Supplementary...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    txt
    Updated Jul 16, 2024
    Cite
    César H. Bernabé; Núria Queralt-Rosinach; Vítor E. Silva Souza; Luiz Olavo Bonino da Silva Santos; Annika Jacobsen; Barend Mons; Marco Roos (2024). The use of Foundational Ontologies in Bioinformatics - Supplementary Material [Dataset]. http://doi.org/10.5281/zenodo.6961846
    Available download formats
    txt
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    César H. Bernabé; Núria Queralt-Rosinach; Vítor E. Silva Souza; Luiz Olavo Bonino da Silva Santos; Annika Jacobsen; Barend Mons; Marco Roos
    License

    Attribution 1.0 (CC BY 1.0), https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Supplementary material for the paper "The use of Foundational Ontologies in Bioinformatics".

  8. Datasets from An Atlas of Plant Transposable Elements

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Nov 8, 2021
    + more versions
    Cite
    Daniel Longhi Fernandes Pedro; Tharcisio Soares Amorim; Alessandro de Mello Varani; Romain Guyot; Douglas Silva Domingues; Alexandre Rossi Paschoal (2021). Datasets from An Atlas of Plant Transposable Elements [Dataset]. http://doi.org/10.5281/zenodo.5574528
    Available download formats
    bin
    Dataset updated
    Nov 8, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel Longhi Fernandes Pedro; Tharcisio Soares Amorim; Alessandro de Mello Varani; Romain Guyot; Douglas Silva Domingues; Alexandre Rossi Paschoal
    License

    Attribution 1.0 (CC BY 1.0), https://creativecommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    In this repository, we deposited support data for the article "An Atlas of Plant Transposable Elements", available at http://apte.cp.utfpr.edu.br/.

    Here, we included:

    1.) Supplementary material data:
    A) SuppMat_1.xlsx: Reference genome assembly accessions for the Ensembl Plants species used.
    B) SuppMat_2.docx: A brief description of the transposable element annotation steps used in this work.

    2.) Code and software: all scripts created, the third-party software, and how we used them are detailed, using the Arabidopsis thaliana genome as an example, on GitHub: https://github.com/daniellonghi/te_pipeline under the MIT license (please see details in the licence.txt file). For third-party software, consult their terms.

    To report bugs, to ask for help, and to give any feedback, please contact Alexandre R. Paschoal (paschoal@utfpr.edu.br) or Douglas S. Domingues (douglas.domingues@unesp.br).

  9. (16s) No 1. DADA2 Workflow (16S rRNA) Output

    • search.dataone.org
    • smithsonian.figshare.com
    Updated Aug 16, 2024
    Cite
    Jarrod Scott (2024). (16s) No 1. DADA2 Workflow (16S rRNA) Output [Dataset]. https://search.dataone.org/view/urn%3Auuid%3A7856bbb4-d93f-43b4-9cf5-7b64dc88ff61
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Smithsonian Research Data Repository
    Authors
    Jarrod Scott
    Description

    Output files from the No 1. DADA2 Workflow page of the Bocas Hypoxia study.

    File names and descriptions:

    RUN01_read_changes.txt : Tracking changes in read counts (per sample) from the beginning to end of the DADA2 workflow.

    RUN02_read_changes.txt : Tracking changes in read counts (per sample) from the beginning to end of the DADA2 workflow.

    combo_pipeline.rdata: contains sequence and taxonomy tables from the DADA2 pipeline needed for subsequent analyses. To see the objects, in R run load("combo_pipeline.rdata", verbose=TRUE)

    1) seqtab.1: Sequence table from Run01 before merging with Run02.

    2) seqtab.1: Sequence table from Run02 before merging with Run01.

    3) st.sum: merged sequence table before removing chimeras

    4) st.all: duplicate of st.sum

    5) seqtab: merged sequence table after removing chimeras

    6) tax_silva: Silva (v132) taxonomy table of seqtab

    7) tax_gg: GreenGenes taxonomy table of seqtab

  10. Dataset: The potential of genome-wide RAD sequences for resolving rapid...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 28, 2020
    Cite
    Khan, Gulzar; Zappi, Daniela C.; Franco, Fernando Faria; Ribolla, Paulo Eduardo Martins; Taylor, Nigel; Silva, Gislaine Angélica Rodrigues; Amaral, Danilo Trabuco; Moraes, Evandro Marsola; da Silva Andrade, Sónia Cristina; Eaton, Deren A. R.; Alonso, Diego Peres; Bombonato, Juliana Rodrigues (2020). Dataset: The potential of genome-wide RAD sequences for resolving rapid radiations: a case study in Cactaceae [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000568613
    Dataset updated
    Jul 28, 2020
    Authors
    Khan, Gulzar; Zappi, Daniela C.; Franco, Fernando Faria; Ribolla, Paulo Eduardo Martins; Taylor, Nigel; Silva, Gislaine Angélica Rodrigues; Amaral, Danilo Trabuco; Moraes, Evandro Marsola; da Silva Andrade, Sónia Cristina; Eaton, Deren A. R.; Alonso, Diego Peres; Bombonato, Juliana Rodrigues
    Description

    The reconstruction of relationships within recently radiated groups is challenging even when massive amounts of sequencing data are available. The use of restriction site-associated DNA sequencing (RAD-Seq) to this end is promising. Here, we assessed the performance of RAD-Seq to infer the species-level phylogeny of the rapidly radiating genus Cereus (Cactaceae). To examine how the amount of genomic data affects resolution in this group, we used distinct datasets and implemented different analyses. We sampled 52 individuals of Cereus, representing 18 of the 25 species currently recognized, plus members of the closely allied genera Cipocereus and Praecereus, and 11 other Cactaceae genera as outgroups. Three scenarios of permissiveness to missing data were carried out in iPyRAD, assembling datasets with 30% (333 loci), 45% (1440 loci), and 70% (6141 loci) of missing data. For each dataset, Maximum Likelihood (ML) trees were generated using two supermatrices, i.e., only SNPs and SNPs plus invariant sites. Accuracy and resolution were improved when the dataset with the highest number of loci was used (6141 loci), despite the high percentage of missing data included (70%). Coalescent trees estimated using SVDQuartets and ASTRAL are similar to those obtained by the ML reconstructions. Overall, we reconstruct a well-supported phylogeny of Cereus, which is resolved as monophyletic and composed of four main clades with high support in their internal relationships. Our findings also provide insights into the impact of missing data for phylogeny reconstruction using RAD loci.

    Sampling

    Our dataset includes 63 samples spanning 52 ingroups of Cereus and 11 outgroups (Table 1).

    ddRAD library preparation and sequencing

    Genomic DNA was extracted from root tissues using the DNeasy Plant Mini Kit (Qiagen). ddRAD libraries were prepared using high fidelity EcoRI and HPAII restriction enzymes following Campos et al. (2017) and Khan et al. (2019). Details of library preparation and sequencing are shown in Supplementary material.

    Bioinformatics analyses

    Raw data were trimmed for adapters and quality filtered before SNP calling. The quality of sequencing data was checked with FastQC 0.11.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), visualized in MultiQC 1.0 (https://github.com/ewels/MultiQC), and filtered with SeqyClean 1.9.12 (Zhbannikov et al., 2017) using the following settings: minimum quality (Phred Score 20), minimum size (>65 bp), and Illumina contaminants (UniVec.fas). We used the iPyRAD pipeline (available at http://github.com/dereneaton/ipyrad) to identify homology among reads, make SNP calls, and format output files. The following parameter settings were implemented: mindepth_majrule = 6 (minimum depth for majority-rule base calling), clust_threshold = 0.85 (clustering threshold for de novo assembly), filter_adapters = 2 (strict filter), max_Hs_consens = 6 (maximum heterozygotes in consensus), min_samples_locus (minimum percentage of samples per locus for output). For the latter, values varied in three distinct scenarios concerning the permissiveness to missing data. These scenarios considered that the final set of loci should have at least 39 samples (scenario 1, approximately 30% of missing data), 26 samples (scenario 2, approximately 45% of missing data), or 13 samples (scenario 3, approximately 70% of missing data). After SNP calling, CD-HIT (Li and Godzik, 2006; Fu et al., 2012) was used to identify reverse-complement duplicates in the loci recovered by iPyRAD.

  11. Data from: Aligner optimization increases accuracy and decreases compute...

    • search.datacite.org
    • data.niaid.nih.gov
    • +1more
    Updated 2018
    Cite
    Kelly M. Robinson; Aziah S. Hawkins; Ivette Santana-Cruz; Ricky S. Adkins; Amol C. Shetty; Sushma Nagaraj; Lisa Sadzewicz; Luke J. Tallon; David A. Rasko; Claire M. Fraser; Anup Mahurkar; Joana C. Silva; Julie C. Dunning Hotopp (2018). Data from: Aligner optimization increases accuracy and decreases compute times in multi-species sequence data [Dataset]. http://doi.org/10.5061/dryad.m1m0p
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Dryad
    Authors
    Kelly M. Robinson; Aziah S. Hawkins; Ivette Santana-Cruz; Ricky S. Adkins; Amol C. Shetty; Sushma Nagaraj; Lisa Sadzewicz; Luke J. Tallon; David A. Rasko; Claire M. Fraser; Anup Mahurkar; Joana C. Silva; Julie C. Dunning Hotopp
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    National Science Foundation
    Description

    As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows–Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium–human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.
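The two recommendations in the description above (one combined reference, a 23 nt seed length) could be wired together roughly as follows. This is a minimal sketch, not the authors' scripts: the helper names and file names are hypothetical, and the only tool-specific detail assumed is that `bwa mem -k` sets the minimum seed length. The combined FASTA would still need to be indexed with `bwa index` before alignment.

```python
from pathlib import Path
from typing import List, Sequence


def combine_references(fasta_paths: Sequence[Path], combined: Path) -> Path:
    """Concatenate several FASTA files into one combined reference."""
    with combined.open("wb") as out:
        for path in fasta_paths:
            data = path.read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the next header on its own line
    return combined


def bwa_mem_command(
    reference: Path, reads1: Path, reads2: Path, seed_length: int = 23
) -> List[str]:
    """Build a paired-end bwa mem call; -k sets the minimum seed length."""
    return [
        "bwa", "mem",
        "-k", str(seed_length),
        str(reference), str(reads1), str(reads2),
    ]
```

The command list can be passed to `subprocess.run` once `bwa` is installed; building it separately keeps the seed length visible and easy to vary.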

  12. Data from: Innuendo Whole Genome And Core Genome Mlst Schemas And Datasets...

    • ekoizpen-zientifikoa.ehu.eus
    • zenodo.org
    Updated 2018
    Cite
    Rossi, Mirko; Silva, Mickael Santos Da; Ribeiro-Gonçalves, Bruno Filipe; Silva, Diogo Nuno; Machado, Miguel Paulo; Oleastro, Mónica; Borges, Vítor; Isidro, Joana; Viera, Luis; Halkilahti, Jani; Jaakkonen, Anniina; Palma, Federica; Salmenlinna, Saara; Hakkinen, Marjaana; Garaizar, Javier; Bikandi, Joseba; Hilbert, Friederike; Carriço, João André (2018). Innuendo Whole Genome And Core Genome Mlst Schemas And Datasets For Salmonella Enterica [Dataset]. https://ekoizpen-zientifikoa.ehu.eus/documentos/668fc45cb9e7c03b01bdb000
    Dataset updated
    2018
    Authors
    Rossi, Mirko; Silva, Mickael Santos Da; Ribeiro-Gonçalves, Bruno Filipe; Silva, Diogo Nuno; Machado, Miguel Paulo; Oleastro, Mónica; Borges, Vítor; Isidro, Joana; Viera, Luis; Halkilahti, Jani; Jaakkonen, Anniina; Palma, Federica; Salmenlinna, Saara; Hakkinen, Marjaana; Garaizar, Javier; Bikandi, Joseba; Hilbert, Friederike; Carriço, João André
    Description

    Dataset

    As a reference dataset, 4,307 publicly available draft or complete genome assemblies and available metadata of Salmonella enterica were downloaded from public repositories (i.e. EnteroBase, the National Center for Biotechnology Information (NCBI), and the European Bioinformatics Institute (EMBL-EBI); accessed April 2017). The collection includes 1,465 S. Enteritidis, 2,442 S. Typhimurium, and 400 genomes of other serovars frequently isolated in Europe. The dataset also includes 153 S. Typhimurium variant 4,[5],12:i:- collected from different Italian regions between 2012 and 2014 during a surveillance study and 129 S. Enteritidis belonging to the INNUENDO sequence dataset (PRJEB27020). The 282 additional genomes were assembled using INNUca v3.1.

    File 'Metadata/Senterica_metadata.txt' contains metadata information for each strain including source classification, host taxa, year and country of isolation, serotype, classical pubMLST 7 genes ST classification, and source/method of the assembly.

    The directory 'Genomes' contains all the 4,589 assemblies of the strains listed in 'Metadata/Senterica_metadata.txt'. Please note that genomes marked as 'Enterobase' have been downloaded from Enterobase webpage http://enterobase.warwick.ac.uk.

    Schema creation and validation

    The wgMLST schema from EnteroBase was downloaded and curated using chewBBACA AutoAlleleCDSCuration to remove all alleles that are not coding sequences (CDS). The quality of the remaining loci was assessed using chewBBACA Schema Evaluation, and loci with single alleles, those with high length variability (i.e. if more than 1 allele is outside the mode +/- 0.05 size), and those present in less than 0.5% of the Salmonella genomes in EnteroBase at the date of the analysis (April 2017) were removed. The wgMLST schema was further curated by excluding all loci detected as “Repeated Loci” and loci annotated as “non-informative paralogous hit (NIPH/ NIPHEM)” or “Allele Larger/ Smaller than length mode (ALM/ ASM)” by the chewBBACA Allele Calling engine in more than 1% of a dataset composed of 4,589 Salmonella genomes.

    File 'Schemas/Senterica_wgMLST_ 8558_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 8,558 loci.

    File 'Schemas/Senterica_cgMLST_ 3255_listGenes.txt' contains the list of genes from the wgMLST schema which defines the cgMLST schema. The cgMLST schema consists of 3,255 loci and has been defined as the loci present in at least the 99% of the 4,589 Salmonella genomes. Genomes have no more than 2% of missing loci.

    File 'Allele_Profles/Senterica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profile of the 4,589 Salmonella genomes of the dataset. Please note that missing loci follow the annotation of chewBBACA Allele Calling software.

    File 'Allele_Profles/Senterica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profile of the 4,589 Salmonella genomes of the dataset. Please note that missing loci are indicated with a zero.
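The cgMLST definition above (loci present in at least 99% of genomes, with missing loci recorded as zero in the cgMLST profile) can be illustrated with a small sketch. The data structure, function names, and toy profiles below are assumptions made for illustration; this is not the chewBBACA implementation and does not read the actual TSV files.

```python
from typing import Dict, List


def core_loci(profiles: Dict[str, Dict[str, int]], min_presence: float = 0.99) -> List[str]:
    """Return loci called in at least `min_presence` of genomes.

    `profiles` maps genome name -> {locus name -> allele number};
    an allele number of 0 (or an absent key) is treated as missing.
    """
    genomes = list(profiles)
    loci = sorted({locus for calls in profiles.values() for locus in calls})
    kept = []
    for locus in loci:
        present = sum(1 for g in genomes if profiles[g].get(locus, 0) != 0)
        if present / len(genomes) >= min_presence:
            kept.append(locus)
    return kept


def missing_fraction(calls: Dict[str, int], loci: List[str]) -> float:
    """Fraction of the given loci that are missing (allele 0) in one genome."""
    missing = sum(1 for locus in loci if calls.get(locus, 0) == 0)
    return missing / len(loci)


# Toy, made-up allele profiles for four hypothetical genomes.
toy_profiles = {
    "g1": {"locusA": 1, "locusB": 2, "locusC": 0},
    "g2": {"locusA": 1, "locusB": 0, "locusC": 0},
    "g3": {"locusA": 3, "locusB": 1, "locusC": 4},
    "g4": {"locusA": 2, "locusB": 5, "locusC": 0},
}
strict_core = core_loci(toy_profiles)           # 99% threshold
relaxed_core = core_loci(toy_profiles, 0.75)    # relaxed threshold for the toy data
```

With only four toy genomes the 99% threshold keeps just the locus found everywhere; `missing_fraction` mirrors the per-genome check behind the statement that genomes have no more than 2% of missing loci.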

    Additional citations

    The schema are prepared to be used with chewBBACA. When using the schema in this repository please cite also:

    Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. 15/03/2018. M Gen 4(3): doi:10.1099/mgen.0.000166 http://mgen.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000166

    The Salmonella enterica schema is derived from the EnteroBase Salmonella wgMLST schema. When using the schema in this repository please cite also:

    Alikhan N-F, Zhou Z, Sergeant MJ, Achtman M (2018) A genomic overview of the population structure of Salmonella. PLoS Genet 14 (4):e1007261. https://doi.org/10.1371/journal.pgen.1007261

  13. MIC Prediction | CEPID ARIES

    • kaggle.com
    zip
    Updated Aug 24, 2025
    Cite
    Ana Julia R Silva (2025). MIC Prediction | CEPID ARIES [Dataset]. https://www.kaggle.com/datasets/anajuliarsilva/mic-prediction-cepid-aries
    Available download formats
    zip (3542933232 bytes)
    Dataset updated
    Aug 24, 2025
    Authors
    Ana Julia R Silva
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data available in BV-BRC: Bacterial and Viral Bioinformatics Resource Center: https://www.bv-brc.org

  14. ASV counts and taxonomic assignments PRJNA507590/SRP171602

    • figshare.com
    xlsx
    Updated Jul 18, 2019
    Cite
    Avril von Hoyningen-Huene; Dominik Schneider; Dario Fussmann; Andreas Reimer; Gernot Arp; Rolf Daniel (2019). ASV counts and taxonomic assignments PRJNA507590/SRP171602 [Dataset]. http://doi.org/10.6084/m9.figshare.8832458.v3
    Available download formats
    xlsx
    Dataset updated
    Jul 18, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Avril von Hoyningen-Huene; Dominik Schneider; Dario Fussmann; Andreas Reimer; Gernot Arp; Rolf Daniel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table contains the bacterial 16S rRNA amplicon sequence variant (ASV) count data and taxonomic assignment after bioinformatic processing used in the data descriptor: "Bacterial succession along a sediment porewater gradient at Lake Neusiedl in Austria". Further information on the bioinformatic processing can be found in the material and methods section of the paper.

  15. Database of Peptides with Potential for Pharmacological Intervention in...

    • data.mendeley.com
    Updated Jun 6, 2023
    Cite
    Micael da Silva Pirazoli Gonzalez (2023). Database of Peptides with Potential for Pharmacological Intervention in Human Pathogen Molecular Targets [Dataset]. http://doi.org/10.17632/2zhgy9ggdv.1
    Dataset updated
    Jun 6, 2023
    Authors
    Micael da Silva Pirazoli Gonzalez
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Peptides are polymeric chains used as research objects in the search for new drugs with greater efficacy and fewer side effects. Therefore, we created three databases of antimicrobial peptides using PubChem and ChEMBL. First, we acquired the Simplified Molecular-Input Line-Entry System (SMILES) strings of several peptides active against different types of pathogens, namely bacteria, viruses, parasites, and fungi. Using the OpenBabel software, these SMILES had their file formats and structures converted to create one database in one-dimensional SMI format and two in three-dimensional MOL2 and PDB file formats. In total, the three databases consist of 718 peptides that have been shown to possess inhibitory activity on molecular targets of clinically important pathogens.
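The SMILES-to-3D conversion described above might look something like the sketch below, which only builds command lines for the Open Babel `obabel` tool. The file names are hypothetical, and the exact options the author used are not stated in the description; the flags assumed here are `-ismi` (SMILES input), `-o<format>` (output format), `-O` (output file) and `--gen3d` (generate 3D coordinates).

```python
from typing import List


def obabel_command(smiles_file: str, out_file: str, out_format: str, gen3d: bool = True) -> List[str]:
    """Build an obabel call converting a SMILES file to another format."""
    cmd = ["obabel", "-ismi", smiles_file, f"-o{out_format}", "-O", out_file]
    if gen3d:
        cmd.append("--gen3d")  # ask Open Babel to generate 3D coordinates
    return cmd


# Hypothetical file names; one call per three-dimensional target format.
commands = [
    obabel_command("peptides.smi", "peptides.mol2", "mol2"),
    obabel_command("peptides.smi", "peptides.pdb", "pdb"),
]
```

Each list can be handed to `subprocess.run` on a machine where Open Babel is installed.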

  16. RMQS1 16S taxonomy

    • entrepot.recherche.data.gouv.fr
    tsv
    Updated Dec 5, 2023
    + more versions
    Cite
    Battle Karimi; Sébastien Terrat; Samuel Dequiedt; Nicolas P. A. Saby; Walid Horrigue; Mélanie Lelièvre; Virginie Nowak; Claudy Jolivet; Dominique Arrouays; Patrick Wincker; Corinne Cruaud; Antonio Bispo; Pierre-Alain Maron; Nicolas Chemidlin Prévost-Bouré; Lionel Ranjard (2023). RMQS1 16S taxonomy [Dataset]. http://doi.org/10.57745/WIRXIC
    Available download formats
    tsv(350834), tsv(2014217), tsv(8326), tsv(401594), tsv(31313), tsv(2953603), tsv(13331214), tsv(821035), tsv(2431), tsv(60097)
    Dataset updated
    Dec 5, 2023
    Dataset provided by
    Recherche Data Gouv
    Authors
    Battle Karimi; Sébastien Terrat; Samuel Dequiedt; Nicolas P. A. Saby; Walid Horrigue; Mélanie Lelièvre; Virginie Nowak; Claudy Jolivet; Dominique Arrouays; Patrick Wincker; Corinne Cruaud; Antonio Bispo; Pierre-Alain Maron; Nicolas Chemidlin Prévost-Bouré; Lionel Ranjard
    License

    https://spdx.org/licenses/etalab-2.0.html

    Dataset funded by
    French Agency for Ecological Transition (ADEME)
    French National Research Agency (ANR)
    France Génomique
    Description

    RMQS: The French Soil Quality Monitoring Network (RMQS) is a national program for the assessment and long-term monitoring of the quality of French soils. The network is based on the monitoring of 2,240 sites representative of French soils and their land use. These sites are spread over the whole French territory (metropolitan and overseas) along a systematic square grid of 16 km x 16 km cells. The network covers a broad spectrum of climatic, soil and land-use conditions (croplands, permanent grasslands, woodlands, orchards and vineyards, natural or scarcely anthropogenic land and urban parkland). The first sampling campaign in metropolitan France took place from 2000 to 2009.

    Dataset: This dataset contains the taxonomic affiliation (genus;family;order;class;phylum) for the 16S rDNA (Archaea + Bacteria) dataset of 1,842 RMQS sites. The soil 16S rDNA gene was sequenced by pyrosequencing (GS FLX Titanium - Roche 454) at Genoscope. Bioinformatics analysis was performed with the BIOCOM-PIPE (previously named GNS-PIPE) metabarcoding pipeline. Taxonomic affiliation of the sequences is based on the Silva r132 database (see this Zenodo repository for details) and was performed on a rarefied dataset (10,000 reads). See the associated articles for details, as well as Terrat et al. (2014). Raw sequencing data are available at EBI.

    File structure: The taxonomy is split across five files with one line per site and one column per taxon (rmqs1_taxonomy_). Each line sums to 10,000 (the rarefaction threshold). Three supplementary columns are present: Unknown (not matching any reference), Unclassified (missing taxa between genus and phylum), and Environmental (matched to a sample from an environmental study, generally with only a phylum name). Five metadata files describe the upper taxonomic levels for each taxon (rmqs1_taxonomy_.metadata.tsv).

    Details: Some site samples could not be collected; they do not appear in the dataset. Some sites did not pass the laboratory or bioinformatics steps needed to reach 10,000 sequences before taxonomic assignment; they do not appear in the dataset either. This dataset can be linked with 10.15454/QSXKGA to get each sample's physico-chemical properties, land use and coordinates, or to filter sites using its site_officiel column. Sites with an ID longer than four digits are supplementary sites that are not in the center of the cells (e.g. 10797 and 20797, which came from cell 797).
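The file structure described above (one line per site, one column per taxon, each line summing to the rarefaction threshold) lends itself to a quick sanity check. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the files are tab-separated, that the site identifier sits in a column named `site` (a hypothetical name), and that the Unknown, Unclassified and Environmental columns count toward the 10,000 total. The example table is made up for illustration.

```python
import csv
import io

RAREFACTION_DEPTH = 10_000  # reads per site, as stated in the description


def rows_off_depth(tsv_text, id_column="site", depth=RAREFACTION_DEPTH):
    """Return {site_id: row_sum} for every row whose counts do not sum to depth.

    The id_column name is an assumption; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad = {}
    for row in reader:
        site = row.pop(id_column)
        total = sum(int(value) for value in row.values())
        if total != depth:
            bad[site] = total
    return bad


# Made-up table: two taxa plus the three supplementary columns.
example = (
    "site\tBacillus\tNitrososphaera\tUnknown\tUnclassified\tEnvironmental\n"
    "797\t6000\t2500\t500\t700\t300\n"
    "10797\t5000\t2500\t500\t700\t300\n"
)

print(rows_off_depth(example))  # {'10797': 9000}
```

An empty result would indicate that every site in the table matches the stated rarefaction depth.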

  17. NPOmix 1: antiSMASH results from 1,040 PoDP paired samples

    • zenodo.org
    zip
    Updated Dec 6, 2022
    Tiago F Leao; Mingxun Wang; Ricardo Silva; Alexey Gurevich; Anelize Bauermeister; Paulo WP Gomes; Asker Brejnrod; Evgenia Glukhov; Allegra T Aron; Joris JR Louwen; Hyun Woo Kim; Raphael Reher; Marli F Fiore; Justin JJ van der Hooft; Lena Gerwick; William H Gerwick; Nuno Bandeira; Pieter C Dorrestein (2022). NPOmix 1: antiSMASH results from 1,040 PoDP paired samples [Dataset]. http://doi.org/10.5281/zenodo.6637083
    Available download formats: zip
    Dataset updated
    Dec 6, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tiago F Leao; Mingxun Wang; Ricardo Silva; Alexey Gurevich; Anelize Bauermeister; Paulo WP Gomes; Asker Brejnrod; Evgenia Glukhov; Allegra T Aron; Joris JR Louwen; Hyun Woo Kim; Raphael Reher; Marli F Fiore; Justin JJ van der Hooft; Lena Gerwick; William H Gerwick; Nuno Bandeira; Pieter C Dorrestein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the NPOmix validation (described in the publication), comprising antiSMASH results from 1,040 PoDP paired samples. The inputs for antiSMASH were FASTA genomes from NCBI that are listed in the PoDP database (https://pairedomicsdata.bioinformatics.nl). All files were removed from the antiSMASH output folder except the GenBank (.gbk) files for the BGCs.
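The pruning step described above (keeping only the GenBank files from an antiSMASH output folder) could be approximated as follows. This is a hedged sketch rather than the authors' procedure: the function name and the example file names are hypothetical, and the sketch keeps every `.gbk` file without distinguishing per-BGC files from any whole-genome GenBank file. It only lists files and deletes nothing.

```python
import tempfile
from pathlib import Path


def partition_antismash_output(folder):
    """Split files under folder into (.gbk files to keep, everything else)."""
    keep, drop = [], []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            (keep if path.suffix == ".gbk" else drop).append(path)
    return keep, drop


# Demonstration on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("genome.region001.gbk", "index.html", "genome.json"):
        (Path(tmp) / name).write_text("")
    keep, drop = partition_antismash_output(tmp)
    kept_names = [p.name for p in keep]
    dropped_names = [p.name for p in drop]

print(kept_names)     # ['genome.region001.gbk']
print(dropped_names)  # ['genome.json', 'index.html']
```

Returning the two lists instead of deleting in place lets the caller review what would be removed before committing to it.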

  18. Data from: Time of activity is a better predictor of the distribution of a...

    • search.dataone.org
    • datadryad.org
    Updated Jun 5, 2025
    Gabriel Henrique de Oliveira Caetano; Juan Carlos Santos; Leandro Godinho; Vitor Cavalcante; Luisa Viegas; Pedro Campelo; Lidia Martins; Alan de Oliveira; Júlio Alvarenga; Helga Wiederhecker; Verônica de Novaes e Silva; Fernanda Werneck; Donald Miles; Guarino Colli; Barry Sinervo (2025). Time of activity is a better predictor of the distribution of a tropical lizard than pure environmental temperatures [Dataset]. http://doi.org/10.5061/dryad.b2rbnzsb7
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Gabriel Henrique de Oliveira Caetano; Juan Carlos Santos; Leandro Godinho; Vitor Cavalcante; Luisa Viegas; Pedro Campelo; Lidia Martins; Alan de Oliveira; Júlio Alvarenga; Helga Wiederhecker; Verônica de Novaes e Silva; Fernanda Werneck; Donald Miles; Guarino Colli; Barry Sinervo
    Time period covered
    Jan 1, 2020
    Description

    Environmental temperatures influence ectotherms’ physiology and capacity to perform activities necessary for survival and reproduction. Time available to perform those activities is determined by thermal tolerances and environmental temperatures. Estimates of activity time might enhance our ability to predict suitable areas for species’ persistence in face of climate warming, compared to the exclusive use of environmental temperatures, without considering thermal tolerances. We compare the ability of environmental temperatures and estimates of activity time to predict the geographic distribution of a tropical lizard, Tropidurus torquatus. We compared 105 estimates of activity time, resulting from the combination of four methodological decisions: (1) How to estimate daily environmental temperature variation (modeling a sinusoid wave ranging from monthly minimum to maximum temperature, extrapolating from operative temperatures measured in field or using biophysical projections of microcli...

  19. Data from: Whole genome sequencing of elite rice cultivars as a...

    • search.dataone.org
    • plos.figshare.com
    • +1 more
    Updated Apr 2, 2025
    Jorge Duitama; Alexander Silva; Yamid Sanabria; Daniel Felipe Cruz; Constanza Quintero; Carolina Ballen; Mathias Lorieux; Brian Scheffler; Andrew Farmer; Edgar Torres; James Oard; Joe Tohme (2025). Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection [Dataset]. http://doi.org/10.5061/dryad.8hg32
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Jorge Duitama; Alexander Silva; Yamid Sanabria; Daniel Felipe Cruz; Constanza Quintero; Carolina Ballen; Mathias Lorieux; Brian Scheffler; Andrew Farmer; Edgar Torres; James Oard; Joe Tohme
    Time period covered
    Jan 1, 2016
    Description

    Current advances in sequencing technologies and bioinformatics revealed the genomic background of rice, a staple food for the poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GB...

  20. silva_nr_v138_train_set_usearch_SINTAX_compatible.fa

    • figshare.com
    txt
    Updated Apr 30, 2020
    Michael Lee (2020). silva_nr_v138_train_set_usearch_SINTAX_compatible.fa [Dataset]. http://doi.org/10.6084/m9.figshare.12226949.v1
    Available download formats: txt
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michael Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    USEARCH-formatted SILVA v138 training set, converted from the DADA2-format release (source: https://zenodo.org/record/3731176#.XqsLVBNKhqU)
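To illustrate what a DADA2-to-SINTAX conversion involves, here is a minimal sketch. It is not the method used to produce this file. It assumes DADA2-style headers that list the lineage from domain to genus separated by semicolons, and SINTAX-style headers of the form `>ID;tax=d:...,p:...,...;`. The sequential identifiers (`seq1`, `seq2`, ...) and the function names are hypothetical, and taxon names containing commas or semicolons are not handled.

```python
RANK_PREFIXES = ("d", "p", "c", "o", "f", "g")  # domain through genus


def dada2_to_sintax_header(header, seq_id):
    """Convert a DADA2-style lineage header into a SINTAX-style header."""
    lineage = [name for name in header.lstrip(">").strip().split(";") if name]
    tax = ",".join(f"{rank}:{name}" for rank, name in zip(RANK_PREFIXES, lineage))
    return f">{seq_id};tax={tax};"


def convert_fasta(lines):
    """Rewrite header lines, numbering sequences seq1, seq2, ... (made-up IDs)."""
    out, count = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            out.append(dada2_to_sintax_header(line, f"seq{count}"))
        elif line:
            out.append(line)  # sequence lines pass through unchanged
    return out


example = [
    ">Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;",
    "ACGTACGT",
]
converted = convert_fasta(example)
print("\n".join(converted))
```

Lineages shorter than six ranks simply yield fewer `rank:name` pairs, since `zip` stops at the shorter input.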
