53 datasets found
  1. European Nucleotide Archive (ENA)

    • neuinfo.org
    Cite
    European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515/resolver?q=&i=rrid
    Description

    Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases, including the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
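
    Since the ENA Browser is addressed through plain URLs, the record links that appear in other entries on this page (for example https://www.ebi.ac.uk/ena/browser/view/PRJEB76269) can be built from an accession. The following is a minimal sketch, not ENA's official client: the function name and the loose accession check are assumptions made for illustration, and only the browser "view" URL pattern seen on this page is used.

```python
import re

# Base of the ENA Browser record pages, as it appears in links on this page.
ENA_VIEW_BASE = "https://www.ebi.ac.uk/ena/browser/view/"

# Hypothetical, deliberately loose check: letters, digits, underscore and dot.
# This is NOT ENA's official accession grammar.
_ACCESSION_RE = re.compile(r"^[A-Za-z0-9_.]+$")


def ena_view_url(accession: str) -> str:
    """Return the ENA Browser view URL for an accession such as PRJEB76269."""
    accession = accession.strip()
    if not _ACCESSION_RE.match(accession):
        raise ValueError(f"unexpected characters in accession: {accession!r}")
    return ENA_VIEW_BASE + accession


if __name__ == "__main__":
    for acc in ("PRJEB76269", "GCA_963668995.1", "OZ075093.1"):
        print(ena_view_url(acc))
```

    The sketch only builds links; it does not contact ENA or confirm that a record exists.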

  2. Supplemental data from the genome assembly and annotation of the Clouded...

    • researchdata.se
    Updated Jun 26, 2024
    Cite
    Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström (2024). Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne) [Dataset]. http://doi.org/10.17044/SCILIFELAB.25908748
    Dataset updated
    Jun 26, 2024
    Dataset provided by
    Uppsala University
    Authors
    Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:

    Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031

    Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269).

    The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1, https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) and the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1).

    Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.

    The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).

    • pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451, https://www.ebi.ac.uk/ena/browser/view/ERX11559451). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).

    • pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1, https://www.ebi.ac.uk/ena/browser/view/OZ075093.1). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).

    • pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.

    • pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).

    • pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436, https://www.ebi.ac.uk/ena/browser/view/ERX11559436) aligned to the primary genome assembly.

    • pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).

    Available variables

    For a description of the column headers of the files, please see the following links to the documentation of the different file formats.

    The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

    The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf

    The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
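
    As an illustration of the column layout the GFF3 specification describes (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes), the sketch below reads features from an iterable of lines. It is a minimal reader under stated assumptions, not a full validator: the `Feature` type and `parse_gff3` function are hypothetical names, multi-valued attributes are kept as raw strings, and parsing stops at a `##FASTA` directive.

```python
from typing import Dict, Iterable, Iterator, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    """One GFF3 feature line; field names follow the nine GFF3 columns."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from GFF3 text lines, skipping comments and directives."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line; ignored in this sketch
        attrs: Dict[str, str] = {}
        for item in cols[8].split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                attrs[unquote(key)] = unquote(value)  # undo %XX escapes
        yield Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)
```

    For the gzip-compressed files listed above, the lines could be supplied by opening a file with Python's `gzip.open(path, "rt")`.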

    Contact

    For questions about this dataset, please contact: jacob.hoglund@ebc.uu.se, niclas.backstrom@ebc.uu.se

  3. Data from: Whole genome sequence and annotation dataset of rare...

    • explore.openaire.eu
    Updated Aug 19, 2023
    Cite
    Sin Yee Chong; Aida Azrina Azmi; Yoke Kqueen Cheah (2023). Whole genome sequence and annotation dataset of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica [Dataset]. http://doi.org/10.5281/zenodo.8265495
    Dataset updated
    Aug 19, 2023
    Authors
    Sin Yee Chong; Aida Azrina Azmi; Yoke Kqueen Cheah
    Area covered
    Antarctica
    Description

    The present data files are the source files of the annotation output from the whole genome sequencing of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica. The whole-genome sequence of B. humi has been deposited in the European Nucleotide Archive (ENA) repository under the accession number PRJEB44986 / ERP129097; direct URL to data: https://www.ebi.ac.uk/ena/browser/view/PRJEB44986

    References

    European Nucleotide Archive. (2021). Project PRJEB44986: Whole-genome Sequencing and Annotation of Barrientosiimonas humi gen. nov., sp. nov. 39T, a Novel Rare Actinobacteria from Barrientos Island, Antarctica. ENA Browser. PRJEB44986. Retrieved from https://www.ebi.ac.uk/ena/browser/view/PRJEB44986

  4. The OHEJP BeONE Project – Escherichia coli genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Escherichia coli genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802728
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
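
    The inclusion rule described above can be sketched as a small predicate. This is an illustration under stated assumptions, not the Aquamis pipeline or the consortium's actual code: the `AssemblyQC` record, its field names, and the reading that recovery applies only to assemblies whose sole failed check is "NumContamSNVs" are all assumptions; the text itself also mentions a manual inspection step that the sketch does not model.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AssemblyQC:
    """Hypothetical per-assembly QC summary (field names are assumptions)."""
    sample_id: str
    failed_checks: FrozenSet[str]       # names of QC checks that failed
    pct_reads_correct_species: float    # percentage, 0 to 100


def include_in_dataset(qc: AssemblyQC, threshold: float = 98.0) -> bool:
    """Apply the selection rule from the description.

    An assembly is kept if it passed QC outright, or if its only failed
    check is "NumContamSNVs" and more than `threshold` percent of its
    reads correspond to the correct species.
    """
    if not qc.failed_checks:
        return True
    only_contam_snvs = qc.failed_checks == frozenset({"NumContamSNVs"})
    return only_contam_snvs and qc.pct_reads_correct_species > threshold
```

    Keeping the threshold as a parameter makes the ">98%" cut-off explicit and easy to vary when examining how many assemblies a different setting would recover.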

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  5. The OHEJP BeONE Project – Salmonella enterica genome assembly dataset

    • zenodo.org
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Salmonella enterica genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802723
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  6. EBI Genomes

    • neuinfo.org
    Updated Sep 29, 2024
    Cite
    (2024). EBI Genomes [Dataset]. http://identifiers.org/RRID:SCR_002426/resolver/mentions
    Dataset updated
    Sep 29, 2024
    Description

    The EBI genomes pages give access to a large number of complete genomes including bacteria, archaea, viruses, phages, plasmids, viroids and eukaryotes. Methods using whole genome shotgun data are used to gain a large amount of genome coverage for an organism. WGS data for a growing number of organisms are being submitted to DDBJ/EMBL/GenBank. Genome entries have been listed in their appropriate category which may be browsed using the website navigation tool bar on the left. While organelles are all listed in a separate category, any from Eukaryota with chromosome entries are also listed in the Eukaryota page. Within each page, entries are grouped and sorted at the species level with links to the taxonomy page for that species separating each group. Within each species, entries whose source organism has been categorized further are grouped and numbered accordingly. Links are made to:

    • taxonomy
    • complete EMBL flatfile
    • CON files
    • lists of CON segments
    • Project
    • Proteomes pages
    • FASTA file of Proteins
    • list of Proteins

  7. The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset

    • zenodo.org
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7267487
    Available download formats: bin, zip
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type.

    The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

  8. Data from: Populations restored using regional seed are genetically diverse...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Nov 2, 2021
    Cite
    Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka (2021). Populations restored using regional seed are genetically diverse and similar to natural populations in the region [Dataset]. http://doi.org/10.5061/dryad.qbzkh18j0
    Available download formats: zip
    Dataset updated
    Nov 2, 2021
    Dataset provided by
    Dryad
    Authors
    Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka
    Time period covered
    Oct 19, 2021
    Description

    Please refer to the methods section and supplementary information of: Höfner, J., Klein-Raufhake, T., Lampei, C., Mudrak, O., Bucharova, A. and Durka, W. (2021) ‘Populations restored using regional seed are genetically diverse and similar to natural populations in the region’, accepted in Journal of Applied Ecology.

  9. RNA sequencing data from: Aberrant expression of SLAMF6 constitutes a...

    • figshare.scilifelab.se
    application/gzip
    Updated Oct 2, 2025
    Cite
    Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos (2025). RNA sequencing data from: Aberrant expression of SLAMF6 constitutes a targetable immune escape mechanism in acute myeloid leukemia [Dataset]. http://doi.org/10.17044/scilifelab.28033754.v2
    Available download formats: application/gzip
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Lund University
    Authors
    Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    This dataset includes RNA sequencing (RNA-seq) data from the HNT-34 AML (acute myeloid leukemia) cell line after knockout of the SLAMF6 gene by CRISPR/Cas9 (SLAMF6-KO) or mock-knockout with a construct targeting the firefly luciferase gene (SLAMF6-WT). Libraries were produced using the Illumina stranded mRNA prep kit and sequenced on an Illumina Novaseq 6000 system (Illumina). The dataset is available as merged transcripts per million (TPM) data for all cases generated using Salmon (salmon.merged.gene_tpm.tsv.gz). Raw sequencing reads (fastq) are available at the European Nucleotide Archive (ENA) under accession ID PRJEB90909: https://www.ebi.ac.uk/ena/browser/view/PRJEB90909. Published in: Sandén et al, Nature Cancer, 2025: https://www.nature.com/articles/s43018-025-01054-6

  10. A list of accession number for samples included in this study.

    • datasetcatalog.nlm.nih.gov
    Updated Aug 14, 2023
    Cite
    Hall, Matthew; House, Thomas; Lee, Mark R.; Ferretti, Luca; Fraser, Christophe; Piazza, Paolo; Harthern-Flint, Sarah; Fryer, Helen R.; Xhang, Xin; Elstob, Claire J.; Bonsall, David; Hinch, Robert; Dos Santos, Rui Nunes; Lonie, Lorne J; Chapman, Isobel; Richards, Zack; MacIntyre-Cockett, George; Crown, Matthew; Bashton, Matthew; Trebes, Amy; Nurtay, Anel; Tariq, Mohammed Adnan; Green, Angie; Thomson, Laura; Smith, Darren; Hawley, Joseph; Pellis, Lorenzo; Golubchik, Tanya; Nelson, Andrew; Buck, David; Lythgoe, Katrina A.; Carrillo-Barragan, Priscilla; McCann, Clare M. (2023). A list of accession number for samples included in this study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001099459
    Dataset updated
    Aug 14, 2023
    Authors
    Hall, Matthew; House, Thomas; Lee, Mark R.; Ferretti, Luca; Fraser, Christophe; Piazza, Paolo; Harthern-Flint, Sarah; Fryer, Helen R.; Xhang, Xin; Elstob, Claire J.; Bonsall, David; Hinch, Robert; Dos Santos, Rui Nunes; Lonie, Lorne J; Chapman, Isobel; Richards, Zack; MacIntyre-Cockett, George; Crown, Matthew; Bashton, Matthew; Trebes, Amy; Nurtay, Anel; Tariq, Mohammed Adnan; Green, Angie; Thomson, Laura; Smith, Darren; Hawley, Joseph; Pellis, Lorenzo; Golubchik, Tanya; Nelson, Andrew; Buck, David; Lythgoe, Katrina A.; Carrillo-Barragan, Priscilla; McCann, Clare M.
    Description

    Sequences can be accessed via the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/browser/home. (TXT)

  11. Benchmark of 5S, 16S and 23S rRNA Secondary Structures

    • figshare.com
    zip
    Updated Aug 30, 2022
    Cite
    Michela Quadrini; Luca Tesei; Emanuela Merelli (2022). Benchmark of 5S, 16S and 23S rRNA Secondary Structures [Dataset]. http://doi.org/10.6084/m9.figshare.20731783.v1
    Available download formats: zip
    Dataset updated
    Aug 30, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michela Quadrini; Luca Tesei; Emanuela Merelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmark of 5S, 16S and 23S rRNA secondary structures taken from the CRW database: https://crw-site.chemistry.gatech.edu/

    Each molecule is available in bpseq, ct and dot-bracket-letter (db) format. For each format a version without header/additional information/comments is available in the corresponding bpseq-nH, ct-nH, db-nH folders.

    In the files Archaea.xlsx, Bacteria.xlsx and Eukaryota.xslx the molecules in the benchmark are listed together with their Organism Name, ID and Phylogenetic classification (up to Order) according to the European Nucleotide Archive (ENA) taxonomy https://www.ebi.ac.uk/ena/browser/home

    The accession number is available from the headers of the bpseq and ct formats.
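
    To illustrate how the bpseq and dot-bracket representations relate, here is a minimal conversion sketch. The description does not spell out the bpseq layout, so the sketch assumes the common convention of three whitespace-separated columns (position, base, pairing partner, with 0 meaning unpaired) preceded by free-text header lines. The function names are hypothetical, and the plain dot-bracket output assumes a structure without pseudoknots, which the dataset's dot-bracket-letter format is designed to handle and this sketch is not.

```python
from typing import Dict, Tuple


def parse_bpseq(text: str) -> Tuple[str, Dict[int, int]]:
    """Return (sequence, pairs) from bpseq text.

    `pairs` maps each 1-based position to its partner position (0 = unpaired).
    Lines that are not of the form "<int> <base> <int>" are treated as header
    or comment lines and skipped.
    """
    bases = []
    pairs: Dict[int, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2].isdigit():
            continue
        position, base, partner = int(parts[0]), parts[1], int(parts[2])
        bases.append(base)
        pairs[position] = partner
    return "".join(bases), pairs


def to_dot_bracket(pairs: Dict[int, int]) -> str:
    """Render pairs as dot-bracket; only valid for pseudoknot-free structures."""
    symbols = []
    for position in sorted(pairs):
        partner = pairs[position]
        if partner == 0:
            symbols.append(".")
        elif partner > position:
            symbols.append("(")
        else:
            symbols.append(")")
    return "".join(symbols)
```

    Skipping non-numeric lines lets the same reader accept both the headed files and the header-free versions in the bpseq-nH folder.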

  12. Data from: A de novo chromosome-level genome assembly of Coregonus sp....

    • datadryad.org
    zip
    Updated May 15, 2020
    Cite
    Philine Feulner; Rishi De-Kayne; Stefan Zoller (2020). A de novo chromosome-level genome assembly of Coregonus sp. “Balchen”: one representative of the Swiss Alpine whitefish radiation [Dataset]. http://doi.org/10.5061/dryad.xd2547ddf
    Available download formats: zip
    Dataset updated
    May 15, 2020
    Dataset provided by
    Dryad
    Authors
    Philine Feulner; Rishi De-Kayne; Stefan Zoller
    Time period covered
    Apr 30, 2020
    Area covered
    Switzerland
    Description

    For detailed methods please see the associated publication.

  13. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • agdatacommons.nal.usda.gov
    zip
    Updated Nov 21, 2025
    Cite
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
    Available download formats: zip
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
    File Name: PBMC7_AllCells.zip
    Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

    • matrix of gene counts* (matrix.mtx.gz)
    • gene names (features.tsv.gz)
    • cell IDs (barcodes.tsv.gz)

    *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
    File Name: PBMC7_AllCells_meta.csv
    Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

    • nCount_RNA = the number of transcripts detected in a cell
    • nFeature_RNA = the number of genes detected in a cell
    • Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
    • prcntMito = percent mitochondrial reads in a cell
    • Scrublet = doublet probability score assigned to a cell
    • seurat_clusters = cluster ID assigned to a cell
    • PaperIDs = sample ID for a cell
    • celltypes = cell type ID assigned to a cell

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
    File Name: PBMC7_AllCells_PCAcoord.csv
    Resource Description: .csv file containing first 100 PCA coordinates for cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
    File Name: PBMC7_AllCells_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
    File Name: PBMC7_AllCells_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for all cells.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
    File Name: PBMC7_CD4only_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
    File Name: PBMC7_CD4only_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
    File Name: PBMC7_GDonly_UMAPcoord.csv
    Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
    File Name: PBMC7_GDonly_tSNEcoord.csv
    Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
    File Name: UnfilteredGeneInfo.txt
    Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
    File Name: PBMC7.tar
    Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
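
The description above says that CD4 T cells correspond to clusters 0, 3, 4 and 28 and that the metadata file carries a `Loupe` barcode column and a `seurat_clusters` column. A minimal sketch of selecting those barcodes from the metadata, without R, might look like the following. It assumes the .csv header uses exactly those column names; the helper `barcodes_in_clusters` and the three sample rows are hypothetical and are not taken from PBMC7_AllCells_meta.csv.

```python
import csv
import io

# Cluster IDs listed in the description for CD4 T cells.
CD4_CLUSTERS = {"0", "3", "4", "28"}


def barcodes_in_clusters(csv_text, clusters):
    """Return the 'Loupe' barcodes of cells whose cluster ID is in `clusters`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Loupe"] for row in reader if row["seurat_clusters"] in clusters]


# Hypothetical rows for illustration only; not real cells from the dataset.
sample = """Loupe,seurat_clusters,celltypes
cell_a,0,typeA
cell_b,6,typeB
cell_c,28,typeA
"""
cd4_barcodes = barcodes_in_clusters(sample, CD4_CLUSTERS)
```

For the real file, the text could be obtained with `open("PBMC7_AllCells_meta.csv").read()`; the selected barcodes could then be matched against the coordinate files, assuming those are keyed by the same cell IDs.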

  14. Genome Assembly Mycobacterium Bovis GCA_000195835.3

    • figshare.com
    txt
    Updated May 21, 2025
    Alexessander Couto Alves (2025). Genome Assembly Mycobacterium Bovis GCA_000195835.3 [Dataset]. http://doi.org/10.6084/m9.figshare.29066618.v1
    Available download formats: txt
    Dataset updated
    May 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alexessander Couto Alves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Genome sequence of the bovine tuberculosis bacillus Mycobacterium bovis AF2122/97: https://www.ebi.ac.uk/ena/browser/view/GCA_000195835.3?show=chromosomes

  15. Dataset supporting the tool 'delfies: a Python package for the detection of...

    • zenodo.org
    application/gzip
    Updated Dec 5, 2024
    Brice Letcher (2024). Dataset supporting the tool 'delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition' [Dataset]. http://doi.org/10.5281/zenodo.14282333
    Available download formats: application/gzip
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Brice Letcher
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose


    These data can be used to test my tool delfies on real data, to get a concrete sense of its inputs/outputs and test that it is
    properly installed.

    Description

    Genome

    I downloaded the genome of Oscheius onirici, accession: GCA_932521025.

    I subsampled the genome to the last 2kbp of chromosome I, which contains an elimination breakpoint,
    using `seqkit` v2.8.2, giving the FASTA file in this release.

    Sequencing data

    I then downloaded the following sequencing data for *O. onirici*, from the European Nucleotide Archive:

    • ERR5967937: Illumina NovaSeq 6000 paired end short reads. Reads are 2x150bp with average per-base quality of Q27.
    • ERR10796202: Oxford Nanopore PromethION long reads. Reads have average length 11.9kbp and average per-base quality Q11.4.
    • ERR7979900: Pacific Biosciences (PacBio) Sequel II long reads. Reads have average length 11.1kbp and average per-base quality Q28.

    I aligned them to the above genome with `minimap2` version 2.26-r1175, using the following presets:
    "map-ont" for the Nanopore data, "map-hifi" for the PacBio data, "sr" for the Illumina data.

    After sorting with `samtools`, this gives the BAM files in this release.
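
The alignment and sorting steps above can be sketched as command lines built in Python. This is only an illustration under assumptions: the exact options the author used are not given beyond the preset names, the file names below are hypothetical, and the helper names `minimap2_command` and `sort_command` are invented for this sketch. The flags themselves (`-a` for SAM output and `-x` for the preset in minimap2, `-o` for the output file in `samtools sort`) are standard.

```python
# Preset per sequencing technology, as stated in the description.
PRESETS = {"nanopore": "map-ont", "pacbio": "map-hifi", "illumina": "sr"}


def minimap2_command(technology, genome, reads):
    """Build an argv list for minimap2 that emits SAM (-a) with a preset (-x)."""
    return ["minimap2", "-a", "-x", PRESETS[technology], genome, *reads]


def sort_command(out_bam):
    """Build an argv list for samtools sort reading SAM from stdin ('-')."""
    return ["samtools", "sort", "-o", out_bam, "-"]


# Hypothetical file names; paired-end Illumina data is passed as two read files.
align = minimap2_command("illumina", "genome.fa", ["reads_1.fq.gz", "reads_2.fq.gz"])
sort = sort_command("illumina.bam")
```

The two argv lists could be connected with `subprocess.Popen`, feeding the standard output of the first process to the standard input of the second, provided both tools are installed.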

    Running delfies

    I then ran `delfies` version 0.6.0 on each BAM and genome, as:

    ```sh
    delfies --threads 16 \
    --telo_forward_seq TTAGGC \
    --breakpoint_type all \
    --min_mapq 20 \
    --min_supporting_reads 6 \
    ${genome} ${bam} ${odirname}
    ```

    The three resulting output directories are in this release, prefixed with `delfies_`.

    A single, identical breakpoint is found using all three BAMs (see files '*breakpoint_locations.bed').

    Data source

    The above raw data were produced and released by the Wellcome Sanger Institute as part of projects
    PRJEB51305 and PRJEB59023.

  16. Extended rat miRNA repertoire

    • data.niaid.nih.gov
    Updated Jul 3, 2024
    Canzler, Sebastian (2024). Extended rat miRNA repertoire [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12626179
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Helmholtz-Zentrum für Umweltforschung UFZ
    Authors
    Canzler, Sebastian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generally, Rattus norvegicus' miRNA repertoire falls short compared to the other rodent model organism, Mus musculus.

    To extend the miRNA catalogue in Rattus norvegicus, we utilized Infernal v1.1 (Nawrocki and Eddy, 2013) to derive potential rat miRNA candidates starting from all available mammalian miRNA families in miRBase. We utilized MIRfix (Yazbeck et al., 2019) to curate the extended miRNA datasets automatically. Subsequent manual inspection and curation of miRNA alignments resulted in a reliable and comprehensive update to the rat miRNA annotation.

    Key facts of the extended miRNA repertoire

    342 miRNA families (40 novel families)

    549 miRNA sequences (56 novel miRNAs)

    11 corrected annotated miRNAs

    European Nucleotide Archive

    The 56 novel sequences not previously listed in miRBase have been submitted to the European Nucleotide Archive at EMBL-EBI. They are accessible with the accession numbers OZ078105 - OZ078160. The sequences will be permanently available from the ENA browser at http://www.ebi.ac.uk/ena/data/view/.

    An overview of all sequences is given here: http://www.ebi.ac.uk/ena/data/view/OZ078105-OZ078160.
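
For scripted access it can help to expand the accession range quoted above into individual accessions. The sketch below does this under the assumption that every number between OZ078105 and OZ078160 is assigned to one of the novel sequences, which is consistent with the count of 56 given in the description; the function name `expand_accession_range` is hypothetical.

```python
def expand_accession_range(first, last):
    """Expand e.g. ('OZ078105', 'OZ078160') into every accession in between."""
    prefix = first.rstrip("0123456789")
    width = len(first) - len(prefix)
    suffix = last[len(prefix):]
    if not last.startswith(prefix) or len(last) != len(first) or not suffix.isdigit():
        raise ValueError("accessions do not share a prefix and numeric width")
    start, end = int(first[len(prefix):]), int(suffix)
    if end < start:
        raise ValueError("range must be ascending")
    # Zero-pad so the numeric part keeps its original width.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


novel_accessions = expand_accession_range("OZ078105", "OZ078160")
```

The resulting list has 56 entries, matching the number of novel miRNAs reported.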

  17. Bacterial diversity (16S rRNA gene) in participant collected household...

    • hosted-metadata.bgs.ac.uk
    Updated Nov 1, 2021
    British Geological Survey (2021). Bacterial diversity (16S rRNA gene) in participant collected household vacuum dust from homes across two bioclimatic regions (UK and Greece), with associated participant questionnaire and trace element data. (NERC Grant NE/T004401/1) [Dataset]. https://hosted-metadata.bgs.ac.uk/geonetwork/srv/api/records/d0869679-6d34-14ce-e054-002128a47908?language=all
    Available download formats: www:download-1.0-http--download
    Dataset updated
    Nov 1, 2021
    Dataset authored and provided by
    British Geological Survey (https://www.bgs.ac.uk/)
    License

    http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations

    Time period covered
    Oct 1, 2019 - Jul 1, 2021
    Area covered
    Greece, United Kingdom
    Description

    The <250 µm fraction of 28 household vacuum dust samples was extracted using high-throughput isolation of microbial genomic DNA (21 samples from a national campaign within the UK and 7 samples from Greece, providing samples from two contrasting bioclimatic zones). Both positive and negative reagent controls were included to ensure sterility throughout the processing and sequencing steps, and a randomly selected sample was run in triplicate (DSUK179). These data (raw fastq files: Target_gene 16S and Target_subfragment V4) are available from the European Nucleotide Archive via the study accession PRJEB46920, with individual sample accession numbers ERX6130460 to ERX6130493 (https://www.ebi.ac.uk/ena/browser/view/PRJEB46920). A wide range of anthropogenic factors are likely to affect the indoor microbiome, and to capture some of this heterogeneity participants were asked to complete a questionnaire. In addition, trace element data were generated using X-ray fluorescence spectrometry on the <250 µm sieved fraction of the household vacuum dust. Sample location data are provided at town/city, country level.

    Indoor dust serves as a reservoir for environmental exposure to microbial communities, many of which are benign, some of which are beneficial, and some of which exhibit pathogenicity. Whilst non-occupational exposure to a range of trace elements and organic contaminants in house dust is a known risk factor for a range of diseases and poor health outcomes, we know far less about the microbial communities associated with our indoor home environments, and their interactions with and impacts on human health. Indoor residential bacterial biodiversity, biogeography and their associated drivers are still poorly understood. The data were collected to improve our understanding of the home microbiome.

  18. jolma_subset

    • huggingface.co
    Updated Aug 3, 2023
    thewall (2023). jolma_subset [Dataset]. https://huggingface.co/datasets/thewall/jolma_subset
    Dataset updated
    Aug 3, 2023
    Authors
    thewall
    License

    https://choosealicense.com/licenses/openrail/

    Description

    PRJEB3289 (https://www.ebi.ac.uk/ena/browser/view/PRJEB3289)

    These data were generated by HT-SELEX experiments (see Jolma et al. 2010, PMID: 20378718, for a description of the method) and have now been used to generate transcription factor binding specificity models for most of the high-confidence human transcription factors. Sequence data is composed of reads generated with Illumina Genome Analyzer IIX and HiSeq2000 instruments. Samples are composed of single-read sequencing of synthetic DNA fragments with a fixed-length randomized region, or of samples derived from such an initial library by selection with a sequence-specific DNA binding protein.

    Originally, multiple samples with different "barcode" tag sequences were run on the same Illumina sequencing lane, but the released files have already been de-multiplexed, and the constant regions and "barcodes" of each sequence have been cut out of the sequencing reads to facilitate use of the data. Some of the files are composed of reads from multiple different sequencing lanes, and because of this the name of each individual read has been edited to show the flowcell and lane that was used to generate it. Barcodes and oligonucleotide designs are indicated in the names of individual entries. Depending on the selection ligand design, the sequences in each of these fastq files are either 14, 20, 30 or 40 bases long and had different flanking regions on both sides of the sequence.

    Each run entry is named in either of the following ways:

    Example 1) "BCL6B_DBD_AC_TGCGGG20NGA_1", where the name is composed of the following fields: ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle. This experiment used barcode ligand TGCGGG20NGA, where both of the variable flanking constant regions are indicated as they were on the original sequence reads. This ligand has been selected for one round of HT-SELEX using recombinant protein that contained the DNA binding domain of human transcription factor BCL6B. The name also shows that the experiment was performed in the batch of experiments named "AC".

    Example 2) "0_TGCGGG20NGA_0", where the name is composed of (zero)_BarcodeDesign_(zero). These sequences have been generated from sequencing of the initial non-selected pool. The same initial pools have been used in multiple experiments that were in different batches; for example, this background sequence pool is the shared background for all of the following samples: BCL6B_DBD_AC_TGCGGG20NGA_1, ZNF784_full_AE_TGCGGG20NGA_3, DLX6_DBD_Y_TGCGGG20NGA_4 and MSX2_DBD_W_TGCGGG20NGA_2.
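
The two naming schemes described above can be sketched as a small parser. It assumes that none of the fields contains an underscore, which holds for the example names given but is not guaranteed by the description; the names `RunName` and `parse_run_name` are hypothetical.

```python
from typing import NamedTuple, Optional


class RunName(NamedTuple):
    protein: Optional[str]     # None for the initial non-selected pool
    clone_type: Optional[str]  # e.g. "DBD" or "full"
    batch: Optional[str]
    barcode_design: str
    cycle: int                 # 0 for the initial non-selected pool


def parse_run_name(name):
    """Split a run name into its fields, following the two schemes above."""
    parts = name.split("_")
    if len(parts) == 3 and parts[0] == "0" and parts[2] == "0":
        # (zero)_BarcodeDesign_(zero): the initial, non-selected pool
        return RunName(None, None, None, parts[1], 0)
    if len(parts) == 5:
        protein, clone_type, batch, design, cycle = parts
        return RunName(protein, clone_type, batch, design, int(cycle))
    raise ValueError(f"unrecognised run name: {name!r}")


selected = parse_run_name("BCL6B_DBD_AC_TGCGGG20NGA_1")
background = parse_run_name("0_TGCGGG20NGA_0")
```

Matching `barcode_design` between a selected sample and a background entry is one way to pair each selection with its shared initial pool.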

  19. tg2

    • huggingface.co
    Updated Jun 21, 2023
    thewall (2023). tg2 [Dataset]. https://huggingface.co/datasets/thewall/tg2
    Dataset updated
    Jun 21, 2023
    Authors
    thewall
    License

    https://choosealicense.com/licenses/openrail/

    Description

    PRJDB9110 https://www.ebi.ac.uk/ena/browser/view/PRJDB9110 To generate RNA aptamers against human transglutaminase 2, we have performed the high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). Of the eight performed rounds, the rounds 0 to 8 have been sequenced.

  20. Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis...

    • data.niaid.nih.gov
    Updated Dec 12, 2023
    Ontiveros, Irene; Fernández-Pozo, Noé; Esteve-Codina, Anna; López-Moya, Juan José; Díaz-Pendón, Juan Antonio (2023). Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10362110
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    Centro Nacional de Análisis Genómico
    Center for Research in Agricultural Genomics
    Consejo Superior de Investigaciones Científicas
    Authors
    Ontiveros, Irene; Fernández-Pozo, Noé; Esteve-Codina, Anna; López-Moya, Juan José; Díaz-Pendón, Juan Antonio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset underlies the scientific publication titled "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics", published in the journal Viruses. The dataset includes a time-course transcriptome analysis using RNA-seq of naïve (no whitefly and no virus), mock (non-viruliferous whiteflies) and ToCV (ToCV-viruliferous whiteflies)-treated tomato samples at 2, 7, and 14 days post-infection (dpi), and viral small RNAs derived from tomato plants infected with ToCV at 14 dpi. The dataset provided here has been deposited in full by the authors in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB67704 (https://www.ebi.ac.uk/ena/browser/view/PRJEB67704). The information provided in the dataset, and the results derived from it, are discussed and interpreted in detail in the scientific publication. This research was conducted within the VIRTIGATION project, which is part of the EU Open Research Data pilot. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101000570.

European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515/resolver?q=&i=rrid

European Nucleotide Archive (ENA)

RRID:SCR_006515, OMICS_01029, r3d100010527, nif-0000-32981, European Nucleotide Archive (ENA) (RRID:SCR_006515), ENA, ENA, European Nucleotide Archive

Description

Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
