53 datasets found

European Nucleotide Archive (ENA)
neuinfo.org
Cite
European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515/resolver?q=&i=rrid
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_006515 https://identifiers.org/RRID:SCR_006515/resolver?q=&i=rrid
Description
Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with its partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
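As a rough illustration of URL-based access, the record pages linked from the dataset entries on this page all follow the pattern https://www.ebi.ac.uk/ena/browser/view/ followed by an accession. The sketch below only builds such view links from accession strings; the function and constant names are hypothetical, and it does not cover the download endpoints of the REST interface, which are not described here.

```python
# Minimal sketch: build ENA Browser "view" links from accession strings.
# ENA_VIEW_BASE is taken from the links shown in the dataset descriptions;
# the helper name ena_view_url is an illustrative assumption.
ENA_VIEW_BASE = "https://www.ebi.ac.uk/ena/browser/view/"


def ena_view_url(accession: str) -> str:
    """Return the browser view link for one accession (no network access)."""
    accession = accession.strip()
    if not accession or any(ch.isspace() for ch in accession):
        raise ValueError(f"not a usable accession: {accession!r}")
    return ENA_VIEW_BASE + accession


if __name__ == "__main__":
    # Accessions quoted in the dataset descriptions on this page.
    for acc in ("PRJEB76269", "GCA_963668995.1", "OZ075093.1"):
        print(ena_view_url(acc))
```

The helper does string handling only, so it can be checked without contacting ENA.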
Supplemental data from the genome assembly and annotation of the Clouded...
researchdata.se
Updated Jun 26, 2024
Cite
Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström (2024). Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne) [Dataset]. http://doi.org/10.17044/SCILIFELAB.25908748
Explore at:
Unique identifier
https://doi.org/10.17044/SCILIFELAB.25908748
Dataset updated
Jun 26, 2024
Dataset provided by
Uppsala University
Authors
Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:

Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031

Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269).

The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1)), and the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)).

Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.

pmne_functional_edit1.gff.gz contains the functional annotation (protein coding genes) of the primary genome assembly (GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1)). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; the NCBI version was generated from the EMBL records of each annotated gene and differs in that, for instance, it uses a different naming scheme for the seqid column and the locus tags. The NCBI version is available at this link (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/963/668/995/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11_genomic.gff.gz).

The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).

pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451 (https://www.ebi.ac.uk/ena/browser/view/ERX11559451)). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).

pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).

pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.

pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).

pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436 (https://www.ebi.ac.uk/ena/browser/view/ERX11559436)) aligned to the primary genome assembly.

pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).

Available variables

For a description of the column headers of the files, please see the following links to the documentation of the different file formats.

The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf

The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
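To make the column layout of the .gff files concrete, here is a minimal Python sketch of reading GFF3 records, assuming the standard nine tab-separated columns of the GFF3 specification (seqid, source, type, start, end, score, strand, phase, attributes). The class and function names are hypothetical, and the sample line used in the usage note is invented for illustration rather than taken from the dataset.

```python
import gzip
from typing import Dict, Iterator, NamedTuple, Optional
from urllib.parse import unquote


class GffFeature(NamedTuple):
    """One GFF3 feature line, split into its nine columns."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one line; return None for blank lines, comments and directives."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs: Dict[str, str] = {}
    for item in cols[8].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters in attributes.
            attrs[unquote(key)] = unquote(value)
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


def read_gff3(path: str) -> Iterator[GffFeature]:
    """Yield features from a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded sequence section follows; stop here
            feature = parse_gff3_line(line)
            if feature is not None:
                yield feature
```

For example, parse_gff3_line("chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=demo") yields a feature whose start is 100 and whose attributes map ID to gene1.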

Contact

For questions about this dataset, please contact: jacob.hoglund@ebc.uu.se niclas.backstrom@ebc.uu.se
Data from: Whole genome sequence and annotation dataset of rare...
explore.openaire.eu
Updated Aug 19, 2023
Cite
Sin Yee Chong; Aida Azrina Azmi; Yoke Kqueen Cheah (2023). Whole genome sequence and annotation dataset of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica [Dataset]. http://doi.org/10.5281/zenodo.8265495
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.8265495
Dataset updated
Aug 19, 2023
Authors
Sin Yee Chong; Aida Azrina Azmi; Yoke Kqueen Cheah
Area covered
Antarctica
Description
The present data files are the source files of the annotation output from the whole genome sequencing of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica. The whole-genome sequence dataset of B. humi has been deposited in the European Nucleotide Archive (ENA) repository under the accession number PRJEB44986 / ERP129097; direct URL to data: https://www.ebi.ac.uk/ena/browser/view/PRJEB44986
References: European Nucleotide Archive. (2021). Project PRJEB44986: Whole-genome Sequencing and Annotation of Barrientosiimonas humi gen. nov., sp. nov. 39T, a Novel Rare Actinobacteria from Barrientos Island, Antarctica. ENA Browser. PRJEB44986. Retrieved from https://www.ebi.ac.uk/ena/browser/view/PRJEB44986
The OHEJP BeONE Project – Escherichia coli genome assembly dataset
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jul 24, 2023
Cite
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Escherichia coli genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802728
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7802728
Dataset updated
Jul 24, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

Dataset selection and curation

This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
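The inclusion rule described above can be sketched in Python as follows. This is only an illustration of the stated numeric criterion: the record fields and function name are hypothetical, the actual Aquamis output layout is not shown here, and the authors also relied on manual inspection of a random subset, which code like this cannot reproduce.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AssemblyQC:
    """Hypothetical per-assembly QC summary (field names are assumptions)."""
    sample_id: str
    failed_checks: FrozenSet[str]      # names of QC parameters that failed
    pct_reads_correct_species: float   # percentage, 0 to 100


def include_in_dataset(qc: AssemblyQC, threshold: float = 98.0) -> bool:
    """Keep assemblies that pass QC, plus those failing only NumContamSNVs
    while more than `threshold` percent of reads match the correct species."""
    if not qc.failed_checks:
        return True
    only_contam_snvs = qc.failed_checks == frozenset({"NumContamSNVs"})
    return only_contam_snvs and qc.pct_reads_correct_species > threshold


if __name__ == "__main__":
    # Invented example records, not taken from the metadata file.
    records = [
        AssemblyQC("s1", frozenset(), 99.5),
        AssemblyQC("s2", frozenset({"NumContamSNVs"}), 98.7),
        AssemblyQC("s3", frozenset({"NumContamSNVs"}), 97.0),
        AssemblyQC("s4", frozenset({"NumContamSNVs", "N50"}), 99.9),
    ]
    print([r.sample_id for r in records if include_in_dataset(r)])
```

The strict comparison mirrors the ">98%" wording; an assembly at exactly 98% would not be recovered under this reading.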

Funding

This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

Acknowledgements

We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
The OHEJP BeONE Project – Salmonella enterica genome assembly dataset
zenodo.org
bin, zip
Updated Jul 24, 2023
Cite
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Salmonella enterica genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802723
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7802723
Dataset updated
Jul 24, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

Dataset selection and curation

This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).

Funding

This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

Acknowledgements

We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
EBI Genomes
neuinfo.org
Updated Sep 29, 2024
Cite
(2024). EBI Genomes [Dataset]. http://identifiers.org/RRID:SCR_002426/resolver/mentions
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_002426 https://identifiers.org/RRID:SCR_002426/resolver/mentions
Dataset updated
Sep 29, 2024
Description
The EBI genomes pages give access to a large number of complete genomes including bacteria, archaea, viruses, phages, plasmids, viroids and eukaryotes. Methods using whole genome shotgun data are used to gain a large amount of genome coverage for an organism. WGS data for a growing number of organisms are being submitted to DDBJ/EMBL/GenBank. Genome entries have been listed in their appropriate category which may be browsed using the website navigation tool bar on the left. While organelles are all listed in a separate category, any from Eukaryota with chromosome entries are also listed in the Eukaryota page. Within each page, entries are grouped and sorted at the species level with links to the taxonomy page for that species separating each group. Within each species, entries whose source organism has been categorized further are grouped and numbered accordingly. Links are made to:
* taxonomy
* complete EMBL flatfile
* CON files
* lists of CON segments
* Project
* Proteomes pages
* FASTA file of Proteins
* list of Proteins
The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset
zenodo.org
bin, zip
Updated Jul 24, 2023
Cite
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7267487
Explore at:
Available download formats: bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.7267487
Dataset updated
Jul 24, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type.

The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

Dataset selection and curation

This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.

Funding

This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Data from: Populations restored using regional seed are genetically diverse...
datadryad.org
search.dataone.org
zip
Updated Nov 2, 2021
Cite
Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka (2021). Populations restored using regional seed are genetically diverse and similar to natural populations in the region [Dataset]. http://doi.org/10.5061/dryad.qbzkh18j0
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.qbzkh18j0
Dataset updated
Nov 2, 2021
Dataset provided by
Dryad
Authors
Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka
Time period covered
Oct 19, 2021
Description
Please refer to the methods section and supplementary information of: Höfner, J., Klein-Raufhake, T., Lampei, C., Mudrak, O., Bucharova, A. and Durka, W. (2021) ‘Populations restored using regional seed are genetically diverse and similar to natural populations in the region’, accepted in Journal of Applied Ecology.
RNA sequencing data from: Aberrant expression of SLAMF6 constitutes a...
figshare.scilifelab.se
application/gzip
Updated Oct 2, 2025
Cite
Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos (2025). RNA sequencing data from: Aberrant expression of SLAMF6 constitutes a targetable immune escape mechanism in acute myeloid leukemia [Dataset]. http://doi.org/10.17044/scilifelab.28033754.v2
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.17044/scilifelab.28033754.v2
Dataset updated
Oct 2, 2025
Dataset provided by
Lund University
Authors
Carl Sandén; Henrik Lilljebjörn; Thoas Fioretos
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
This dataset includes RNA sequencing (RNA-seq) data from the HNT-34 AML (acute myeloid leukemia) cell line after knockout of the SLAMF6 gene by CRISPR/Cas9 (SLAMF6-KO) or mock-knockout with a construct targeting the firefly luciferase gene (SLAMF6-WT). Libraries were produced using the Illumina stranded mRNA prep kit and sequenced on an Illumina NovaSeq 6000 system (Illumina). The dataset is available as merged transcripts per million (TPM) data for all cases generated using Salmon (salmon.merged.gene_tpm.tsv.gz). Raw sequencing reads (fastq) are available at the European Nucleotide Archive (ENA) under accession ID PRJEB90909: https://www.ebi.ac.uk/ena/browser/view/PRJEB90909. Published in: Sandén et al., Nature Cancer, 2025: https://www.nature.com/articles/s43018-025-01054-6
A list of accession number for samples included in this study.
datasetcatalog.nlm.nih.gov
Updated Aug 14, 2023
Cite
Hall, Matthew; House, Thomas; Lee, Mark R.; Ferretti, Luca; Fraser, Christophe; Piazza, Paolo; Harthern-Flint, Sarah; Fryer, Helen R.; Xhang, Xin; Elstob, Claire J.; Bonsall, David; Hinch, Robert; Dos Santos, Rui Nunes; Lonie, Lorne J; Chapman, Isobel; Richards, Zack; MacIntyre-Cockett, George; Crown, Matthew; Bashton, Matthew; Trebes, Amy; Nurtay, Anel; Tariq, Mohammed Adnan; Green, Angie; Thomson, Laura; Smith, Darren; Hawley, Joseph; Pellis, Lorenzo; Golubchik, Tanya; Nelson, Andrew; Buck, David; Lythgoe, Katrina A.; Carrillo-Barragan, Priscilla; McCann, Clare M. (2023). A list of accession number for samples included in this study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001099459
Explore at:
Dataset updated
Aug 14, 2023
Authors
Hall, Matthew; House, Thomas; Lee, Mark R.; Ferretti, Luca; Fraser, Christophe; Piazza, Paolo; Harthern-Flint, Sarah; Fryer, Helen R.; Xhang, Xin; Elstob, Claire J.; Bonsall, David; Hinch, Robert; Dos Santos, Rui Nunes; Lonie, Lorne J; Chapman, Isobel; Richards, Zack; MacIntyre-Cockett, George; Crown, Matthew; Bashton, Matthew; Trebes, Amy; Nurtay, Anel; Tariq, Mohammed Adnan; Green, Angie; Thomson, Laura; Smith, Darren; Hawley, Joseph; Pellis, Lorenzo; Golubchik, Tanya; Nelson, Andrew; Buck, David; Lythgoe, Katrina A.; Carrillo-Barragan, Priscilla; McCann, Clare M.
Description
Sequences can be accessed via the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/browser/home. (TXT)
Benchmark of 5S, 16S and 23S rRNA Secondary Structures
figshare.com
zip
Updated Aug 30, 2022
Cite
Michela Quadrini; Luca Tesei; Emanuela Merelli (2022). Benchmark of 5S, 16S and 23S rRNA Secondary Structures [Dataset]. http://doi.org/10.6084/m9.figshare.20731783.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.20731783.v1
Dataset updated
Aug 30, 2022
Dataset provided by
Figshare http://figshare.com/
Authors
Michela Quadrini; Luca Tesei; Emanuela Merelli
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Benchmark of 5S, 16S, 23S rRNA secondary structures taken from the CRW database https://crw-site.chemistry.gatech.edu/

Each molecule is available in bpseq, ct and dot-bracket-letter (db) format. For each format a version without header/additional information/comments is available in the corresponding bpseq-nH, ct-nH, db-nH folders.

In the files Archaea.xlsx, Bacteria.xlsx and Eukaryota.xslx the molecules in the benchmark are listed together with their Organism Name, ID and Phylogenetic classification (up to Order) according to the European Nucleotide Archive (ENA) taxonomy https://www.ebi.ac.uk/ena/browser/home

The accession number is available from the headers of the bpseq and ct formats.
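As a rough guide to working with the bpseq files, the Python sketch below reads one file's text into a sequence and a base-pair map. It assumes the usual bpseq layout, in which each data line holds a 1-based position, the base, and the position of its pairing partner (0 when unpaired); any line that does not fit this shape, such as a header line, is skipped. The function name and the example structure are invented for illustration.

```python
from typing import Dict, Tuple


def parse_bpseq(text: str) -> Tuple[str, Dict[int, int]]:
    """Return (sequence, pairs), where pairs maps each paired 1-based
    position to its partner. Non-data lines (headers) are ignored."""
    bases = []
    pairs: Dict[int, int] = {}
    for raw in text.splitlines():
        fields = raw.split()
        is_data = (len(fields) == 3
                   and fields[0].isdigit()
                   and fields[2].isdigit())
        if not is_data:
            continue  # header, comment or blank line
        position, base, partner = int(fields[0]), fields[1], int(fields[2])
        bases.append(base)
        if partner != 0:
            pairs[position] = partner
    return "".join(bases), pairs


def pairs_are_symmetric(pairs: Dict[int, int]) -> bool:
    """Check that every pair i -> j is mirrored by j -> i."""
    return all(pairs.get(j) == i for i, j in pairs.items())


if __name__ == "__main__":
    # Toy five-base hairpin, not a molecule from the benchmark.
    example = "Header line: ignored\n1 G 5\n2 C 0\n3 A 0\n4 U 0\n5 C 1\n"
    sequence, pairs = parse_bpseq(example)
    print(sequence, pairs, pairs_are_symmetric(pairs))
```

Because the pair map keeps arbitrary partners, it also represents crossing pairs (pseudoknots), which plain dot-bracket strings with a single bracket type cannot.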
Data from: A de novo chromosome-level genome assembly of Coregonus sp....
datadryad.org
zip
Updated May 15, 2020
Cite
Philine Feulner; Rishi De-Kayne; Stefan Zoller (2020). A de novo chromosome-level genome assembly of Coregonus sp. “Balchen”: one representative of the Swiss Alpine whitefish radiation [Dataset]. http://doi.org/10.5061/dryad.xd2547ddf
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.xd2547ddf
Dataset updated
May 15, 2020
Dataset provided by
Dryad
Authors
Philine Feulner; Rishi De-Kayne; Stefan Zoller
Time period covered
Apr 30, 2020
Area covered
Switzerland
Description
For detailed methods please see the associated publication.
Data from: Reference transcriptomics of porcine peripheral immune cells...
agdatacommons.nal.usda.gov
zip
Updated Nov 21, 2025
Cite
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. http://doi.org/10.15482/USDA.ADC/1522411
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.15482/USDA.ADC/1522411
Dataset updated
Nov 21, 2025
Dataset provided by
Ag Data Commons
Authors
Juber Herrera-Uribe; Jayne Wiarda; Sathesh K. Sivasankaran; Lance Daharsh; Haibo Liu; Kristen A. Byrne; Timothy P. L. Smith; Joan K. Lunney; Crystal L. Loving; Christopher K. Tuggle
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format.
File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows:

matrix of gene counts* (matrix.mtx.gz)
gene names (features.tsv.gz)
cell IDs (barcodes.tsv.gz)

*The 'raw' count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata.
File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:

nCount_RNA = the number of transcripts detected in a cell
nFeature_RNA = the number of genes detected in a cell
Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
prcntMito = percent mitochondrial reads in a cell
Scrublet = doublet probability score assigned to a cell
seurat_clusters = cluster ID assigned to a cell
PaperIDs = sample ID for a cell
celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates.
File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates.
File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates.
File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates.
File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates.
File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates.
File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates.
File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information.
File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat.
File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
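The description says a CD4-only dataset can be re-created by keeping clusters 0, 3, 4 and 28. As a rough illustration outside R, the sketch below picks the matching cell barcodes from the metadata table using only the Python standard library. It assumes the CSV header uses the column names exactly as listed above (Loupe, seurat_clusters) and that cluster IDs are stored as plain integers; the helper name and the three-row demo table are made up for the example and are not part of the dataset.

```python
import csv
import io

# Cluster IDs that the dataset description lists for CD4 T cells.
CD4_CLUSTERS = {"0", "3", "4", "28"}


def barcodes_in_clusters(csv_file, clusters):
    """Return the 'Loupe' barcodes of rows whose 'seurat_clusters' is in `clusters`.

    Hypothetical helper: assumes the header names match the description.
    """
    reader = csv.DictReader(csv_file)
    return [
        row["Loupe"]
        for row in reader
        if row["seurat_clusters"].strip() in clusters
    ]


# Made-up stand-in for PBMC7_AllCells_meta.csv (real files have more columns).
demo = io.StringIO(
    "Loupe,seurat_clusters,celltypes\n"
    "cellA,0,typeX\n"
    "cellB,6,typeY\n"
    "cellC,28,typeX\n"
)
cd4_barcodes = barcodes_in_clusters(demo, CD4_CLUSTERS)
print(cd4_barcodes)
```

With the real file, the same function could be called on `open("PBMC7_AllCells_meta.csv", newline="")`, and the resulting barcodes used to subset the t-SNE or UMAP coordinate tables.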
Genome Assembly Mycobacterium Bovis GCA_000195835.3
figshare.com
txt
Updated May 21, 2025
Alexessander Couto Alves (2025). Genome Assembly Mycobacterium Bovis GCA_000195835.3 [Dataset]. http://doi.org/10.6084/m9.figshare.29066618.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.29066618.v1
Dataset updated
May 21, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Alexessander Couto Alves
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Genome sequence of the bovine tuberculosis bacillus Mycobacterium bovis AF2122/97: https://www.ebi.ac.uk/ena/browser/view/GCA_000195835.3?show=chromosomes
Dataset supporting the tool 'delfies: a Python package for the detection of...
zenodo.org
application/gzip
Updated Dec 5, 2024
Brice Letcher (2024). Dataset supporting the tool 'delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition' [Dataset]. http://doi.org/10.5281/zenodo.14282333
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.14282333
Dataset updated
Dec 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Brice Letcher
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Purpose

These data can be used to test my tool delfies on real data, to get a concrete sense of its inputs/outputs and test that it is
properly installed.

Description

Genome

I downloaded the genome of Oscheius onirici, accession: GCA_932521025.

I subsampled the genome to the last 2kbp of chromosome I, which contains an elimination breakpoint,
using `seqkit` v2.8.2, giving the FASTA file in this release.

Sequencing data

I then downloaded the following sequencing data for *O. onirici*, from the European Nucleotide Archive:

ERR5967937: Illumina NovaSeq 6000 paired end short reads. Reads are 2x150bp with average per-base quality of Q27.

ERR10796202: Oxford Nanopore PromethION long reads. Reads have average length 11.9kbp and average per-base quality Q11.4.

ERR7979900: Pacific Biosciences (PacBio) Sequel II long reads. Reads have average length 11.1kbp and average per-base quality Q28.

And aligned them to the above genome with `minimap2` version 2.26-r1175, using the following presets:
"map-ont" for the Nanopore data, "map-hifi" for the PacBio data, "sr" for the Illumina data.

After sorting with `samtools`, this gives the BAM files in this release.
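The alignment and sorting steps above can be sketched as a small command builder. This is only an illustration: the run accessions and presets come from the text, but the file names, the helper name `alignment_commands`, and the exact option layout are assumptions, not the author's actual command lines. The sketch only prints the commands; it does not run them.

```python
import shlex

# minimap2 preset per run accession, as stated in the description.
PRESETS = {
    "ERR5967937": "sr",         # Illumina paired-end short reads
    "ERR10796202": "map-ont",   # Oxford Nanopore long reads
    "ERR7979900": "map-hifi",   # PacBio long reads
}


def alignment_commands(genome_fasta, run, reads):
    """Build (align, sort) argument lists for one run.

    Hypothetical helper: output file names are derived from the accession.
    """
    preset = PRESETS[run]
    sam = f"{run}.sam"
    bam = f"{run}.sorted.bam"
    # -a: SAM output, -x: preset, -o: output file.
    align = ["minimap2", "-a", "-x", preset, "-o", sam, genome_fasta, *reads]
    sort = ["samtools", "sort", "-o", bam, sam]
    return align, sort


# Placeholder file names, not the names used in the release.
align, sort = alignment_commands(
    "genome.fasta", "ERR5967937", ["reads_1.fastq.gz", "reads_2.fastq.gz"]
)
print(shlex.join(align))
print(shlex.join(sort))
```

Writing an intermediate SAM file keeps the two steps separate and easy to inspect; piping `minimap2` directly into `samtools sort` is a common alternative.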

Running delfies

I then ran `delfies` version 0.6.0 on each BAM and genome, as:

```sh
# Positional arguments: genome FASTA, sorted BAM, output directory
delfies --threads 16 \
  --telo_forward_seq TTAGGC \
  --breakpoint_type all \
  --min_mapq 20 \
  --min_supporting_reads 6 \
  "${genome}" "${bam}" "${odirname}"
```

The three resulting output directories are in this release, prefixed with `delfies_`.

A single, identical breakpoint is found using all three BAMs (see files '*breakpoint_locations.bed').

Data source

The above raw data were produced and released by the Wellcome Sanger Institute as part of projects
PRJEB51305 and PRJEB59023.
Extended rat miRNA repertoire
data.niaid.nih.gov
Updated Jul 3, 2024
Canzler, Sebastian (2024). Extended rat miRNA repertoire [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12626179
Dataset updated
Jul 3, 2024
Dataset provided by
Helmholtz-Zentrum für Umweltforschung UFZ
Authors
Canzler, Sebastian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Generally, Rattus norvegicus' miRNA repertoire falls short compared to the other rodent model organism, Mus musculus.

To extend the miRNA catalogue in Rattus norvegicus, we utilized Infernal v1.1 (Nawrocki and Eddy, 2013) to derive potential rat miRNA candidates starting from all available mammalian miRNA families in miRBase. We utilized MIRfix (Yazbeck et al., 2019) to curate the extended miRNA datasets automatically. Subsequent manual inspection and curation of miRNA alignments resulted in a reliable and comprehensive update to the rat miRNA annotation.

Key facts of the extended miRNA repertoire

342 miRNA families (40 novel families)

549 miRNA sequences (56 novel miRNAs)

11 corrected annotated miRNAs

European Nucleotide Archive

The 56 novel sequences not previously listed in miRBase have been submitted to the European Nucleotide Archive at EMBL-EBI. They are accessible with the accession numbers OZ078105 - OZ078160. The sequences will be permanently available from the ENA browser at http://www.ebi.ac.uk/ena/data/view/.

An overview of all sequences is given here: http://www.ebi.ac.uk/ena/data/view/OZ078105-OZ078160.
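For scripted access it can help to expand an accession range such as OZ078105 - OZ078160 into individual accessions. The sketch below does this with plain string handling. It assumes that both ends share the same letter prefix and the same number of digits; the function name is hypothetical.

```python
import re

ACCESSION_RE = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(first, last):
    """List every accession from `first` to `last`, inclusive.

    Assumes a shared letter prefix and a fixed-width numeric part.
    """
    m_first = ACCESSION_RE.fullmatch(first)
    m_last = ACCESSION_RE.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("accessions must look like letters followed by digits")
    prefix, digits_first = m_first.groups()
    prefix_last, digits_last = m_last.groups()
    if prefix != prefix_last or len(digits_first) != len(digits_last):
        raise ValueError("range ends must share prefix and digit width")
    start, end = int(digits_first), int(digits_last)
    if start > end:
        raise ValueError("first accession must not come after the last")
    width = len(digits_first)
    # Zero-pad so that leading zeros in the numeric part are preserved.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("OZ078105", "OZ078160")
print(len(accessions))  # 56, matching the 56 novel sequences
print(accessions[0], accessions[-1])
```

Each resulting accession can then be appended to the ENA browser URL given in the description.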
Bacterial diversity (16S rRNA gene) in participant collected household...
hosted-metadata.bgs.ac.uk
Updated Nov 1, 2021
British Geological Survey (2021). Bacterial diversity (16S rRNA gene) in participant collected household vacuum dust from homes across two bioclimatic regions (UK and Greece), with associated participant questionnaire and trace element data. (NERC Grant NE/T004401/1) [Dataset]. https://hosted-metadata.bgs.ac.uk/geonetwork/srv/api/records/d0869679-6d34-14ce-e054-002128a47908?language=all
Available download formats: www:download-1.0-http--download
Dataset updated
Nov 1, 2021
Dataset authored and provided by
British Geological Survey (https://www.bgs.ac.uk/)
License
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Time period covered
Oct 1, 2019 - Jul 1, 2021
Area covered
Greece, United Kingdom
Description
The <250 µm fraction of 28 household vacuum dust samples was extracted using high-throughput isolation of microbial genomic DNA (21 samples from a national campaign within the UK and 7 samples from Greece, providing samples from two contrasting bioclimatic zones). Both positive and negative reagent controls were included to ensure sterility throughout the processing and sequencing steps, and a randomly selected sample (DSUK179) was run in triplicate. These data (raw fastq files: Target_gene 16S and Target_subfragment V4) are available from the European Nucleotide Archive via the study accession PRJEB46920, with individual sample accession numbers ERX6130460 to ERX6130493 (https://www.ebi.ac.uk/ena/browser/view/PRJEB46920). A wide range of anthropogenic factors are likely to affect the indoor microbiome; to capture some of this heterogeneity, participants were asked to complete a questionnaire. In addition, trace element data were generated using X-ray fluorescence spectrometry on the <250 µm sieved fraction of the household vacuum dust. Sample location data are provided at town/city, country level.

Indoor dust serves as a reservoir for environmental exposure to microbial communities, many of which are benign, some beneficial, and some pathogenic. Whilst non-occupational exposure to a range of trace elements and organic contaminants in house dust is a known risk factor for a range of diseases and poor health outcomes, we know far less about the microbial communities associated with our indoor home environments and their interactions with and impacts on human health. Indoor residential bacterial biodiversity, biogeography and their associated drivers are still poorly understood. The data were collected to improve our understanding of the home microbiome.
jolma_subset
huggingface.co
Updated Aug 3, 2023
thewall (2023). jolma_subset [Dataset]. https://huggingface.co/datasets/thewall/jolma_subset
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Aug 3, 2023
Authors
thewall
License
https://choosealicense.com/licenses/openrail/
Description
PRJEB3289 https://www.ebi.ac.uk/ena/browser/view/PRJEB3289

Data generated by HT-SELEX experiments (see Jolma et al. 2010, PMID: 20378718, for a description of the method) that have been used to generate transcription factor binding specificity models for most of the high-confidence human transcription factors. Sequence data are composed of reads generated with Illumina Genome Analyzer IIX and HiSeq2000 instruments. Samples are composed of single-read sequencing of synthetic DNA fragments with a fixed-length randomized region, or of samples derived from such an initial library by selection with a sequence-specific DNA binding protein. Originally, multiple samples with different "barcode" tag sequences were run on the same Illumina sequencing lane, but the released files have already been de-multiplexed, and the constant regions and "barcodes" of each sequence have been cut out of the sequencing reads to facilitate use of the data. Some of the files are composed of reads from multiple sequencing lanes; for this reason, the name of each individual read has been edited to show the flowcell and lane that was used to generate it. Barcodes and oligonucleotide designs are indicated in the names of individual entries. Depending on the selection ligand design, the sequences in each of these fastq files are either 14, 20, 30 or 40 bases long and had different flanking regions on both sides of the sequence.

Each run entry is named in one of the following ways:

Example 1) "BCL6B_DBD_AC_TGCGGG20NGA_1", where the name is composed of the fields ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle. This experiment used barcode ligand TGCGGG20NGA, where both of the variable flanking constant regions are indicated as they were on the original sequence reads. This ligand has been selected for one round of HT-SELEX using recombinant protein that contained the DNA binding domain of human transcription factor BCL6B. The name also shows that the experiment was performed in the batch of experiments named "AC".

Example 2) "0_TGCGGG20NGA_0", where the name is composed of (zero)_BarcodeDesign_(zero). These sequences have been generated by sequencing the initial non-selected pool. The same initial pools have been used in multiple experiments in different batches; for example, this background sequence pool is the shared background for all of the following samples: BCL6B_DBD_AC_TGCGGG20NGA_1, ZNF784_full_AE_TGCGGG20NGA_3, DLX6_DBD_Y_TGCGGG20NGA_4 and MSX2_DBD_W_TGCGGG20NGA_2.
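The naming scheme described above lends itself to a small parser. The sketch below splits a run name into its fields and further splits the barcode design into a left flank, the length of the randomized region and a right flank. That decomposition of "TGCGGG20NGA" (TGCGGG, 20 random bases, GA) is an interpretation of the description rather than a documented format, and the sketch also assumes that no field contains an underscore. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout of a barcode design: flank, length, literal "N", flank.
DESIGN_RE = re.compile(r"([ACGT]*)(\d+)N([ACGT]*)")


@dataclass
class RunName:
    protein: Optional[str]      # None for the non-selected initial pool
    clone_type: Optional[str]
    batch: Optional[str]
    design: str
    cycle: int
    left_flank: str
    random_length: int
    right_flank: str


def parse_run_name(name: str) -> RunName:
    """Parse 'Protein_Clone_Batch_Design_Cycle' or '0_Design_0' run names."""
    parts = name.split("_")
    if len(parts) == 5:
        protein, clone_type, batch, design, cycle = parts
    elif len(parts) == 3 and parts[0] == "0" and parts[2] == "0":
        protein = clone_type = batch = None
        design, cycle = parts[1], "0"
    else:
        raise ValueError(f"unrecognised run name: {name!r}")
    match = DESIGN_RE.fullmatch(design)
    if match is None:
        raise ValueError(f"unrecognised barcode design: {design!r}")
    left, length, right = match.groups()
    return RunName(protein, clone_type, batch, design, int(cycle),
                   left, int(length), right)


selected = parse_run_name("BCL6B_DBD_AC_TGCGGG20NGA_1")
background = parse_run_name("0_TGCGGG20NGA_0")
print(selected)
# Selected sample and background pool share the same barcode design.
print(selected.design == background.design)
```

Matching on the design field is one way to pair each selected sample with its shared background pool.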
tg2
huggingface.co
Updated Jun 21, 2023
thewall (2023). tg2 [Dataset]. https://huggingface.co/datasets/thewall/tg2
Available formats: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset updated
Jun 21, 2023
Authors
thewall
License
https://choosealicense.com/licenses/openrail/
Description
PRJDB9110 https://www.ebi.ac.uk/ena/browser/view/PRJDB9110

To generate RNA aptamers against human transglutaminase 2, we performed high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). Eight rounds were performed, and rounds 0 to 8 were sequenced.
Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis...
data.niaid.nih.gov
Updated Dec 12, 2023
Ontiveros, Irene; Fernández-Pozo, Noé; Esteve-Codina, Anna; López-Moya, Juan José; Díaz-Pendón, Juan Antonio (2023). Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10362110
Dataset updated
Dec 12, 2023
Dataset provided by
Centro Nacional de Análisis Genómico
Center for Research in Agricultural Genomics
Consejo Superior de Investigaciones Científicas
Authors
Ontiveros, Irene; Fernández-Pozo, Noé; Esteve-Codina, Anna; López-Moya, Juan José; Díaz-Pendón, Juan Antonio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset underlies the scientific publication titled "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics", published in the journal Viruses. The dataset includes a time-course transcriptome analysis using RNA-seq of naïve (no whitefly and no virus), mock (non-viruliferous whiteflies) and ToCV (ToCV-viruliferous whiteflies)-treated tomato samples at 2, 7, and 14 days post-infection (dpi), and viral small RNAs derived from tomato plants infected with ToCV at 14 dpi. The dataset provided here has been deposited in full by the authors in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB67704 (https://www.ebi.ac.uk/ena/browser/view/PRJEB67704). The data and the results derived from them are discussed and interpreted in detail in the scientific publication. This research was conducted within the VIRTIGATION project, which is part of the EU Open Research Data pilot. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101000570.
