46 datasets found
  1. European Nucleotide Archive (ENA)

    • neuinfo.org
    Cite
    European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515
    Explore at:
    Description

    Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
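
    As a rough illustration of URL-based access, the sketch below builds ENA Browser record links from accession numbers. The URL pattern is the one that appears in other entries on this page (https://www.ebi.ac.uk/ena/browser/view/...); whether this is the REST interface the description refers to is an assumption, and the helper name ena_view_url is hypothetical.

```python
from urllib.parse import quote

# URL prefix copied from the ENA Browser links quoted elsewhere on this page.
ENA_BROWSER_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"


def ena_view_url(accession: str) -> str:
    """Return the ENA Browser record URL for an accession (hypothetical helper)."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    # Percent-encode anything unexpected; letters, digits, '_' and '.' pass through.
    return ENA_BROWSER_VIEW + quote(accession, safe=".")


if __name__ == "__main__":
    # Accessions taken from other entries in this listing.
    for acc in ("PRJEB76269", "GCA_963668995.1", "OZ075093.1"):
        print(ena_view_url(acc))
```

    The function only constructs links and performs no network access, so it can be used to prepare a list of records before fetching them with any HTTP client.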

  2. Supplemental data from the genome assembly and annotation of the Clouded...

    • researchdata.se
    Updated Jun 26, 2024
    Cite
    Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström (2024). Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne) [Dataset]. http://doi.org/10.17044/SCILIFELAB.25908748
    Explore at:
    Dataset updated
    Jun 26, 2024
    Dataset provided by
    Uppsala University
    Authors
    Jacob Höglund; Guilherme Dias; Remi-André Olsen; André Soares; Ignas Bunikis; Venkat Talla; Niclas Backström
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:

    Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031

    Previous data from the project have been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269).

    The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1)) and the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)).

    Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.

    The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).

    • pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451 (https://www.ebi.ac.uk/ena/browser/view/ERX11559451)). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).

    • pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1)). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).

    • pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.

    • pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).

    • pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436 (https://www.ebi.ac.uk/ena/browser/view/ERX11559436)) aligned to the primary genome assembly.

    • pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in fasta format. The prediction was done with RepeatModeler2 (v2.0.2a).

    Available variables

    For a description of the column headers of the files, please see the following links to the documentation of the different file formats.

    The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

    The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf

    The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
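
    For orientation, a GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below splits one such line into named fields. The function name parse_gff3_line and the example line are invented for illustration and are not taken from the dataset files.

```python
from typing import Dict, NamedTuple
from urllib.parse import unquote


class GffFeature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffFeature:
    """Split one GFF3 feature line into its nine columns (illustrative sketch)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs: Dict[str, str] = {}
    # Column 9 holds semicolon-separated key=value pairs, percent-encoded.
    for field in cols[8].split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[unquote(key)] = unquote(value)
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


if __name__ == "__main__":
    # Invented example line, not from the dataset.
    example = "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=demo"
    print(parse_gff3_line(example))
```

    Comment lines (starting with "#") and the gzip compression of the .gff.gz files are not handled here; a caller would skip the former and open the files with the gzip module.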

    Contact

    For questions about this dataset, please contact: jacob.hoglund@ebc.uu.se, niclas.backstrom@ebc.uu.se

  3. Data from: Whole genome sequence and annotation dataset of rare...

    • data.niaid.nih.gov
    Updated Aug 20, 2023
    Cite
    Azmi, Aida Azrina (2023). Whole genome sequence and annotation dataset of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8265495
    Explore at:
    Dataset updated
    Aug 20, 2023
    Dataset provided by
    Chong, Sin Yee
    Azmi, Aida Azrina
    Cheah, Yoke Kqueen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Antarctica
    Description

    The present data files are the source files of the annotation output from the whole genome sequencing of rare actinobacteria, Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica.

    The whole-genome sequence dataset of B. humi has been deposited in the European Nucleotide Archive (ENA) repository under the accession number PRJEB44986 / ERP129097. Direct URL to data: https://www.ebi.ac.uk/ena/browser/view/PRJEB44986

  4. snRNA-seq in white matter post-mortem tissue from MS and controls

    • ega-archive.org
    Updated Dec 31, 2018
    Cite
    (2018). snRNA-seq in white matter post-mortem tissue from MS and controls [Dataset]. https://ega-archive.org/datasets/EGAD00001004544
    Explore at:
    Dataset updated
    Dec 31, 2018
    License

    https://ega-archive.org/dacs/EGAC00001001105

    Description

    This dataset is currently hosted by the European Nucleotide Archive. To access the data contained within the dataset, please follow the link below: https://www.ebi.ac.uk/ena/browser/view/PRJEB39323

    The dataset consists of 20 snRNA-seq BAM files from 10X v2: 5 samples from postmortem white matter tissue from non-neurological controls and 15 samples from different MS lesions from the white matter tissue of 4 postmortem progressive MS patients.

  5. The chloroplast and mitochondrial genome sequences of worldwide collection...

    • data.mendeley.com
    Updated Nov 19, 2021
    Cite
    Hongfang Liu (2021). The chloroplast and mitochondrial genome sequences of worldwide collection of B. napus, B. rapa and B. oleracea accessions [Dataset]. http://doi.org/10.17632/9g7kxvgnyr.1
    Explore at:
    Dataset updated
    Nov 19, 2021
    Authors
    Hongfang Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains organelle genome sequences of globally collected Brassica accessions, in which the chloroplast genomes consist of 1,327 natural and 31 synthetic B. napus, 90 B. rapa and 107 B. oleracea accessions, and the mitochondrial genomes consist of 1,457 natural and 31 synthetic B. napus, 183 B. rapa and 104 B. oleracea accessions. The genome sequencing data of natural rapeseed accessions were obtained from the NCBI database under SRP155312, PRJNA430009 and PRJNA358784. The raw sequencing data of 20 synthetic B. napus accessions can be found in the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home) under PRJEB5974 and PRJEB6069. The raw sequences of B. rapa and B. oleracea can be found in the NCBI database under BioProject accession PRJNA312457. After quality checking, we first mapped reads to the published cp and mt genomes of six Brassica species. The mapped paired-end reads were then extracted and de novo assembled into the cp and mt genomes with NOVOPlasty and ARC (http://ibest.github.io/ARC/), respectively.

  6. EBI Genomes

    • dknet.org
    • rrid.site
    • +2more
    Updated Jun 28, 2025
    Cite
    (2025). EBI Genomes [Dataset]. http://identifiers.org/RRID:SCR_002426
    Explore at:
    Dataset updated
    Jun 28, 2025
    Description

    The EBI genomes pages give access to a large number of complete genomes including bacteria, archaea, viruses, phages, plasmids, viroids and eukaryotes. Methods using whole genome shotgun data are used to gain a large amount of genome coverage for an organism. WGS data for a growing number of organisms are being submitted to DDBJ/EMBL/GenBank. Genome entries have been listed in their appropriate category which may be browsed using the website navigation tool bar on the left. While organelles are all listed in a separate category, any from Eukaryota with chromosome entries are also listed in the Eukaryota page. Within each page, entries are grouped and sorted at the species level with links to the taxonomy page for that species separating each group. Within each species, entries whose source organism has been categorized further are grouped and numbered accordingly. Links are made to:

    • taxonomy
    • complete EMBL flatfile
    • CON files
    • lists of CON segments
    • Project
    • Proteomes pages
    • FASTA file of Proteins
    • list of Proteins

  7. The OHEJP BeONE Project – Salmonella enterica genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Salmonella enterica genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802723
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  8. D-BeONE.1.2 BeONE dataset

    • zenodo.org
    • openagrar.de
    pdf
    Updated Jul 15, 2024
    + more versions
    Cite
    Verónica Mixão; Holger Brendebach; Simon Tausch; Miguel Pinto; Carlus Deneke; Karin Lagesen; Vítor Borges (2024). D-BeONE.1.2 BeONE dataset [Dataset]. http://doi.org/10.5281/zenodo.7335590
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Holger Brendebach; Simon Tausch; Miguel Pinto; Carlus Deneke; Karin Lagesen; Vítor Borges
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JRP24-FBZSH9-BEONE WP1 deliverable 1.2.

    WP Leader: Vítor Borges (INSA)

    Other contributors: Verónica Mixão (INSA), Miguel Pinto (INSA), Holger Brendebach (BfR), Simon Tausch (BfR), Carlus Deneke (BfR), Karin Lagesen (NVI)

    To contribute to the specific objectives of the BeOne project, WP1-T2 compiled an anonymized dataset (including sequencing reads and respective metadata) aiming to capture the genomic diversity within the populations of Listeria monocytogenes, Salmonella enterica, Escherichia coli (STEC) and Campylobacter jejuni. This dataset includes data shared by the BeOne partners and comprises a total of 3,884 isolates, for which the anonymized sequencing reads were released in the European Nucleotide Archive (ENA) and the anonymized genome assemblies in the Zenodo repository [1,426 L. monocytogenes (accession: PRJEB57166 and 10.5281/zenodo.7267486); 1,540 S. enterica (accession: PRJEB57179 and 10.5281/zenodo.7267785); 308 E. coli (accession: PRJEB57098 and 10.5281/zenodo.7267844); 610 C. jejuni (accession: PRJEB57119 and 10.5281/zenodo.7267879)].

    As a complement to the BeOne dataset, additional samples were carefully selected among the WGS data publicly available at the beginning of the analysis (November 2021) in ENA or the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), in order to ensure the representativeness of the genomic diversity within public databases (assessed in terms of sequence type or serotype, depending on the species). In the end, a so-called “public dataset” with the 8,383 samples that passed the curation step was released in Zenodo repository [1,874 L. monocytogenes (accession: 10.5281/zenodo.7116878); 1,434 S. enterica (accession: 10.5281/zenodo.7119735), 1,999 E. coli (accession: 10.5281/zenodo.7120057); 3,076 C. jejuni (accession: 10.5281/zenodo.7120166)].
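
    The per-species counts quoted above can be checked against the stated totals with a few lines of arithmetic. The sketch below simply re-adds the numbers given in the description; the variable and function names are chosen here for illustration.

```python
# Isolate counts per species, copied from the description above.
beone_counts = {
    "L. monocytogenes": 1426,
    "S. enterica": 1540,
    "E. coli": 308,
    "C. jejuni": 610,
}
public_counts = {
    "L. monocytogenes": 1874,
    "S. enterica": 1434,
    "E. coli": 1999,
    "C. jejuni": 3076,
}


def total(counts: dict) -> int:
    """Sum the per-species isolate counts."""
    return sum(counts.values())


if __name__ == "__main__":
    print("BeONE dataset:", total(beone_counts))    # stated total: 3,884
    print("Public dataset:", total(public_counts))  # stated total: 8,383
```

    Both sums agree with the totals given in the text (3,884 and 8,383).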

  9. Data from: Populations restored using regional seed are genetically diverse...

    • search.dataone.org
    • explore.openaire.eu
    • +2more
    Updated Nov 29, 2023
    Cite
    Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka (2023). Populations restored using regional seed are genetically diverse and similar to natural populations in the region [Dataset]. http://doi.org/10.5061/dryad.qbzkh18j0
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Johannes Höfner; Theresa Klein-Raufhake; Christian Lampei; Ondrej Mudrak; Anna Bucharova; Walter Durka
    Time period covered
    Jan 1, 2021
    Description

    Ecological restoration and plant re-introductions aim to create plant populations that are genetically similar to natural populations to preserve the regional gene pool, yet genetically diverse to allow adaptation to a changing environment. For this purpose, seeds for restoration are increasingly sourced from multiple populations in the target region. However, it has only rarely been tested whether using regional seed indeed leads to genetically diverse restored populations which are genetically similar to natural populations. We used single nucleotide polymorphism (SNP) markers to investigate genetic diversity within and differentiation among populations of Centaurea jacea and Betonica officinalis on restored and natural meadows in the White Carpathians, Czech Republic. The restoration took place 20 years ago using regional seeds propagated from a mix of multiple regional source populations. We included original regional seeds in our analysis to compare the restored populations with th...

    Please refer to the methods section and supplementary information of: Höfner, J., Klein-Raufhake, T., Lampei, C., Mudrak, O., Bucharova, A. and Durka, A. (2021) ‘Populations restored using regional seed are genetically diverse and similar to natural populations in the region’, accepted in Journal of Applied Ecology.

    These .vcf files represent the stage after filtering with 'vcftools' and tools from the 'vcflib' library and before import into R. They are derived from the raw sequencing data available on EMBL's European Nucleotide Archive (ENA) under accession number PRJEB45358 (https://www.ebi.ac.uk/ena/browser/view/PRJEB45358) and are intended to facilitate work with this data set.

  10. ICB_Riaz_RNAseq

    • zenodo.org
    zip
    Updated Aug 8, 2022
    + more versions
    Cite
    Benjamin Haibe-Kains (2022). ICB_Riaz_RNAseq [Dataset]. http://doi.org/10.5281/zenodo.6968453
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Haibe-Kains
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  11. Data from: Reference transcriptomics of porcine peripheral immune cells...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +3more
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

    Resources in this dataset:

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zip. Resource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gx), gene names (features.tsv.gz), cell IDs (barcodes.tsv.gz). *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csv. Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
        nCount_RNA = the number of transcripts detected in a cell
        nFeature_RNA = the number of genes detected in a cell
        Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
        prcntMito = percent mitochondrial reads in a cell
        Scrublet = doublet probability score assigned to a cell
        seurat_clusters = cluster ID assigned to a cell
        PaperIDs = sample ID for a cell
        celltypes = cell type ID assigned to a cell

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csv. Resource Description: .csv file containing first 100 PCA coordinates for cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for all cells.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csv. Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csv. Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txt. Resource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.

    • Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tar. Resource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
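
    The description points to R functions for the count matrix and the .h5Seurat object; the metadata .csv can also be inspected without R. The sketch below tallies cells per cell type from a metadata table. It assumes the file is comma-separated and that its header uses the column names listed in the description (for example celltypes); the sample rows and the function name count_cells_by_type are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_cells_by_type(handle: TextIO, column: str = "celltypes") -> Counter:
    """Count rows per value of the given column (assumed header name)."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Invented sample rows; the real file would be opened with
# open("PBMC7_AllCells_meta.csv", newline="") instead.
SAMPLE = """Loupe,nCount_RNA,nFeature_RNA,prcntMito,Scrublet,seurat_clusters,PaperIDs,celltypes
cell-1,1200,600,2.1,0.05,0,A,CD4 T cells
cell-2,900,450,3.4,0.10,6,B,Gamma delta T cells
cell-3,1500,700,1.8,0.02,3,A,CD4 T cells
"""

if __name__ == "__main__":
    counts = count_cells_by_type(io.StringIO(SAMPLE))
    for cell_type, n in counts.most_common():
        print(cell_type, n)
```

    If the real header differs from the names given in the description, the column argument would need to be adjusted accordingly.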

  12. Benchmark of 5S, 16S and 23S rRNA Secondary Structures

    • figshare.com
    zip
    Updated Aug 30, 2022
    Cite
    Michela Quadrini; Luca Tesei; Emanuela Merelli (2022). Benchmark of 5S, 16S and 23S rRNA Secondary Structures [Dataset]. http://doi.org/10.6084/m9.figshare.20731783.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 30, 2022
    Dataset provided by
    figshare
    Authors
    Michela Quadrini; Luca Tesei; Emanuela Merelli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Benchmark of 5S, 16S, 23S rRNA secondary structures taken from the CRW database https://crw-site.chemistry.gatech.edu/

    Each molecule is available in bpseq, ct and dot-bracket-letter (db) format. For each format a version without header/additional information/comments is available in the corresponding bpseq-nH, ct-nH, db-nH folders.

    In the files Archaea.xlsx, Bacteria.xlsx and Eukaryota.xslx the molecules in the benchmark are listed together with their Organism Name, ID and Phylogenetic classification (up to Order) according to the European Nucleotide Archive (ENA) taxonomy https://www.ebi.ac.uk/ena/browser/home

    The accession number is available from the headers of the bpseq and ct formats.
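
    To show how the dot-bracket representation relates to the base-pair listings in the other formats, the sketch below converts a dot-bracket string into a list of paired positions (1-based). It assumes the common extended convention in which each bracket type, and each uppercase/lowercase letter pair such as A/a, denotes its own nesting level for pseudoknots; whether the files in this benchmark follow exactly that convention is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Closing bracket -> matching opening bracket.
_CLOSERS = {")": "(", "]": "[", "}": "{", ">": "<"}
_OPENERS = set(_CLOSERS.values())


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) base pairs, 1-based, from a dot-bracket string."""
    stacks: Dict[str, List[int]] = {}
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch in _OPENERS or ch.isupper():
            # Opening symbol: remember its position on that symbol's own stack.
            stacks.setdefault(ch, []).append(pos)
        elif ch in _CLOSERS or ch.islower():
            opener = _CLOSERS.get(ch, ch.upper())
            if not stacks.get(opener):
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed opening symbol(s) in structure")
    return sorted(pairs)


if __name__ == "__main__":
    # Small invented examples, not taken from the benchmark.
    print(dot_bracket_pairs("((..))"))
    print(dot_bracket_pairs("(.[.).]"))
```

    Unpaired positions (dots) do not appear in the result; a bpseq-style listing would additionally report them with a partner index of 0.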

  13. Extended rat miRNA repertoire

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 3, 2024
    Cite
    Canzler, Sebastian (2024). Extended rat miRNA repertoire [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12626179
    Explore at:
    Dataset updated
    Jul 3, 2024
    Dataset authored and provided by
    Canzler, Sebastian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generally, Rattus norvegicus' miRNA repertoire falls short compared to the other rodent model organism, Mus musculus.

    To extend the miRNA catalogue in Rattus norvegicus, we utilized Infernal v1.1 (Nawrocki and Eddy, 2013) to derive potential rat miRNA candidates starting from all available mammalian miRNA families in miRBase. We utilized MIRfix (Yazbeck et al., 2019) to curate the extended miRNA datasets automatically. Subsequent manual inspection and curation of miRNA alignments resulted in a reliable and comprehensive update to the rat miRNA annotation.

    Key facts of the extended miRNA repertoire

    342 miRNA families (40 novel families)

    549 miRNA sequences (56 novel miRNAs)

    11 corrected annotated miRNAs

    European Nucleotide Archive

    The 56 novel sequences not previously listed in miRBase have been submitted to the European Nucleotide Archive at EMBL-EBI. They are accessible with the accession numbers OZ078105 - OZ078160. The sequences will be permanently available from the ENA browser at http://www.ebi.ac.uk/ena/data/view/.

    An overview of all sequences is given here: http://www.ebi.ac.uk/ena/data/view/OZ078105-OZ078160.
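
    As a small illustration, the accession range quoted above can be expanded into individual accessions. This is a sketch only: the helper name is hypothetical, and pairing each accession with the https://www.ebi.ac.uk/ena/browser/view/ URL pattern (the form used by other entries on this page) is an assumption rather than something this dataset states.

```python
# URL pattern borrowed from other entries on this page; assumed, not confirmed, for these records.
ENA_VIEW_URL = "https://www.ebi.ac.uk/ena/browser/view/"


def expand_accession_range(first, last):
    """Expand e.g. 'OZ078105'..'OZ078160' into every accession in between (inclusive)."""
    prefix = first.rstrip("0123456789")
    if prefix != last.rstrip("0123456789"):
        raise ValueError("accessions must share the same prefix")
    first_digits, last_digits = first[len(prefix):], last[len(prefix):]
    if not first_digits or len(first_digits) != len(last_digits):
        raise ValueError("accessions must end in numbers of equal width")
    start, stop = int(first_digits), int(last_digits)
    if start > stop:
        raise ValueError("range is reversed")
    width = len(first_digits)  # keep leading zeros
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


accessions = expand_accession_range("OZ078105", "OZ078160")
urls = [ENA_VIEW_URL + accession for accession in accessions]

if __name__ == "__main__":
    print(len(accessions), accessions[0], accessions[-1])
```

    The expanded list has 56 entries, which agrees with the 56 novel miRNAs reported in the key facts.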

  14. Merlot Genome Assembly

    • entrepot.recherche.data.gouv.fr
    bin, txt
    Updated Nov 20, 2024
    Cite
    Gautier Sarah (2024). Merlot Genome Assembly [Dataset]. http://doi.org/10.57745/OJ07SN
    Explore at:
    Available download formats: bin(5382), bin(5026), txt(495745145), bin(3139), txt(499477870), bin(77747680), bin(77569928), txt(519298121), bin(77738660), txt(501053725), txt(537523085), bin(77572047), txt(593496959), txt(550617790), bin(4284), txt(500441798)
    Dataset updated
    Nov 20, 2024
    Dataset provided by
    Recherche Data Gouv
    Authors
    Gautier Sarah
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    Assembly of Merlot PacBio HiFi reads using hifiasm 0.13 with the trio binning option. The reads are stored at ENA: https://www.ebi.ac.uk/ena/browser/view/PRJEB59893
    * Run ERR10930361: PacBio HiFi reads from the Merlot mother, Magdeleine noire des Charentes
    * Run ERR10930362: PacBio HiFi reads from the Merlot father, Cabernet franc
    * Run ERR10930363: PacBio HiFi reads from Merlot leaves
    * Run ERR10930364: PacBio HiFi reads from Merlot roots

  15. alphaVbeta3

    • huggingface.co
    Updated May 20, 2023
    Cite
    thewall (2023). alphaVbeta3 [Dataset]. https://huggingface.co/datasets/thewall/alphaVbeta3
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 20, 2023
    Authors
    thewall
    License

    https://choosealicense.com/licenses/openrail/

    Description

    PRJDB9111 https://www.ebi.ac.uk/ena/browser/view/PRJDB9111 To generate RNA aptamers against human integrin alphaV beta3, we performed high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). Of the six rounds performed, rounds 3 to 6 were sequenced.

  16. Sample Input Data (TRIP tool)

    • figshare.com
    application/x-gzip
    Updated Feb 21, 2020
    Cite
    Fotis Psomopoulos (2020). Sample Input Data (TRIP tool) [Dataset]. http://doi.org/10.6084/m9.figshare.11881713.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Feb 21, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fotis Psomopoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This entry includes the IMGT/HighV-QUEST output files that were used as input to the TRIP tool for:
    1. The scalability experiments (IDs are BC23-OSR052411, BC23-OSR052411-OSR081811, OSR052311-OSR081811 and OSR052411-OSR052311-OSR081811). The corresponding raw FASTQ files are available here (https://www.ebi.ac.uk/ena/browser/view/PRJEB29674).
    2. The comparison experiments (IDs are T3304, T3396 and T3397). Raw TR sequence data can be found under accession number SRR3737053 in the GenBank sequence database (www.ncbi.nlm.nih.gov/genbank/).

  17. Example dataset input for IgIDivA

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 6, 2022
    Cite
    Zaragoza-Infante Laura (2022). Example dataset input for IgIDivA [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6616045
    Explore at:
    Dataset updated
    Jun 6, 2022
    Dataset authored and provided by
    Zaragoza-Infante Laura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example dataset input for the Immunoglobulin Intraclonal Diversification Analysis (IgIDivA) tool. (Publication of IgIDivA under revision)

    The data was retrieved from ENA (https://www.ebi.ac.uk/ena/browser/view/PRJEB36589?show=reads) under the accession number PRJEB36589, and subsequently processed with IMGT/HighV-QUEST (https://www.imgt.org/HighV-QUEST/home.action) and tripr (https://bioconductor.org/packages/release/bioc/html/tripr.html).

  18. Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 12, 2023
    Cite
    Fernández-Pozo, Noé (2023). Dataset underlying the study "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10362110
    Explore at:
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    Fernández-Pozo, Noé
    Díaz-Pendón, Juan Antonio
    Esteve-Codina, Anna
    Ontiveros, Irene
    López-Moya, Juan José
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset underlies the scientific publication titled "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics", published in the journal Viruses. The dataset includes a time-course transcriptome analysis using RNA-seq of naïve (no whitefly and no virus), mock (non-viruliferous whiteflies) and ToCV (ToCV-viruliferous whiteflies)-treated tomato samples at 2, 7, and 14 days post-infection (dpi), and viral small RNAs derived from tomato plants infected with ToCV at 14 dpi. The dataset provided here has been deposited in full by the authors in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB67704 (https://www.ebi.ac.uk/ena/browser/view/PRJEB67704). The information provided in the dataset, and the results derived from it, are discussed and interpreted in detail in the scientific publication. This research was conducted within the VIRTIGATION project, which is part of the EU Open Research Data pilot. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101000570.

  19. jolma_subset

    • huggingface.co
    Updated Aug 3, 2023
    Cite
    thewall (2023). jolma_subset [Dataset]. https://huggingface.co/datasets/thewall/jolma_subset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Aug 3, 2023
    Authors
    thewall
    License

    https://choosealicense.com/licenses/openrail/

    Description

    PRJEB3289 https://www.ebi.ac.uk/ena/browser/view/PRJEB3289 Data generated by HT-SELEX experiments (see Jolma et al. 2010, PMID: 20378718, for a description of the method) that have been used to generate transcription factor binding specificity models for most of the high-confidence human transcription factors. The sequence data are composed of reads generated with Illumina Genome Analyzer IIX and HiSeq2000 instruments. Samples consist of single-read sequencing of synthetic DNA fragments with a fixed-length randomized region, or of samples derived from such an initial library by selection with a sequence-specific DNA binding protein.

    Originally, multiple samples with different "barcode" tag sequences were run on the same Illumina sequencing lane, but the released files have already been de-multiplexed, and the constant regions and "barcodes" of each sequence have been cut out of the sequencing reads to facilitate use of the data. Some of the files are composed of reads from multiple sequencing lanes; for this reason the names of the individual reads have been edited to show the flowcell and lane used to generate them. Barcodes and oligonucleotide designs are indicated in the names of individual entries. Depending on the selection ligand design, the sequences in each of these fastq files are 14, 20, 30 or 40 bases long and had different flanking regions on both sides of the sequence.

    Each run entry is named in one of the following ways:

    Example 1) "BCL6B_DBD_AC_TGCGGG20NGA_1", where the name is composed of the fields ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle. This experiment used barcode ligand TGCGGG20NGA, where both of the variable flanking constant regions are indicated as they were in the original sequence reads. This ligand has been selected for one round of HT-SELEX using recombinant protein that contained the DNA binding domain of human transcription factor BCL6B. The name also indicates that the experiment was performed in the batch of experiments named "AC".

    Example 2) "0_TGCGGG20NGA_0", where the name is composed of (zero)_BarcodeDesign_(zero). These sequences were generated by sequencing the initial non-selected pool. The same initial pools have been used in multiple experiments in different batches; for example, this background sequence pool is the shared background for all of the following samples: BCL6B_DBD_AC_TGCGGG20NGA_1, ZNF784_full_AE_TGCGGG20NGA_3, DLX6_DBD_Y_TGCGGG20NGA_4 and MSX2_DBD_W_TGCGGG20NGA_2.
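
    The naming scheme described above can be parsed mechanically. The following is a minimal sketch with hypothetical class and function names. It assumes that no field contains an underscore, and reading the digits before "N" in a barcode design as the length of the randomized region is an interpretation of the single example given, not something the description states outright.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunName:
    protein: Optional[str]     # None for the non-selected background pool
    clone_type: Optional[str]
    batch: Optional[str]
    barcode_design: str
    selection_cycle: int       # 0 for the background pool

    @property
    def is_background(self) -> bool:
        return self.protein is None


def parse_run_name(name: str) -> RunName:
    """Parse ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle or 0_BarcodeDesign_0."""
    parts = name.split("_")
    if len(parts) == 5 and parts[4].isdigit():
        protein, clone_type, batch, design, cycle = parts
        return RunName(protein, clone_type, batch, design, int(cycle))
    if len(parts) == 3 and parts[0] == "0" and parts[2] == "0":
        return RunName(None, None, None, parts[1], 0)
    raise ValueError(f"unrecognised run name: {name!r}")


def randomized_length(barcode_design: str) -> Optional[int]:
    """Assumed reading: 'TGCGGG20NGA' means flank + 20 random bases + flank."""
    match = re.fullmatch(r"[ACGT]*?(\d+)N[ACGT]*", barcode_design)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    names = ["BCL6B_DBD_AC_TGCGGG20NGA_1", "ZNF784_full_AE_TGCGGG20NGA_3",
             "DLX6_DBD_Y_TGCGGG20NGA_4", "MSX2_DBD_W_TGCGGG20NGA_2"]
    background = parse_run_name("0_TGCGGG20NGA_0")
    for run in map(parse_run_name, names):
        # Samples share a background pool when their barcode designs match.
        print(run.protein, run.selection_cycle, run.barcode_design == background.barcode_design)
```

    Matching on the barcode design is how a selected sample can be paired with its shared background pool, as in the four samples listed in Example 2.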

  20. Screening of AMR-related genes in the genomes of Vibrio parahaemolyticus...

    • zenodo.org
    bin, csv, pdf
    Updated Jul 23, 2024
    Cite
    Jaime Martinez-Urtaza; Jordi Manuel Cabrera-Gumbau (2024). Screening of AMR-related genes in the genomes of Vibrio parahaemolyticus strains isolated in Europe from clinical, environmental and other sources [Dataset]. http://doi.org/10.5281/zenodo.12514500
    Explore at:
    Available download formats: bin, csv, pdf
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jaime Martinez-Urtaza; Jordi Manuel Cabrera-Gumbau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The distribution of antimicrobial resistance (AMR) genes for the EU and European Free Trade Association (EFTA) countries was obtained from a global collection of nearly 10,000 Vibrio parahaemolyticus genomes. Some of the strains are from the collection of Prof. Jaime Martinez-Urtaza (Department of Genetics and Microbiology, Universitat Autònoma de Barcelona) or are part of ongoing studies to expand the genome collection; other genomes were retrieved from the European Nucleotide Archive (ENA at https://www.ebi.ac.uk/ena/browser/home) and the National Center for Biotechnology Information (NCBI) [GenBank at https://www.ncbi.nlm.nih.gov/genbank/; RefSeq at https://www.ncbi.nlm.nih.gov/refseq/; SRA at https://www.ncbi.nlm.nih.gov/sra]. AMR genes were detected with a resistance-gene detection pipeline based on one of the standard databases (CARD at https://card.mcmaster.ca/). The phylogenetic tree includes the reference genome from Japan ("Osaka"). The RIMD 2210633 strain, which has historically been used for all phylogenetic analyses of V. parahaemolyticus, was added as the global reference strain. The metadata include the source of the strain, i.e., country, origin (clinical, environmental or unclear), date of isolation, and subtype. The antibiotic resistance genes (ARGs) are shown as present, absent or not applicable. To build the ARG tree of European V. parahaemolyticus, the Parsnp tool, a fast core-genome multi-aligner and SNP detector from the Harvest suite, was used (Treangen et al., 2014). Parsnp calculates the MUMi distances between the reference genome (RIMD_2210633) and each of the 152 genomes used in this study. The resulting Newick-formatted core-genome SNP tree was then uploaded to the web tool iTOL (Letunic and Bork, 2021) and midpoint rooted, and the sample metadata were incorporated.

    The accession IDs for the genomes included in the metadata are accessible in the following databases according to the first characters:
    * GCA: GenBank (https://www.ncbi.nlm.nih.gov/genbank/)
    * GCF: RefSeq (https://www.ncbi.nlm.nih.gov/refseq/)
    * ERR: ENA (https://www.ebi.ac.uk/ena/browser/home)
    * SRR: SRA (https://www.ncbi.nlm.nih.gov/sra)
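
    The prefix rule above amounts to a simple lookup. A minimal sketch follows, with a hypothetical function name. It covers only the four prefixes listed here and uses the URLs given in this description.

```python
# Prefix -> (database name, landing page), copied from the list above.
ACCESSION_SOURCES = {
    "GCA": ("GenBank", "https://www.ncbi.nlm.nih.gov/genbank/"),
    "GCF": ("RefSeq", "https://www.ncbi.nlm.nih.gov/refseq/"),
    "ERR": ("ENA", "https://www.ebi.ac.uk/ena/browser/home"),
    "SRR": ("SRA", "https://www.ncbi.nlm.nih.gov/sra"),
}


def source_for(accession):
    """Return (database, url) for an accession, based on its first three characters."""
    prefix = accession.strip()[:3].upper()
    try:
        return ACCESSION_SOURCES[prefix]
    except KeyError:
        raise ValueError(f"no database known for accession prefix {prefix!r}") from None


if __name__ == "__main__":
    # Accessions taken from other entries on this page, used only as sample inputs.
    for accession in ("ERR10930361", "SRR3737053"):
        database, url = source_for(accession)
        print(f"{accession}: {database} ({url})")
```

    Raising an error for unknown prefixes, rather than guessing, keeps accessions from other archives from being silently assigned to the wrong database.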

    References

    Letunic I and Bork P, 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res, 49:W293-W296. doi: 10.1093/nar/gkab301

    Treangen TJ, Ondov BD, Koren S and Phillippy AM, 2014. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol, 15:524. doi: 10.1186/s13059-014-0524-x

Cite
European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515

European Nucleotide Archive (ENA)

RRID:SCR_006515, OMICS_01029, nif-0000-32981, European Nucleotide Archive (ENA) (RRID:SCR_006515), ENA, ENA, European Nucleotide Archive

Explore at:
Description

Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with their partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that includes the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
