Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, are exchanged with NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine that produces sequencing data, and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources, including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers, and routine, comprehensive exchange with its partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases, including the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
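Programmatic access through the ENA Browser can be sketched in Python with only the standard library. This is a minimal sketch, assuming the ENA Browser API endpoint pattern `https://www.ebi.ac.uk/ena/browser/api/{format}/{accession}` with the `fasta`, `embl` and `xml` formats. Check the ENA documentation for which formats apply to a given record type. The helper names `ena_record_url` and `fetch_record` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the ENA Browser REST API; verify against the ENA documentation.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"
SUPPORTED_FORMATS = {"fasta", "embl", "xml"}


def ena_record_url(accession: str, fmt: str = "fasta") -> str:
    """Build a download URL for a single ENA record (hypothetical helper)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession)}"


def fetch_record(accession: str, fmt: str = "fasta", timeout: float = 30.0) -> str:
    """Download one record as text (requires network access)."""
    with urlopen(ena_record_url(accession, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_record("OZ075093.1")` should return the FASTA text of the Parnassius mnemosyne mitochondrial genome described further down this page.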
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:
Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031
Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269 (https://www.ebi.ac.uk/ena/browser/view/PRJEB76269) .
The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1 (https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1) ), and the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1) ).
Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.
The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (https://github.com/NBISweden).
pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451 (https://www.ebi.ac.uk/ena/browser/view/ERX11559451) ). The reads were aligned to the genome with HISAT2 (v2.1.0) and then assembled with StringTie (v2.2.1).
pmne_mtdna.gff.gz contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1 (https://www.ebi.ac.uk/ena/browser/view/OZ075093.1) ). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).
pmne_ncRNAs.gff.gz contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.
pmne_tRNAs_and_pseudogenes.gff.gz contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).
pmne_PacBio_isoseq.sorted.bam contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436 (https://www.ebi.ac.uk/ena/browser/view/ERX11559436) ) aligned to the primary genome assembly.
pmne_repeat_library.fa.gz contains the nucleotide sequences of the predicted repeats in FASTA format. The prediction was done with RepeatModeler2 (v2.0.2a).
Available variables
For a description of the column headers of the files, please see the following links to the documentation of the different file formats.
The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf
The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
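To inspect one of the gzipped GFF3 files listed above, the following Python sketch counts features per type (column 3) and parses the column 9 attributes. It relies only on the standard library and on the nine tab-separated columns defined in the GFF3 specification. Multi-valued attributes are kept as raw comma-separated strings. The function names are illustrative, not part of any existing tool.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def parse_attributes(field: str) -> dict:
    """Parse GFF3 column 9 ("key=value;key=value") into a dict, decoding %XX escapes."""
    attrs = {}
    for item in field.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


def count_feature_types(lines) -> Counter:
    """Count features per type (column 3), skipping comments, directives and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        counts[columns[2]] += 1
    return counts


def count_features_in_file(path: str) -> Counter:
    """Open a .gff.gz file in text mode and count its feature types."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)
```

For example, `count_features_in_file("pmne_tRNAs_and_pseudogenes.gff.gz")` gives a quick overview of the feature types in that annotation.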
Contact
For questions about this dataset, please contact: jacob.hoglund@ebc.uu.se, niclas.backstrom@ebc.uu.se
These data files are the source files of the annotation output from whole-genome sequencing of the rare actinobacterium Barrientosiimonas humi gen. nov., sp. nov. 39T from Antarctica. The whole-genome sequence of B. humi has been deposited in the European Nucleotide Archive (ENA) under accession number PRJEB44986 / ERP129097; direct URL to the data: https://www.ebi.ac.uk/ena/browser/view/PRJEB44986
Reference: European Nucleotide Archive. (2021). Project PRJEB44986: Whole-genome Sequencing and Annotation of Barrientosiimonas humi gen. nov., sp. nov. 39T, a Novel Rare Actinobacteria from Barrientos Island, Antarctica. ENA Browser. PRJEB44986. Retrieved from https://www.ebi.ac.uk/ena/browser/view/PRJEB44986
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.
The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium, available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, and MLST sequence type (ST) determination were performed with the same pipeline. All genome assemblies passing QC were included in the final dataset. In addition, we noticed that a considerable proportion of assemblies were flagged as “QC fail” solely because of the “NumContamSNVs” parameter, suggesting that this threshold might have been too strict. After manual inspection of a random subset, assemblies for which >98% of reads corresponded to the correct species were recovered and included in the final dataset (these samples are labeled in the metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
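The curation rule described above can be expressed as a small filter. This Python sketch is illustrative only. The record fields (`qc_status`, `failed_checks`, `pct_target_species`) are hypothetical names, not Aquamis output columns. The 98% threshold is the one stated in the text. The manual inspection of a random subset is not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblyQC:
    sample: str
    qc_status: str  # hypothetical field: "pass" or "fail"
    failed_checks: set = field(default_factory=set)  # hypothetical: names of failed QC parameters
    pct_target_species: float = 0.0  # hypothetical: % of reads assigned to the expected species


def keep_assembly(qc: AssemblyQC, min_pct: float = 98.0) -> bool:
    """Keep QC passes; recover failures caused only by NumContamSNVs if > min_pct % of reads are on-species."""
    if qc.qc_status == "pass":
        return True
    only_contam_snvs = qc.failed_checks == {"NumContamSNVs"}
    return only_contam_snvs and qc.pct_target_species > min_pct


def curate(records):
    """Return (kept, recovered) sample names; recovered samples are the ones to flag in the metadata."""
    kept, recovered = [], []
    for qc in records:
        if keep_assembly(qc):
            kept.append(qc.sample)
            if qc.qc_status != "pass":
                recovered.append(qc.sample)
    return kept, recovered
```

Keeping the recovered samples in a separate list mirrors the labeling of recovered assemblies in the metadata file.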
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Acknowledgements
We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.
The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium, available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, and MLST sequence type (ST) determination were performed with the same pipeline. All genome assemblies passing QC were included in the final dataset. In addition, we noticed that a considerable proportion of assemblies were flagged as “QC fail” solely because of the “NumContamSNVs” parameter, suggesting that this threshold might have been too strict. After manual inspection of a random subset, assemblies for which >98% of reads corresponded to the correct species were recovered and included in the final dataset (these samples are labeled in the metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Acknowledgements
We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
The EBI genomes pages give access to a large number of complete genomes, including bacteria, archaea, viruses, phages, plasmids, viroids and eukaryotes. Whole genome shotgun (WGS) methods are used to obtain a large amount of genome coverage for an organism, and WGS data for a growing number of organisms are being submitted to DDBJ/EMBL/GenBank. Genome entries are listed in their appropriate category, which may be browsed using the website navigation tool bar on the left. While organelles are all listed in a separate category, any from Eukaryota with chromosome entries are also listed in the Eukaryota page. Within each page, entries are grouped and sorted at the species level, with links to the taxonomy page for that species separating each group. Within each species, entries whose source organism has been categorized further are grouped and numbered accordingly. Links are made to:
* taxonomy
* complete EMBL flatfile
* CON files
* lists of CON segments
* Project
* Proteomes pages
* FASTA file of Proteins
* list of Proteins
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type.
The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium, available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, and MLST sequence type (ST) determination were performed with the same pipeline. All genome assemblies passing QC were included in the final dataset. In addition, we noticed that a considerable proportion of assemblies were flagged as “QC fail” solely because of the “NumContamSNVs” parameter, suggesting that this threshold might have been too strict. After manual inspection of a random subset, assemblies for which >98% of reads corresponded to the correct species were recovered and included in the final dataset (these samples are labeled in the metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Please refer to the methods section and supplementary information of: Höfner, J., Klein-Raufhake, T., Lampei, C., Mudrak, O., Bucharova, A. and Durka, A. (2021) ‘Populations restored using regional seed are genetically diverse and similar to natural populations in the region’, accepted in Journal of Applied Ecology
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset includes RNA sequencing (RNA-seq) data from the HNT-34 AML (acute myeloid leukemia) cell line after knockout of the SLAMF6 gene by CRISPR/Cas9 (SLAMF6-KO) or mock knockout with a construct targeting the firefly luciferase gene (SLAMF6-WT). Libraries were prepared using the Illumina Stranded mRNA Prep kit and sequenced on an Illumina NovaSeq 6000 system. The dataset is provided as a merged transcripts per million (TPM) table for all samples, generated using Salmon (salmon.merged.gene_tpm.tsv.gz). Raw sequencing reads (FASTQ) are available at the European Nucleotide Archive (ENA) under accession ID PRJEB90909: https://www.ebi.ac.uk/ena/browser/view/PRJEB90909. Published in: Sandén et al., Nature Cancer, 2025: https://www.nature.com/articles/s43018-025-01054-6
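A minimal Python sketch for loading the merged TPM table and sanity-checking each sample column follows. It assumes a tab-separated file whose first two columns are gene identifiers (`gene_id`, `gene_name`, the layout used by nf-core/rnaseq, which this file name resembles), followed by one TPM column per sample. This layout is an assumption; adjust `n_id_cols` if it differs. TPM values are normalized per sample, so each sample column should sum to roughly one million.

```python
import csv
import gzip
import math


def read_tpm_table(lines, n_id_cols: int = 2):
    """Read a merged TPM table; returns (sample_names, {gene_id: [TPM per sample]})."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[n_id_cols:]
    tpm = {}
    for row in reader:
        tpm[row[0]] = [float(value) for value in row[n_id_cols:]]
    return samples, tpm


def column_sums(samples, tpm):
    """Sum TPM values per sample."""
    sums = dict.fromkeys(samples, 0.0)
    for values in tpm.values():
        for sample, value in zip(samples, values):
            sums[sample] += value
    return sums


def sums_look_like_tpm(sums, expected: float = 1e6, rel_tol: float = 0.01):
    """Flag, per sample, whether the column sum is within rel_tol of one million."""
    return {sample: math.isclose(total, expected, rel_tol=rel_tol) for sample, total in sums.items()}


def load(path: str, n_id_cols: int = 2):
    """Load the gzipped table, e.g. load("salmon.merged.gene_tpm.tsv.gz")."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_tpm_table(handle, n_id_cols)
```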
Sequences can be accessed via the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/browser/home. (TXT)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benchmark of 5S, 16S and 23S rRNA secondary structures taken from the CRW database https://crw-site.chemistry.gatech.edu/
Each molecule is available in bpseq, ct and dot-bracket-letter (db) format. For each format a version without header/additional information/comments is available in the corresponding bpseq-nH, ct-nH, db-nH folders.
In the files Archaea.xlsx, Bacteria.xlsx and Eukaryota.xlsx, the molecules in the benchmark are listed together with their Organism Name, ID and phylogenetic classification (up to Order) according to the European Nucleotide Archive (ENA) taxonomy https://www.ebi.ac.uk/ena/browser/home
The accession number is available from the headers of the bpseq and ct formats.
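Each BPSEQ line has the form `index base partner`, where partner `0` means unpaired. The Python sketch below reads a BPSEQ file, skipping any header lines, and converts it to dot-bracket notation, for example to cross-check the bpseq and db versions of a molecule. It is a simplification. Pseudoknotted (crossing) pairs would need extra bracket types such as `[]`, which this sketch does not assign, so it raises `ValueError` when pairs cross.

```python
def parse_bpseq(lines):
    """Return (sequence, pairs) where pairs[i] is the 1-based partner of position i (0 = unpaired)."""
    bases, pairs = [], [0]  # pairs[0] is a placeholder so indices stay 1-based
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip header/comment lines
        index, base, partner = int(fields[0]), fields[1], int(fields[2])
        if index != len(bases) + 1:
            raise ValueError(f"non-consecutive index {index}")
        bases.append(base)
        pairs.append(partner)
    return "".join(bases), pairs


def to_dot_bracket(pairs):
    """Convert a nested (pseudoknot-free) pairing table to dot-bracket notation."""
    structure, open_stack = [], []
    for i in range(1, len(pairs)):
        j = pairs[i]
        if j == 0:
            structure.append(".")
        elif j > i:
            open_stack.append(i)  # opening position, closed later at j
            structure.append("(")
        else:
            if not open_stack or open_stack[-1] != j:
                raise ValueError(f"pair ({j}, {i}) crosses another pair (pseudoknot)")
            open_stack.pop()
            structure.append(")")
    if open_stack:
        raise ValueError("unbalanced pairing table")
    return "".join(structure)
```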
For detailed methods please see the associated publication.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains files reconstructing the single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) to provide a reference annotation of porcine immune cell transcriptomics at single-cell resolution. Analysis of the single-cell data identified 36 cell clusters, which were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. The files may be used to reconstruct the data as presented in the manuscript, allowing other users to query them individually. Scripts for the original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464).

Resources in this dataset:

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zip
Resource Description: Zipped folder containing the PBMC counts matrix, gene names, and cell IDs. Files are as follows:
- matrix of gene counts* (matrix.mtx.gz)
- gene names (features.tsv.gz)
- cell IDs (barcodes.tsv.gz)
*The 'raw' count matrix actually contains gene counts obtained after ambient RNA removal. During ambient RNA removal, we chose to calculate non-integer count estimates, so most gene counts in this matrix are non-integer values; they should still be treated as raw/unnormalized data that require further normalization/transformation. Data can be read into R using the function Read10X().

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csv
Resource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include:
- nCount_RNA = the number of transcripts detected in a cell
- nFeature_RNA = the number of genes detected in a cell
- Loupe = cell barcodes; these correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells
- prcntMito = percent mitochondrial reads in a cell
- Scrublet = doublet probability score assigned to a cell
- seurat_clusters = cluster ID assigned to a cell
- PaperIDs = sample ID for a cell
- celltypes = cell type ID assigned to a cell

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csv
Resource Description: .csv file containing the first 100 PCA coordinates for cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for all cells.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csv
Resource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the UMAP coordinates used in the publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csv
Resource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from PBMC7_AllCells.h5Seurat, and the t-SNE coordinates used in the publication can be re-assigned using this .csv file.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txt
Resource Description: .txt file containing the gene nomenclature information used to assign gene names in the dataset. The 'Name' column corresponds to the name assigned to a feature in the dataset.

Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tar
Resource Description: .h5Seurat object of all cells in the PBMC dataset. The file needs to be untarred, then read into R using the function LoadH5Seurat().
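The CD4 and gamma delta T cell subsets are defined by cluster IDs, so the matching barcodes can be recovered from the metadata file alone. This is a minimal Python sketch using the standard `csv` module. It assumes `PBMC7_AllCells_meta.csv` has a header row containing the `Loupe` and `seurat_clusters` columns described in this dataset. If the barcodes are instead in an unnamed first column, as in some CSVs exported from R, pass the matching `barcode_col`.

```python
import csv

# Cluster IDs taken from the resource descriptions of this dataset.
CD4_T_CLUSTERS = {"0", "3", "4", "28"}
GAMMA_DELTA_T_CLUSTERS = {"6", "21", "24", "31"}


def barcodes_in_clusters(rows, clusters, barcode_col="Loupe", cluster_col="seurat_clusters"):
    """Return barcodes of cells whose cluster ID is in `clusters` (IDs compared as strings)."""
    return [row[barcode_col] for row in rows if row[cluster_col].strip() in clusters]


def load_metadata(path):
    """Read the metadata CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `barcodes_in_clusters(load_metadata("PBMC7_AllCells_meta.csv"), CD4_T_CLUSTERS)` should list the cells that the CD4-only coordinate files refer to.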
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome sequence of the bovine tuberculosis bacillus Mycobacterium bovis AF2122/97: https://www.ebi.ac.uk/ena/browser/view/GCA_000195835.3?show=chromosomes
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data can be used to test my tool delfies on real data, to get a concrete sense of its inputs/outputs and to test that it is properly installed.
I downloaded the genome of Oscheius onirici, accession: GCA_932521025.
I subsampled the genome to the last 2 kbp of chromosome I, which contains an elimination breakpoint, using `seqkit` v2.8.2, giving the FASTA file in this release.
I then downloaded the following sequencing data for *O. onirici*, from the European Nucleotide Archive:
These reads were then aligned to the above genome with `minimap2` version 2.26-r1175, using the presets "map-ont" for the Nanopore data, "map-hifi" for the PacBio data and "sr" for the Illumina data.
After sorting with `samtools`, this gives the BAM files in this release.
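The alignment and sorting step can be sketched as a Python helper that builds one `minimap2 | samtools sort` pipeline per data type, using the presets named above. The file names are hypothetical. `-a` requests SAM output and `-x` selects the preset. The exact `samtools` invocation used here is not stated, so `samtools sort -o <bam> -` (sorting from standard input) is an assumption.

```python
import shlex

# Presets stated in the text for each sequencing technology.
MINIMAP2_PRESETS = {"nanopore": "map-ont", "pacbio_hifi": "map-hifi", "illumina": "sr"}


def alignment_pipeline(technology: str, genome: str, reads: list, out_bam: str, threads: int = 4) -> str:
    """Return a shell pipeline that aligns reads with minimap2 and coordinate-sorts them with samtools."""
    preset = MINIMAP2_PRESETS[technology]
    minimap2 = ["minimap2", "-a", "-x", preset, "-t", str(threads), genome, *reads]
    sort = ["samtools", "sort", "-o", out_bam, "-"]  # "-" reads the SAM stream from stdin
    return shlex.join(minimap2) + " | " + shlex.join(sort)
```

Building the command as a string rather than running it keeps the sketch side-effect free. The result can be inspected, then passed to a shell.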
I then ran `delfies` version 0.6.0 on each BAM and genome, as:
```sh
# genome: FASTA file, bam: sorted BAM file, odirname: output directory
delfies --threads 16 \
--telo_forward_seq TTAGGC \
--breakpoint_type all \
--min_mapq 20 \
--min_supporting_reads 6 \
"${genome}" "${bam}" "${odirname}"
```
The three resulting output directories are in this release, prefixed with `delfies_`.
A single, identical breakpoint is found using all three BAMs (see files '*breakpoint_locations.bed').
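One way to confirm that the three runs agree is to compare their breakpoint BED files directly. This Python sketch assumes each `delfies_*` output directory contains exactly one file matching `*breakpoint_locations.bed`. It compares only the first three BED columns (chromosome, start, end) and ignores further columns that may legitimately differ between runs, such as read support.

```python
from pathlib import Path


def parse_bed_intervals(text: str) -> set:
    """Return the (chrom, start, end) intervals in BED text, skipping blank, comment and track/browser lines."""
    intervals = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.add((chrom, int(start), int(end)))
    return intervals


def collect_runs(release_dir: str) -> dict:
    """Map each delfies_* output directory name to the intervals in its breakpoint BED file."""
    results = {}
    for out_dir in sorted(p for p in Path(release_dir).glob("delfies_*") if p.is_dir()):
        (bed,) = out_dir.glob("*breakpoint_locations.bed")  # assumes exactly one match per directory
        results[out_dir.name] = parse_bed_intervals(bed.read_text())
    return results


def all_identical(results: dict) -> bool:
    """True if every run reports exactly the same set of breakpoint intervals."""
    return len({frozenset(intervals) for intervals in results.values()}) == 1
```

For example, `all_identical(collect_runs("."))` run from the release directory should return `True` if the three BAMs indeed yield the same breakpoint.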
The above raw data were produced and released by the Wellcome Sanger Institute as part of projects PRJEB51305 and PRJEB59023.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generally, the annotated miRNA repertoire of Rattus norvegicus falls short of that of the other rodent model organism, Mus musculus.
To extend the miRNA catalogue in Rattus norvegicus, we utilized Infernal v1.1 (Nawrocki and Eddy, 2013) to derive potential rat miRNA candidates starting from all available mammalian miRNA families in miRBase. We utilized MIRfix (Yazbeck et al., 2019) to curate the extended miRNA datasets automatically. Subsequent manual inspection and curation of miRNA alignments resulted in a reliable and comprehensive update to the rat miRNA annotation.
Key facts of the extended miRNA repertoire
342 miRNA families (40 novel families)
549 miRNA sequences (56 novel miRNAs)
11 corrected annotated miRNAs
European Nucleotide Archive
The 56 novel sequences not previously listed in miRBase have been submitted to the European Nucleotide Archive at EMBL-EBI. They are accessible with the accession numbers OZ078105 - OZ078160. The sequences will be permanently available from the ENA browser at http://www.ebi.ac.uk/ena/data/view/.
An overview of all sequences is given here: http://www.ebi.ac.uk/ena/data/view/OZ078105-OZ078160.
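The accession range OZ078105 - OZ078160 can be expanded into individual accessions, for example to build per-record links. This Python sketch assumes an alphabetic prefix followed by a zero-padded number, as in these accessions. It confirms that the range covers exactly the 56 novel sequences. The links use the ENA Browser view URL pattern found elsewhere on this page.

```python
import re

ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_accession_range(first: str, last: str) -> list:
    """Expand an inclusive accession range such as OZ078105-OZ078160."""
    m1, m2 = ACCESSION.match(first), ACCESSION.match(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share the same alphabetic prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("OZ078105", "OZ078160")
links = [f"https://www.ebi.ac.uk/ena/browser/view/{acc}" for acc in accessions]
print(len(accessions))  # 56
```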
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
Microbial genomic DNA was extracted from the <250 µm fraction of 28 household vacuum dust samples using high-throughput isolation (21 samples from a national campaign within the UK and 7 samples from Greece, providing samples from two contrasting bioclimatic zones). Both positive and negative reagent controls were included to ensure sterility throughout the processing and sequencing steps, and a randomly selected sample (DSUK179) was run in triplicate. These data (raw FASTQ files: Target_gene 16S and Target_subfragment V4) are available from the European Nucleotide Archive via the study accession PRJEB46920, with individual sample accession numbers ERX6130460 to ERX6130493 (https://www.ebi.ac.uk/ena/browser/view/PRJEB46920). A wide range of anthropogenic factors are likely to affect the indoor microbiome; to capture some of this heterogeneity, participants were asked to complete a questionnaire. In addition, trace element data were generated using X-ray fluorescence spectrometry on the <250 µm sieved fraction of the household vacuum dust. Sample location data are provided at town/city and country level. Indoor dust serves as a reservoir for environmental exposure to microbial communities, many of which are benign, some beneficial and some pathogenic. Whilst non-occupational exposure to a range of trace elements and organic contaminants in house dust is a known risk factor for a range of diseases and poor health outcomes, we know far less about the microbial communities associated with our indoor home environments and their interactions with, and impacts on, human health. Indoor residential bacterial biodiversity, biogeography and their drivers are still poorly understood. The data were collected to improve our understanding of the home microbiome.
https://choosealicense.com/licenses/openrail/
PRJEB3289 https://www.ebi.ac.uk/ena/browser/view/PRJEB3289 Data generated by HT-SELEX experiments (see Jolma et al. 2010, PMID: 20378718, for a description of the method), which have since been used to generate transcription factor binding specificity models for most of the high-confidence human transcription factors. Sequence data consist of reads generated with Illumina Genome Analyzer IIx and HiSeq 2000 instruments. Samples consist of single-read sequencing of synthetic DNA fragments with a fixed-length randomized region, or of samples derived from such an initial library by selection with a sequence-specific DNA-binding protein. Originally, multiple samples with different "barcode" tag sequences were run on the same Illumina sequencing lane, but the released files have already been demultiplexed, and the constant regions and "barcodes" of each sequence have been cut out of the sequencing reads to facilitate use of the data. Some of the files are composed of reads from multiple sequencing lanes; for this reason, the name of each individual read has been edited to show the flowcell and lane used to generate it. Barcodes and oligonucleotide designs are indicated in the names of individual entries. Depending on the selection ligand design, the sequences in each of these FASTQ files are 14, 20, 30 or 40 bases long and had different flanking regions on both sides of the sequence. Each run entry is named in one of the following ways: Example 1) "BCL6B_DBD_AC_TGCGGG20NGA_1", where the name is composed of the fields ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle. This experiment used the barcode ligand TGCGGG20NGA, where both of the variable flanking constant regions are indicated as they were on the original sequence reads. This ligand has been selected for one round of HT-SELEX using a recombinant protein that contained the DNA-binding domain of the human transcription factor BCL6B.
The name also indicates that the experiment was performed in the batch of experiments named "AC". Example 2) "0_TGCGGG20NGA_0", where the name is composed of (zero)_BarcodeDesign_(zero). These sequences were generated by sequencing the initial, non-selected pool. The same initial pools have been used in multiple experiments in different batches; for example, this background sequence pool is the shared background for all of the following samples: BCL6B_DBD_AC_TGCGGG20NGA_1, ZNF784_full_AE_TGCGGG20NGA_3, DLX6_DBD_Y_TGCGGG20NGA_4 and MSX2_DBD_W_TGCGGG20NGA_2.
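The run-naming scheme described above is regular enough to parse. The Python sketch below splits a run name into its fields and decodes a ligand design such as `TGCGGG20NGA` into a 5' flank, a randomized-region length and a 3' flank, reading `20N` as a 20-base randomized region. It assumes that protein names, clone types and batch labels contain no underscores, and that the only digits in a design give the random-region length. Names that break these assumptions raise `ValueError`.

```python
import re
from dataclasses import dataclass
from typing import Optional

DESIGN = re.compile(r"^([ACGT]*)(\d+)N([ACGT]*)$")


@dataclass
class SelexRun:
    protein: Optional[str]  # None for the unselected background pool
    clone_type: Optional[str]
    batch: Optional[str]
    design: str
    cycle: int
    flank_5p: str
    random_length: int
    flank_3p: str


def parse_run_name(name: str) -> SelexRun:
    """Parse ProteinName_CloneType_Batch_BarcodeDesign_SelectionCycle or 0_BarcodeDesign_0."""
    fields = name.split("_")
    if len(fields) == 5:
        protein, clone_type, batch, design, cycle = fields
    elif len(fields) == 3 and fields[0] == "0" and fields[2] == "0":
        protein = clone_type = batch = None
        design, cycle = fields[1], "0"
    else:
        raise ValueError(f"unrecognised run name: {name}")
    match = DESIGN.match(design)
    if not match:
        raise ValueError(f"unrecognised ligand design: {design}")
    flank_5p, length, flank_3p = match.groups()
    return SelexRun(protein, clone_type, batch, design, int(cycle), flank_5p, int(length), flank_3p)


def shares_background(run: SelexRun, background: SelexRun) -> bool:
    """True if `background` is an unselected pool with the same ligand design as `run`."""
    return background.protein is None and run.design == background.design
```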
https://choosealicense.com/licenses/openrail/
PRJDB9110 https://www.ebi.ac.uk/ena/browser/view/PRJDB9110 To generate RNA aptamers against human transglutaminase 2, we performed high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). The initial library (round 0) and all eight selection rounds (rounds 1 to 8) were sequenced.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset underlies the scientific publication titled "Enhanced Susceptibility to Tomato Chlorosis Virus (ToCV) in Hsp90- and Sgt1-Silenced Plants: Insights from Gene Expression Dynamics", published in the journal Viruses. The dataset includes a time-course transcriptome analysis using RNA-seq of naïve (no whitefly and no virus), mock (non-viruliferous whiteflies) and ToCV (ToCV-viruliferous whiteflies)-treated tomato samples at 2, 7 and 14 days post-infection (dpi), as well as viral small RNAs derived from tomato plants infected with ToCV at 14 dpi. The full dataset has been deposited by the authors in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB67704 (https://www.ebi.ac.uk/ena/browser/view/PRJEB67704). The data and the results derived from them are discussed and interpreted in detail in the scientific publication. This research was conducted within the VIRTIGATION project, which is part of the EU Open Research Data pilot. This project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101000570.