Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The classification system for source biological organisms for all INSDC records is the NCBI Taxonomy. The ENA team work alongside taxonomists at the NCBI to ensure that all ENA records display the accepted organism name and classification hierarchy. NCBI Taxonomy covers the complete tree of life and also includes other types, such as synthetic constructs and environmental samples. However, it is an incomplete classification system in that it only considers taxa for data that are represented in INSDC records. Users should note that taxa are only displayed if at least one associated ENA record is available.
Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, are exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced, and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources, including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers, and routine and comprehensive exchange with its partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
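As a rough illustration of what REST-style access to the ENA Browser can look like, the sketch below only builds a download URL and does not contact any server. The base URL, the path layout and the list of formats are assumptions about the current ENA Browser API and are not stated in the description above; the accession in the usage line is a placeholder.

```python
from urllib.parse import quote

# Assumption: base URL of the ENA Browser API (not given in the record description).
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# Assumption: record formats addressable as /<format>/<accession>.
ASSUMED_FORMATS = {"fasta", "embl", "xml"}


def ena_record_url(accession: str, fmt: str = "fasta") -> str:
    """Build a REST-style URL for one ENA record (illustrative helper)."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    if fmt not in ASSUMED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so it is safe as a single path segment.
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession, safe='')}"


if __name__ == "__main__":
    # Placeholder accession, used only to show the URL shape.
    print(ena_record_url("AB000001", "embl"))
```

The resulting string could then be passed to any HTTP client; check the ENA documentation for the endpoints actually supported before relying on this layout.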
All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. ENA is made up of a number of distinct databases that include EMBL-Bank, the Sequence Read Archive (SRA) and the Trace Archive, each with its own data formats and standards. This collection references EMBL-Bank identifiers.
This data set comprises records of the European Nucleotide Archive (ENA) that have been extracted and linked to citations in open-access publications aggregated at Europe PubMed Central (ePMC). To build it, ENA records were parsed, filtered for a valid country tag and fed into the ePMC RESTful API to extract matching secondary publications by ENA accession or project accession number. The resulting data are normalized into the tables ENA_SEQUENCES and PMC_REFERENCES, alongside a curated list of the world's countries in table CONTRIES and economic groups in table COUNTRY2GRP. These tables are the basis for a data warehouse and a web application that make it possible to join literature and sequence databases in a multidimensional fashion. A concrete use case, in the context of the United Nations Convention on Biological Diversity, is the analysis of countries with respect to nucleotide sequence use and contribution.
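The kind of join these four tables allow can be sketched with an in-memory SQLite database. Only the table names come from the description; every column name, the key relationships and the demo rows are hypothetical and chosen purely for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create the four tables with hypothetical columns and toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE CONTRIES (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE COUNTRY2GRP (code TEXT, grp TEXT);
        CREATE TABLE ENA_SEQUENCES (accession TEXT PRIMARY KEY, country_code TEXT);
        CREATE TABLE PMC_REFERENCES (pmcid TEXT, accession TEXT);

        INSERT INTO CONTRIES VALUES ('AA', 'Country A'), ('BB', 'Country B');
        INSERT INTO COUNTRY2GRP VALUES ('AA', 'Group 1'), ('BB', 'Group 2');
        INSERT INTO ENA_SEQUENCES VALUES ('X1', 'AA'), ('X2', 'AA'), ('X3', 'BB');
        INSERT INTO PMC_REFERENCES VALUES ('PMC1', 'X1'), ('PMC2', 'X1'), ('PMC1', 'X2');
        """
    )
    return con


def sequences_and_citations(con: sqlite3.Connection) -> list:
    """Per economic group and country: distinct sequences and citing publications."""
    return con.execute(
        """
        SELECT g.grp,
               c.name,
               COUNT(DISTINCT s.accession) AS n_sequences,
               COUNT(DISTINCT r.pmcid)     AS n_publications
        FROM ENA_SEQUENCES s
        JOIN CONTRIES c         ON c.code = s.country_code
        JOIN COUNTRY2GRP g      ON g.code = c.code
        LEFT JOIN PMC_REFERENCES r ON r.accession = s.accession
        GROUP BY g.grp, c.name
        ORDER BY g.grp, c.name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in sequences_and_citations(build_demo_db()):
        print(row)
```

The LEFT JOIN keeps sequences that have no matching publication, so countries whose records are never cited still appear with a publication count of zero.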
Ansys archive file
description:
The BioProject database provides an organizational framework to access information about research projects, with links to data that have been or will be deposited into archival databases maintained by members of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory, and GenBank at the National Center for Biotechnology Information (NCBI).
of ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data
NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280 000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. GenBank is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.
Dataset
This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.
The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
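The recovery rule described in this paragraph can be written as a small decision function. This is a minimal sketch under stated assumptions: the record fields (`failed_checks`, `pct_reads_correct_species`) and the status labels are hypothetical and do not reflect the actual Aquamis output schema. Only the "NumContamSNVs" parameter name and the >98% threshold come from the text, and the manual inspection step is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblyQC:
    """Hypothetical per-assembly QC summary (not the Aquamis schema)."""
    sample: str
    pct_reads_correct_species: float
    failed_checks: set = field(default_factory=set)  # empty set means QC pass


def curation_status(qc: AssemblyQC, min_pct: float = 98.0) -> str:
    """Return 'pass', 'recovered' or 'excluded' for one assembly."""
    if not qc.failed_checks:
        return "pass"
    # Recover assemblies that failed exclusively on NumContamSNVs and whose
    # reads overwhelmingly (strictly more than min_pct) match the species.
    only_contam_snvs = qc.failed_checks == {"NumContamSNVs"}
    if only_contam_snvs and qc.pct_reads_correct_species > min_pct:
        return "recovered"
    return "excluded"


if __name__ == "__main__":
    demo = [
        AssemblyQC("s1", 99.7),
        AssemblyQC("s2", 99.1, {"NumContamSNVs"}),
        AssemblyQC("s3", 97.0, {"NumContamSNVs"}),
        AssemblyQC("s4", 99.9, {"NumContamSNVs", "N50"}),
    ]
    for qc in demo:
        print(qc.sample, curation_status(qc))
```

Assemblies labelled "pass" or "recovered" would correspond to those kept in the final dataset, with "recovered" matching the samples flagged in the metadata file.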
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Acknowledgements
We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
Curated sample information for drones processed for AmelHap. Details include:
This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from Australian Oedura gracilis. Other information about this group:
The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the international research community.
The identification of species in Oedura gracilis as Australian dwelling organisms has been achieved by accessing the Australian Plant Census (APC) or Australian Faunal Directory (AFD) through the Atlas of Living Australia.
Additional file 4: Archival copy of software. (ZIP 3 MB)
Dataset
This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type, and information regarding year of sampling, country and source.
The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Acknowledgements
We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
The dataset consists of whole-genome DNA sequences generated from invertebrate species from the Gulf of Mexico during the Benthic Invertebrate Taxonomy, Metagenomics, and Bioinformatics Workshop (BITMaB) in 2017 in Corpus Christi, Texas, USA. All genomic data sets were deposited in and distributed by GenBank (NCBI), the European Nucleotide Archive (ENA) at the European Bioinformatics Institute (EMBL-EBI), the DNA Data Bank of Japan, NemATOL, the Global Genome Initiative, and Ocean Genome Legacy.
Database that aggregates sample information for reference samples (e.g. Coriell cell lines) and samples for which data exist in one of the EBI's assay databases, such as ArrayExpress, the European Nucleotide Archive (ENA) or the PRoteomics IDEntifications database (PRIDE). It provides links to assays for specific samples and accepts direct submissions of sample information. The goals of the BioSample Database include: (1) recording and linking sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (2) minimizing data entry efforts for EBI database submitters by enabling them to submit sample descriptions once and reference them later in data submissions to assay databases; and (3) supporting cross-database queries by sample characteristics. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples (at) ebi.ac.uk.
Current data sources:
* European Nucleotide Archive (424,811 samples)
* PRIDE (17,001 samples)
* ArrayExpress (1,187,884 samples)
* ENCODE cell lines (119 samples)
* CORIELL cell lines (27,002 samples)
* Thousand Genome (2,628 samples)
* HapMap (1,417 samples)
* IMSR (248,660 samples)
This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian research institution Western Australian Institute for Medical Research. The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the international research community.
This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian research institution University of Notre Dame. The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the international research community.
Dataset
This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).
File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.
The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.
Dataset selection and curation
This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).
Funding
This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.
Acknowledgements
We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.
This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian dwelling organism Loligo chinensis.
The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the international research community.
The identification of the species Loligo chinensis as an Australian dwelling organism has been achieved by accessing the Australian Plant Census (APC) or Australian Faunal Directory (AFD) through the Atlas of Living Australia.