100+ datasets found
  1. The European Nucleotide Archive (ENA) taxonomy

    • gbif.org
    Updated Mar 11, 2021
    Cite
    Stephane Riviere (2021). The European Nucleotide Archive (ENA) taxonomy [Dataset]. http://doi.org/10.15468/avkgwm
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Authors
    Stephane Riviere
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The classification system for source biological organisms for all INSDC records is the NCBI Taxonomy. The ENA team work alongside taxonomists at the NCBI to ensure that all ENA records display the accepted organism name and classification hierarchy. NCBI Taxonomy covers the complete tree of life and also includes other types, such as synthetic constructs and environmental samples. However, it is an incomplete classification system in that it only considers taxa for data that are represented in INSDC records. Users should note that taxa are only displayed if at least one associated ENA record is available.

  2. European Nucleotide Archive (ENA)

    • rrid.site
    • scicrunch.org
    • +1 more
    Updated Feb 9, 2025
    Cite
    (2025). European Nucleotide Archive (ENA) [Dataset]. http://identifiers.org/RRID:SCR_006515
    Description

    Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers and routine and comprehensive exchange with its partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases that include the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use. All ENA data are available through the ENA Browser. Note: EMBL Nucleotide Sequence Database (EMBL-Bank) is entirely included within this resource.
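
    The "REST URLs" mentioned in the description can be illustrated with a small URL builder. This is only a sketch: the base URL and the format names (fasta, embl, xml) are assumptions that the description does not state, and the helper name ena_record_url is hypothetical. The accession used is the BeONE project accession quoted elsewhere in this listing.

```python
from urllib.parse import quote

# Assumed base URL of the ENA Browser API; not given in the description above.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# Assumed set of record formats; check the ENA documentation before relying on it.
KNOWN_FORMATS = {"fasta", "embl", "xml"}


def ena_record_url(accession: str, record_format: str = "fasta") -> str:
    """Build (but do not fetch) a download URL for a single ENA record."""
    if record_format not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format: {record_format}")
    # Percent-encode the accession so unexpected characters cannot alter the path.
    return f"{ENA_BROWSER_API}/{record_format}/{quote(accession, safe='')}"


url = ena_record_url("PRJEB57098", "xml")
print(url)
```

    The function only constructs the address; fetching it (for example with urllib.request.urlopen) is left out so the sketch runs without network access.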

  3. European Nucleotide Archive

    • bioregistry.io
    Updated Sep 29, 2023
    + more versions
    Cite
    (2023). European Nucleotide Archive [Dataset]. http://identifiers.org/re3data:r3d100010527
    Description

    The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. ENA is made up of a number of distinct databases that include EMBL-Bank, the Sequence Read Archive (SRA) and the Trace Archive, each with its own data formats and standards. This collection references EMBL-Bank identifiers.

  4. Quantitative monitoring of nucleotide information from genetic resources in...

    • doi.ipk-gatersleben.de
    Updated Apr 16, 2021
    Cite
    Guy Cochrane; Blaise Alako; Matthias Lange; Mehmood Ghaffar; Jens Freitag; Amber Scholz; Upneet Hillebrand (2021). Quantitative monitoring of nucleotide information from genetic resources in context of their citation in the scientific literature [Dataset]. https://doi.ipk-gatersleben.de/DOI/8d5e2634-88ac-4f0f-9859-cb2006091775/a6ca2009-3cce-4dc4-bdf0-c7cb54f5156f/2
    Dataset provided by
    e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
    Authors
    Guy Cochrane; Blaise Alako; Matthias Lange; Mehmood Ghaffar; Jens Freitag; Amber Scholz; Upneet Hillebrand
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set comprises extracted records of the European Nucleotide Archive linked to citations in open-access publications that are aggregated at Europe PubMed Central (ePMC). To build it, ENA records were parsed, filtered for a valid country tag, and fed into the ePMC RESTful API to extract matching secondary publications by ENA accession or project accession number. The resulting data sets are normalized as the tables ENA_SEQUENCES and PMC_REFERENCES, alongside a curated list of the world's countries in table CONTRIES and economic groups in table COUNTRY2GRP. These tables are the basis for a data warehouse and a web application that enables literature and sequence databases to be joined in a multidimensional fashion. A concrete use case in the context of the United Nations Convention on Biological Diversity is the analysis of countries with respect to nucleotide sequence use and contribution.

  5. Modal

    • data.mendeley.com
    Updated Feb 5, 2020
    Cite
    Radu Constantin Parpala (2020). Modal [Dataset]. http://doi.org/10.17632/93j24wpf55.1
    Authors
    Radu Constantin Parpala
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ansys archive file

  6. Data from: BioProject.

    • datadiscoverystudio.org
    Updated Jul 14, 2017
    + more versions
    Cite
    (2017). BioProject. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/59ce5dee4011447b946870a0c1d89424/html
    Description

    The BioProject database provides an organizational framework to access information about research projects with links to data that have been or will be deposited into archival databases maintained at members of the International Nucleotide Sequence Database Consortium (INSDC, which comprises the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive at European Molecular Biology Laboratory (ENA), and GenBank at the National Center for Biotechnology Information (NCBI)).

  7. of ExpressionPlot: a web-based framework for analysis of RNA-Seq and...

    • springernature.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Brad Friedman; Tom Maniatis (2023). of ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data [Dataset]. http://doi.org/10.6084/m9.figshare.10035245.v1
    Available download formats: zip
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Brad Friedman; Tom Maniatis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    of ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data

  8. GenBank

    • rrid.site
    • neuinfo.org
    • +1 more
    Updated Feb 11, 2025
    Cite
    (2025). GenBank [Dataset]. http://identifiers.org/RRID:SCR_002760
    Description

    NIH genetic sequence database that provides an annotated collection of all publicly available DNA sequences for almost 280 000 formally described species (Jan 2014). These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole-genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and GenBank staff assign accession numbers upon data receipt. It is part of the International Nucleotide Sequence Database Collaboration, and daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP.

  9. The OHEJP BeONE Project – Escherichia coli genome assembly dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jul 24, 2023
    Cite
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Escherichia coli genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802728
    Available download formats: zip, bin
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 308 Escherichia coli samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7120057), comprising genome assemblies of 1,999 E. coli samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Ec_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Ec_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of E. coli genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57098. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 308 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with seq_typing v2.2.
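
    The curation rule described above can be written down as a small decision function. This is a minimal sketch under stated assumptions, not the Aquamis pipeline itself: the Assembly record, its field names and the sample rows are hypothetical, and the sketch encodes only the stated threshold (more than 98% of reads assigned to the correct species), whereas the authors also mention a manual inspection of a random subset.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    """Hypothetical per-isolate QC summary; field names are illustrative only."""
    sample_id: str
    failed_checks: set = field(default_factory=set)  # names of QC checks that failed
    correct_species_read_pct: float = 0.0            # % of reads from the expected species


def curation_status(assembly: Assembly, min_pct: float = 98.0) -> str:
    """Classify an assembly as 'pass', 'recovered' or 'excluded'."""
    if not assembly.failed_checks:
        return "pass"
    # Recover only assemblies whose sole failure is the NumContamSNVs check
    # and whose correct-species read percentage is strictly above the threshold.
    only_contam_snvs = assembly.failed_checks == {"NumContamSNVs"}
    if only_contam_snvs and assembly.correct_species_read_pct > min_pct:
        return "recovered"
    return "excluded"


# Placeholder records, not taken from the dataset.
examples = [
    Assembly("isolate_a", set(), 99.5),
    Assembly("isolate_b", {"NumContamSNVs"}, 99.1),
    Assembly("isolate_c", {"NumContamSNVs"}, 98.0),
    Assembly("isolate_d", {"NumContamSNVs", "AssemblyLength"}, 99.9),
]
statuses = {a.sample_id: curation_status(a) for a in examples}
print(statuses)
```

    Under this reading, the final dataset is the union of the "pass" and "recovered" groups, with the recovered isolates flagged separately as in the metadata file.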

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  10. AmelHap Metadata

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Aug 29, 2022
    Cite
    Melanie Parejo; Andrea Talenti; Matthew Richardson; Alain Vignal; Mark Barnett; David Wragg (2022). AmelHap Metadata [Dataset]. http://doi.org/10.5281/zenodo.7030888
    Available download formats: csv
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Melanie Parejo; Andrea Talenti; Matthew Richardson; Alain Vignal; Mark Barnett; David Wragg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Curated sample information for drones processed for AmelHap. Details include:

    • ENA_Run (European Nucleotide Archive accession number)
    • ENA_Project (European Nucleotide Archive accession number)
    • ENA_Sample (European Nucleotide Archive accession number)
    • Sample_ID (Sample name given by ENA accession)
    • Sample_Locale (Sampling location given by either ENA accession or published study)
    • Sample_Country (Sample country of origin given by either ENA accession or published study)
    • Sample_Continent (Sample continent of origin given by either ENA accession or published study)
    • Reported_Sibling (Drones sampled from the same colony are labelled with the same ENA_Run accession for one of their siblings)
    • Reported_Type (Reported type or subspecies based on details from ENA or published study)
    • Reported_Lineage (Reported lineage based on details from ENA or published study)
    • Notes (Miscellaneous details from published study that may be of interest)
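
    The Reported_Sibling column above implies a simple way to group drones by colony. The following is a rough sketch, assuming the CSV header uses exactly the column names listed and that a drone without a reported sibling has an empty Reported_Sibling cell (neither is confirmed by the description). The rows are invented placeholders, and group_by_colony is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows; real accessions and values will differ.
SAMPLE_CSV = """ENA_Run,ENA_Project,ENA_Sample,Sample_ID,Sample_Country,Reported_Sibling
RUN_A,PROJ_1,SAMP_A,drone_a,CountryX,RUN_A
RUN_B,PROJ_1,SAMP_B,drone_b,CountryX,RUN_A
RUN_C,PROJ_2,SAMP_C,drone_c,CountryY,
"""


def group_by_colony(handle) -> dict:
    """Map each colony key (a sibling's ENA_Run) to the runs sampled from it."""
    colonies = defaultdict(list)
    for row in csv.DictReader(handle):
        # Assumption: an empty Reported_Sibling means the drone forms its own group.
        key = row["Reported_Sibling"] or row["ENA_Run"]
        colonies[key].append(row["ENA_Run"])
    return dict(colonies)


groups = group_by_colony(io.StringIO(SAMPLE_CSV))
print(groups)
```

    With the real file, the same function could be called on an open file handle instead of the in-memory string.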

  11. Australian Nucleotide (DNA/RNA) and Protein sequences from Australian...

    • researchdata.edu.au
    Updated Jun 4, 2014
    + more versions
    Cite
    QFAB Bioinformatics (2014). Australian Nucleotide (DNA/RNA) and Protein sequences from Australian organisms in the species Oedura gracilis [Dataset]. https://researchdata.edu.au/australian-nucleotide-dnarna-oedura-gracilis/442223
    Dataset provided by
    QFAB
    Authors
    QFAB Bioinformatics
    Area covered
    Australia
    Description

    This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from Australian Oedura gracilis. Other information about this group:

    The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the International Research Community.

    The identification of species in Oedura gracilis as Australian-dwelling organisms has been achieved by accessing the Australian Plant Census (APC) or Australian Faunal Directory (AFD) through the Atlas of Living Australia.

  12. MOESM4 of ExpressionPlot: a web-based framework for analysis of RNA-Seq and...

    • springernature.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Brad Friedman; Tom Maniatis (2023). MOESM4 of ExpressionPlot: a web-based framework for analysis of RNA-Seq and microarray gene expression data [Dataset]. http://doi.org/10.6084/m9.figshare.10035281.v1
    Available download formats: zip
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Brad Friedman; Tom Maniatis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: Archival copy of software. (ZIP 3 MB)

  13. The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset

    • data.niaid.nih.gov
    • openagrar.de
    • +2 more
    Updated Jul 24, 2023
    Cite
    Mixão, Verónica (2023). The OHEJP BeONE Project – Listeria monocytogenes genome assembly dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7267486
    Dataset provided by
    Brendebach, Holger
    Linde, Jörg
    RKI
    NVI
    Sobral, Daniel
    Sommer Kaas, Rolf
    BfR
    PIWET
    Lagesen, Karin
    RIVM
    Nielsen, Sofie
    Simon, Sandra
    INSA
    Di Pasquale, Adriano
    Gomes, João Paulo
    Borges, Vítor
    IZSAM
    DTU
    Joensen, Katrine
    Kiil, Kristoffer
    Petrovska, Liljana
    Umaer Naseer, Mohammed
    Mixão, Verónica
    Tausch, Simon
    SSI
    Pinto, Miguel
    APHA
    Iwan, Ewelina
    Deneke, Carlus
    NIPH
    Swart-Coipan, Claudia
    FLI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,426 Listeria monocytogenes samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). Additionally, a complementary dataset is also made available (https://zenodo.org/record/7116878), comprising genome assemblies of 1,874 L. monocytogenes samples selected among the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Lm_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers and in-silico Multi Locus Sequence Type, and information regarding year of sampling, country and source.

    The archive “BeONE_Lm_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of L. monocytogenes genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium available at the European Nucleotide Archive under BioProject Accession Number PRJEB57166. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST ST determination were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the others, we noticed that a considerable proportion of assemblies was flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,426 isolates passed the dataset curation step and were included in the final dataset.

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  14. Whole genome DNA sequences of Gulf of Mexico invertebrates

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    Thomas, W. Kelley (2025). Whole genome DNA sequences of Gulf of Mexico invertebrates [Dataset]. https://search.dataone.org/view/sha256%3A28ab03182a5617a0f40abd9a2e370b764e570c7776bc2afba257064205d466f2
    Dataset provided by
    GRIIDC
    Authors
    Thomas, W. Kelley
    Area covered
    Gulf of Mexico (Gulf of America)
    Description

    The dataset consists of whole genome DNA sequences generated from invertebrate species from the Gulf of Mexico during the Benthic Invertebrate Taxonomy, Metagenomics, and Bioinformatics Workshop (BITMaB) in 2017 in Corpus Christi, Texas, USA. All genomic data sets were deposited in and distributed by GenBank (NCBI), the European Nucleotide Archive (ENA) at the European Bioinformatics Institute (EMBL-EBI), the DNA Data Bank of Japan, NemATOL, the Global Genome Initiative, and Ocean Genome Legacy.

  15. SUPERSEDED - Human, yeast and pig genomics: sequence submissions and first...

    • find.data.gov.scot
    • dtechtive.com
    csv, txt
    Updated Jun 6, 2018
    Cite
    Science, Technology and Innovation Studies. University of Edinburgh (2018). SUPERSEDED - Human, yeast and pig genomics: sequence submissions and first sequence descriptions in the literature (1980-2015) [Dataset]. http://doi.org/10.7488/ds/2358
    Available download formats: csv (0.1938 MB), csv (11.83 MB), csv (0.8656 MB), csv (0.056 MB), csv (0.5938 MB), csv (1.918 MB), txt (0.0166 MB)
    Dataset provided by
    Science, Technology and Innovation Studies. University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/2589.

    This data collection is derived from two sources: 1) Submissions of DNA sequences of S. cerevisiae (yeast), Sus scrofa (pig) and Homo sapiens (human) to the European Nucleotide Archive, and 2) First description of these sequences in the scientific literature. The time range of the records is 1980-2000 (yeast), 1985-2005 (human) and 1990-2015 (pig). In total, each species has two associated datasets: 1) A .csv file documenting the PubMed ID of each article describing new sequences, all paper authors, all institutional affiliations of each author, country of institution, year of first submission to the European Nucleotide Archive, and the year of article publication, and 2) A .csv file documenting all institutions submitting to the European Nucleotide Archive, number of nucleotides sequenced, number of submissions per institution, and year of submission to the database. The approximate number of records is 28,000 publications and over 2 million sequence submissions. Some data about submitting institutions is not fully cleaned.

  16. BioSample Database at EBI

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). BioSample Database at EBI [Dataset]. http://identifiers.org/RRID:SCR_004856
    Description

    Database that aggregates sample information for reference samples (e.g. Coriell Cell lines) and samples for which data exist in one of the EBI's assay databases such as ArrayExpress, the European Nucleotide Archive or the PRoteomics IDEntifications database (PRIDE). It provides links to assays for specific samples, and accepts direct submissions of sample information. The goals of the BioSample Database include:

    • recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE;
    • minimizing data entry efforts for EBI database submitters by enabling them to submit sample descriptions once and reference them later in data submissions to assay databases; and
    • supporting cross-database queries by sample characteristics.

    The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples (at) ebi.ac.uk. Current data sources:

    • European Nucleotide Archive (424,811 samples)
    • PRIDE (17,001 samples)
    • ArrayExpress (1,187,884 samples)
    • ENCODE cell lines (119 samples)
    • CORIELL cell lines (27,002 samples)
    • Thousand Genome (2,628 samples)
    • HapMap (1,417 samples)
    • IMSR (248,660 samples)

  17. Australian Nucleotide (DNA/RNA) and Protein sequences from the Australian...

    • researchdata.edu.au
    Updated Jul 20, 2012
    + more versions
    Cite
    QFAB Bioinformatics (2012). Australian Nucleotide (DNA/RNA) and Protein sequences from the Australian research institution, Western Australian Institute for Medical Research [Dataset]. https://researchdata.edu.au/australian-nucleotide-dnarna-medical-research/79925
    Dataset provided by
    QFAB
    Authors
    QFAB Bioinformatics
    Area covered
    Western Australia, Australia
    Description

    This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian research institution, Western Australian Institute for Medical Research. The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the International Research Community.

  18. Nucleotide (DNA / RNA) and Protein sequences from the Australian research...

    • researchdata.edu.au
    Updated Jul 20, 2012
    Cite
    QFAB Bioinformatics (2012). Nucleotide (DNA / RNA) and Protein sequences from the Australian research institution University of Notre Dame [Dataset]. https://researchdata.edu.au/nucleotide-dna-rna-notre-dame/56616
    Dataset provided by
    QFAB
    Authors
    QFAB Bioinformatics
    Description

    This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian research institution University of Notre Dame. The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the International Research Community.

  19. The OHEJP BeONE Project – Salmonella enterica genome assembly dataset

    • zenodo.org
    bin, zip
    Updated Jul 24, 2023
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI (2023). The OHEJP BeONE Project – Salmonella enterica genome assembly dataset [Dataset]. http://doi.org/10.5281/zenodo.7802723
    Available download formats: zip, bin
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Mixão; Miguel Pinto; João Paulo Gomes; Daniel Sobral; Holger Brendebach; Carlus Deneke; Simon Tausch; Adriano Di Pasquale; Claudia Swart-Coipan; Ewelina Iwan; Jörg Linde; Karin Lagesen; Liljana Petrovska; Mohammed Umaer Naseer; Rolf Sommer Kaas; Sandra Simon; Katrine Joensen; Kristoffer Kiil; Sofie Nielsen; Vítor Borges; INSA; APHA; BfR; DTU; FLI; IZSAM; NIPH; NVI; PIWET; RIVM; RKI; SSI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset comprises the genome assemblies of 1,540 Salmonella enterica samples collected by the BeONE Consortium on behalf of the One Health European Joint Programme “BeONE: Building Integrative Tools for One Health Surveillance” (https://onehealthejp.eu/jrp-beone/). A complementary dataset is also available (https://zenodo.org/record/7119735), comprising genome assemblies of 1,434 S. enterica samples selected from the Whole-Genome Sequencing (WGS) data publicly available in the European Nucleotide Archive (ENA) or in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA).

    File “BeONE_Se_metadata.xlsx” contains the genome assembly statistics for each isolate, including European Nucleotide Archive accession numbers, in-silico Multi Locus Sequence Type and Serotype, and information regarding year of sampling, country and source.

    The archive “BeONE_Se_assemblies.zip” contains all the genome assemblies (.fasta format) of each isolate presented in the metadata file.

    Dataset selection and curation

    This anonymized dataset of S. enterica genome assemblies was generated using Next Generation Sequencing data collected within the BeONE Consortium and available at the European Nucleotide Archive under BioProject Accession Number PRJEB57179. Read quality control, trimming and assembly were performed with Aquamis v1.3.9 (Deneke et al. 2021) using default parameters. Assembly quality control (QC), including contamination assessment, as well as MLST sequence type (ST) determination, were performed with the same pipeline. All genome assemblies passing the QC were included in the final dataset. Among the assemblies that did not pass, we noticed that a considerable proportion were flagged as “QC fail” exclusively due to the “NumContamSNVs” parameter, suggesting that this setting might have been too strict. After manual inspection of a random subset, assemblies for which the percentage of reads corresponding to the correct species was >98% were recovered and integrated in the final dataset (those samples are labeled in the Metadata file). In total, 1,540 isolates passed the dataset curation step and were included in the final dataset. In-silico serotyping was performed with SeqSero2 v1.2.1 (Zhang et al. 2019).
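
    The inclusion rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Assembly` record, its field names and the `curate` helper are hypothetical and do not correspond to actual Aquamis output columns or to the consortium's own scripts. Only the “NumContamSNVs” check name and the >98% threshold come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    # Hypothetical record; field names are illustrative, not Aquamis columns.
    sample_id: str
    failed_checks: set = field(default_factory=set)  # names of failed QC checks
    correct_species_read_pct: float = 0.0  # % of reads assigned to the expected species


def curate(assemblies, min_species_pct=98.0):
    """Return (sample_id, recovered) pairs for assemblies kept in the final dataset."""
    kept = []
    for a in assemblies:
        if not a.failed_checks:
            # Passed every QC check: included directly.
            kept.append((a.sample_id, False))
        elif (a.failed_checks == {"NumContamSNVs"}
              and a.correct_species_read_pct > min_species_pct):
            # Failed only on NumContamSNVs and reads are >98% correct species:
            # recovered, and flagged so it can be labeled in the metadata.
            kept.append((a.sample_id, True))
    return kept
```

    The boolean in each pair stands in for the label that marks recovered samples in the metadata file; assemblies failing any other check are dropped regardless of their species read percentage.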

    Funding

    This work was supported by funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 773830: One Health European Joint Programme.

    Acknowledgements

    We thank the National Distributed Computing Infrastructure of Portugal (INCD) for providing the necessary resources to run the genome assemblies. INCD was funded by FCT and FEDER under the project 22153-01/SAICT/2016.

  20. Nucleotide (DNA / RNA) and Protein sequences from the Australian dwelling...

    • researchdata.edu.au
    Updated Jul 20, 2012
    QFAB Bioinformatics (2012). Nucleotide (DNA / RNA) and Protein sequences from the Australian dwelling species Loligo chinensis [Dataset]. https://researchdata.edu.au/nucleotide-dna-rna-loligo-chinensis/56392
    Dataset updated
    Jul 20, 2012
    Dataset provided by
    QFAB
    Authors
    QFAB Bioinformatics
    Area covered
    Australia
    Description

    This data collection contains all currently published nucleotide (DNA/RNA) and protein sequences from the Australian dwelling organism Loligo chinensis.

    The nucleotide (DNA/RNA) and protein sequences have been sourced through the European Nucleotide Archive (ENA) and Universal Protein Resource (UniProt), databases that contain comprehensive sets of nucleotide (DNA/RNA) and protein sequences from all organisms that have been published by the International Research Community.

    The identification of the species Loligo chinensis as an Australian dwelling organism has been achieved by accessing the Australian Plant Census (APC) or Australian Faunal Directory (AFD) through the Atlas of Living Australia.

Stephane Riviere (2021). The European Nucleotide Archive (ENA) taxonomy [Dataset]. http://doi.org/10.15468/avkgwm

The European Nucleotide Archive (ENA) taxonomy

5 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Mar 11, 2021
Dataset provided by
Global Biodiversity Information Facility (https://www.gbif.org/)
European Bioinformatics Institute (http://www.ebi.ac.uk/)
Authors
Stephane Riviere
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The classification system for source biological organisms for all INSDC records is the NCBI Taxonomy. The ENA team work alongside taxonomists at the NCBI to ensure that all ENA records display the accepted organism name and classification hierarchy. NCBI Taxonomy covers the complete tree of life and also includes other types, such as synthetic constructs and environmental samples. However, it is an incomplete classification system in that it only considers taxa for data that are represented in INSDC records. Users should note that taxa are only displayed if at least one associated ENA record is available.
