100+ datasets found
  1. GenBank™/EMBL Database entries.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Cite
    Jose C. Jimenez-Lopez; Sonia Morales; Antonio J. Castro; Dieter Volkmann; María I. Rodríguez-García; Juan de D. Alché (2023). GenBank™/EMBL Database entries. [Dataset]. http://doi.org/10.1371/journal.pone.0030878.t001
    Available download formats
    xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jose C. Jimenez-Lopez; Sonia Morales; Antonio J. Castro; Dieter Volkmann; María I. Rodríguez-García; Juan de D. Alché
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accession numbers of the profilin cDNA sequences obtained after RT-PCR from pollen of five plant species: Olea europaea, Betula pendula, Corylus avellana, Phleum pratense and Zea mays.

  2. EMBL European Bioinformatics Institute - Nucleotide Sequencing Data

    • geonetwork.soosmap.aq
    Updated Apr 21, 2025
    Cite
    (2025). EMBL European Bioinformatics Institute - Nucleotide Sequencing Data [Dataset]. https://geonetwork.soosmap.aq/geonetwork/srv/search
    Dataset updated
    Apr 21, 2025
    Description

    The European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) is international, innovative and interdisciplinary, and a champion of open data in the life sciences. EMBL-EBI captures and presents globally comprehensive sequence data as part of the International Nucleotide Sequence Database Collaboration (INSDC). Data provided to GBIF include geotagged environmental sequences with user-provided taxonomic identifications.

    This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: `environmental_sample=True & host=""`

    EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).

    The data was then processed as follows:

    1. Human sequences were excluded.

    2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individuals (or groups of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.

    3. Contigs and whole genome shotgun (WGS) records were added individually.

    4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.

    5. The records associated with the same vouchers are aggregated together.

    6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that weren't filtered out in step 2 because the sample accession was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: Deduplication v2, gbif/embl-adapter#10 (comment)

    7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip

    More information available here: https://github.com/gbif/embl-adapter#readme

    You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
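
    Steps 2 and 4 of the processing described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: records are modelled as plain dictionaries, and the field names `specimen_voucher`, `location`, `collection_date`, `scientific_name` and `sample_accession`, as well as both helper names, are hypothetical. It is not the gbif/embl-adapter code.

```python
from typing import Iterable

Record = dict  # hypothetical record shape: a plain dict of ENA-style fields


def has_minimum_metadata(rec: Record) -> bool:
    """Step 4: keep a record with a specimen voucher, or with both a location AND a date."""
    if rec.get("specimen_voucher"):
        return True
    return bool(rec.get("location")) and bool(rec.get("collection_date"))


def one_per_sample(records: Iterable[Record]) -> list[Record]:
    """Step 2: keep one record per (scientific name, sample accession) pair.

    Records without a sample accession cannot be matched this way,
    so they are passed through unchanged.
    """
    seen = set()
    kept = []
    for rec in records:
        accession = rec.get("sample_accession")
        if not accession:
            kept.append(rec)
            continue
        key = (rec.get("scientific_name"), accession)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept
```

    Passing records without a sample accession through unchanged mirrors the description: those are the records that a later step has to examine for duplicates.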

  3. Data from: Accessing and distributing EMBL data using CORBA (common object...

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    • +1 more
    Updated Jul 24, 2025
    Cite
    National Institutes of Health (2025). Accessing and distributing EMBL data using CORBA (common object request broker architecture) [Dataset]. https://catalog.data.gov/dataset/accessing-and-distributing-embl-data-using-corba-common-object-request-broker-architecture
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data.

    Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by Persistence™, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism.

    Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
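
    The idea of a live object cache with on-demand creation and an evictor can be illustrated with a small Python sketch. It is not the EMBL-EBI CORBA implementation: the class name `EvictingObjectCache`, the `loader` callback and the least-recently-used eviction policy are all assumptions made for illustration, since the description does not say which eviction policy was used.

```python
from collections import OrderedDict
from typing import Callable


class EvictingObjectCache:
    """Hypothetical live-object cache: objects are created on demand and the
    least recently used one is evicted when the capacity is exceeded."""

    def __init__(self, loader: Callable[[str], object], capacity: int = 2):
        self._loader = loader        # creates an object from its key, e.g. from a database
        self._capacity = capacity    # maximum number of live objects
        self._live = OrderedDict()   # key -> object, oldest use first

    def get(self, key: str) -> object:
        if key in self._live:
            # Cache hit: mark the object as most recently used.
            self._live.move_to_end(key)
            return self._live[key]
        # Cache miss: create the object on demand.
        obj = self._loader(key)
        self._live[key] = obj
        if len(self._live) > self._capacity:
            # Evictor: drop the least recently used live object.
            self._live.popitem(last=False)
        return obj

    def live_keys(self) -> list:
        return list(self._live)
```

    A caller only ever asks for an object by key. Whether it was already live or had to be reloaded from the backing store is hidden behind `get`.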

  4. SMART

    • ebi.ac.uk
    Updated Feb 14, 2020
    Cite
    (2020). SMART [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 14, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.

  5. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  6. INSDC Environment Sample Sequences

    • gbif.org
    • demo.gbif.org
    • +1 more
    Updated Aug 30, 2025
    + more versions
    Cite
    European Bioinformatics Institute (EMBL-EBI); European Bioinformatics Institute (EMBL-EBI) (2025). INSDC Environment Sample Sequences [Dataset]. http://doi.org/10.15468/mcmd5g
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    Global Biodiversity Information Facility, https://www.gbif.org/
    European Bioinformatics Institute, http://www.ebi.ac.uk/
    Authors
    European Bioinformatics Institute (EMBL-EBI); European Bioinformatics Institute (EMBL-EBI)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: `environmental_sample=True & host=""`

    EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).

    The data was then processed as follows:

    1. Human sequences were excluded.

    2. For non-CONTIG records, the sample accession number (when available) and the scientific name were used to identify sequence records corresponding to the same individuals (or groups of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.

    3. Contigs and whole genome shotgun (WGS) records were added individually.

    4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.

    5. The records associated with the same vouchers are aggregated together.

    6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that weren't filtered out in step 2 because the sample accession was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records. The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978

    7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip

    More information available here: https://github.com/gbif/embl-adapter#readme

    You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
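
    The grouping and threshold rule in step 6 can be sketched in Python as below. This is a minimal illustration, not the gbif/embl-adapter code: records are assumed to be plain dictionaries keyed by the field names listed in step 6, and the function name `drop_oversized_groups` is hypothetical.

```python
from collections import defaultdict

# Grouping fields as listed in step 6; a missing field is treated as None.
GROUP_FIELDS = (
    "scientific_name", "collection_date", "location", "country",
    "identified_by", "collected_by", "sample_accession",
)
MAX_GROUP_SIZE = 50  # threshold stated in step 6


def drop_oversized_groups(records, max_size=MAX_GROUP_SIZE):
    """Group records by GROUP_FIELDS and exclude every group with more than max_size records."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(field) for field in GROUP_FIELDS)
        groups[key].append(rec)
    return [
        rec
        for members in groups.values()
        if len(members) <= max_size
        for rec in members
    ]
```

    Note that a whole group is excluded once it exceeds the threshold. The sketch does not try to keep one representative record per group, because the description only says such groups were excluded.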

  7. PRINTS

    • ebi.ac.uk
    Updated Jun 14, 2012
    Cite
    (2012). PRINTS [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Jun 14, 2012
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family or domain. PRINTS is based at the University of Manchester, UK.

  8. EBI Dbfetch

    • scicrunch.org
    • dknet.org
    • +2 more
    Updated Aug 30, 2025
    Cite
    (2022). EBI Dbfetch [Dataset]. http://identifiers.org/RRID:SCR_004393
    Dataset updated
    Aug 30, 2025
    Description

    Dbfetch is short for "database fetch". It provides an easy way to retrieve entries from various up-to-date biological databases at the EBI in a consistent manner, up to 50 entries at a time. It can be used from any browser as well as within a web-aware scripting tool that uses wget, lynx or similar. From the browser, follow these instructions:

    * Select a database: If you are using the first form to paste your search items, choose a database name from this form. If you are using the second form to upload your search items, the database name is included at the beginning of each line of the upload file, followed by a colon.
    * Enter search terms: These MUST BE in the appropriate database format. Up to 200 search items can be queried in one run. If you are using the first form, separate search items with a comma or space. If you are using the second form, separate search items with a new line.
    * Choose an output format: Here you can choose the simpler fasta format or the default format for the chosen database.
    * Style: You can get your results as text or html.
    * Retrieve: You are now ready to fetch your results by pressing the Retrieve button.
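
    For use from a script rather than the browser form, the same choices (database, identifiers, format, style) can be encoded into request URLs. The Python sketch below only builds the URLs and does not fetch anything. The endpoint in `DBFETCH_URL`, the parameter names `db`, `id`, `format` and `style`, the `raw` style value and the database name in the usage line are assumptions to be checked against the current Dbfetch documentation; the default batch size of 50 follows the per-request limit mentioned above.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Dbfetch documentation.
DBFETCH_URL = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def chunked(ids, size):
    """Split a list of identifiers into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_dbfetch_urls(db, ids, fmt="fasta", style="raw", batch_size=50):
    """Build one request URL per batch of identifiers (parameter names are assumptions)."""
    urls = []
    for batch in chunked(list(ids), batch_size):
        query = urlencode(
            {"db": db, "id": ",".join(batch), "format": fmt, "style": style},
            safe=",",  # keep the comma separators readable in the URL
        )
        urls.append(f"{DBFETCH_URL}?{query}")
    return urls


# Hypothetical usage: 120 placeholder identifiers become three request URLs.
urls = build_dbfetch_urls("ena_sequence", [f"ID{i}" for i in range(120)])
```

    Each resulting URL could then be passed to wget or a similar tool, as the description suggests.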

  9. The 20 mammals from the EMBL database used in the phylogenetic experiments....

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Liviu P. Dinu; Radu Tudor Ionescu; Alexandru I. Tomescu (2023). The 20 mammals from the EMBL database used in the phylogenetic experiments. The accession number is given on the last column. [Dataset]. http://doi.org/10.1371/journal.pone.0104006.t001
    Available download formats
    xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Liviu P. Dinu; Radu Tudor Ionescu; Alexandru I. Tomescu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The 20 mammals from the EMBL database used in the phylogenetic experiments. The accession number is given on the last column.

  10. Australian Collections in the EMBL Australia Bioinformatics Resource

    • researchdata.edu.au
    Updated Jun 13, 2013
    Cite
    QFAB (2013). Australian Collections in the EMBL Australia Bioinformatics Resource [Dataset]. https://researchdata.edu.au/australian-collections-embl-bioinformatics-resource/124829
    Dataset updated
    Jun 13, 2013
    Dataset provided by
    QFAB
    Description

    The EMBL Australia Bioinformatics Resource, located at The University of Queensland, provides an Australian-based entry to many of the data services of the European Bioinformatics Institute (EBI). The EBI is an institute within the European Molecular Biology Laboratory (EMBL) and is the world's premier life sciences data resource.

    Five collections of nucleotide and protein sequences derived from Australian-dwelling plants and animals (identified through the Atlas of Living Australia's instances of the Australian Faunal Directory and the Australian Plant Census) are available from within the EMBL-EBI database.

    These data collections contain all current internationally published nucleotide and protein sequences derived from Australian-dwelling organisms (native and common or significant introduced species, e.g. crop, weed or feral) and are structured using a taxonomical hierarchy to facilitate searching by species.

  11. SUPERFAMILY

    • ebi.ac.uk
    Updated Nov 8, 2010
    Cite
    (2010). SUPERFAMILY [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Nov 8, 2010
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.

  12. Proteomics IDEntification Database (PRIDE)

    • covid19dataportal.org
    Updated Apr 20, 2020
    Cite
    EMBL-EBI (2020). Proteomics IDEntification Database (PRIDE) [Dataset]. https://www.covid19dataportal.org/expression
    Dataset updated
    Apr 20, 2020
    Dataset provided by
    European Bioinformatics Institute, http://www.ebi.ac.uk/
    Authors
    EMBL-EBI
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Proteomics Identifications Database (PRIDE, http://www.ebi.ac.uk/pride) is one of the main repositories of MS-derived proteomics data.

  13. AlphaFold Protein Structure Database

    • rrid.site
    • scicrunch.org
    Updated Jul 26, 2025
    Cite
    (2025). AlphaFold Protein Structure Database [Dataset]. http://identifiers.org/RRID:SCR_023662
    Dataset updated
    Jul 26, 2025
    Description

    Database of protein structure predictions by AlphaFold that are freely and openly available to the global scientific community. Included are nearly all catalogued proteins known to science. Provides programmatic access to, and interactive visualization of, predicted atomic coordinates, per-residue and pairwise model confidence estimates, and predicted aligned errors.

  14. BioStudies database

    • bioregistry.io
    Updated Apr 28, 2021
    Cite
    (2021). BioStudies database [Dataset]. http://identifiers.org/re3data:r3d100012627
    Dataset updated
    Apr 28, 2021
    Description

    The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database can accept a wide range of types of studies described via a simple format. It also enables manuscript authors to submit supplementary information and link to it from the publication.

  15. BioSamples RDF

    • data.wu.ac.at
    api/sparql, meta/void +1
    Updated Jul 30, 2016
    + more versions
    Cite
    EMBL European Bioinformatics Institute (2016). BioSamples RDF [Dataset]. https://data.wu.ac.at/odso/datahub_io/OWU5ZjQwN2ItN2E3Mi00OGMyLWE0OTYtZGI1MTkzMjI5YjQw
    Available download formats
    api/sparql, meta/void, ttl
    Dataset updated
    Jul 30, 2016
    Dataset provided by
    European Bioinformatics Institute, http://www.ebi.ac.uk/
    Description

    The BioSamples database aggregates sample information for reference samples (e.g. Coriell Cell lines) and samples for which data exist in one of the EBI's assay databases such as ArrayExpress, the European Nucleotide Archive or the PRoteomics IDEntifications database (PRIDE). It provides links to assays and specific samples, and accepts direct submissions of sample information.

  16. Porcine primer sequences: forward (For.) and reverse (Rev.), with transcript...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 3, 2023
    Cite
    Frank Zoerner; Lars Wiklund; Adriana Miclescu; Cecile Martijn (2023). Porcine primer sequences: forward (For.) and reverse (Rev.), with transcript (RT-PCR product) length and EMBL database accession number. [Dataset]. http://doi.org/10.1371/journal.pone.0064792.t001
    Available download formats
    xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Frank Zoerner; Lars Wiklund; Adriana Miclescu; Cecile Martijn
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (ET-1) Endothelin-1; (ETAR) Endothelin A-receptor; (ETBR) Endothelin B-receptor; (ECE-1) Endothelin-Converting-Enzyme-1; (SDH) Succinate Dehydrogenase. *GenBank accession number at www.ncbi.nlm.nih.gov.

  17. European Nucleotide Archive

    • bioregistry.io
    Updated Feb 10, 2022
    + more versions
    Cite
    (2022). European Nucleotide Archive [Dataset]. http://identifiers.org/re3data:r3d100010527
    Dataset updated
    Feb 10, 2022
    Description

    The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. ENA is made up of a number of distinct databases, including EMBL-Bank, the Sequence Read Archive (SRA) and the Trace Archive, each with its own data formats and standards. This collection references EMBL-Bank identifiers.

  18. ENA Sequence Version Archive

    • rrid.site
    • neuinfo.org
    • +1 more
    Updated Jul 26, 2025
    Cite
    (2025). ENA Sequence Version Archive [Dataset]. http://identifiers.org/RRID:SCR_000929
    Dataset updated
    Jul 26, 2025
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on September 23, 2022. The ENA Sequence Version Archive is a repository of all entries that have ever appeared in the EMBL-Bank Sequence Database. You can browse the archive or use the batch retrieval form.

  19. EMBL-EBI COVID-19 Portal

    • scicrunch.org
    Cite
    EMBL-EBI COVID-19 Portal [Dataset]. http://identifiers.org/RRID:SCR_018337
    Description

    EMBL-EBI portal that enables researchers to upload, access and analyse COVID-19 related reference data and specialist datasets submitted to EMBL-EBI and other major centres for biomedical data. Its aim is to facilitate data sharing and analysis, and to accelerate coronavirus research.

    EMBL-EBI and partners have set up the COVID-19 Data Portal to bring together these datasets. The portal will be the primary entry point into the functions of a wider project, the European COVID-19 Data Platform.

  20. CATH-Gene3D

    • ebi.ac.uk
    Updated Oct 21, 2020
    Cite
    (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Oct 21, 2020
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Mapping of predicted structure and sequence domains is undertaken using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.

GenBank™/EMBL Database entries.

6 scholarly articles cite this dataset
